REF: pandas/io/formats/format.py #36434
Conversation
To ensure that mypy does not raise override error
This is done for consistency with max_rows_adj
pandas/io/formats/format.py (Outdated):

    return strcols

    if is_list_like(self.header):
        assert isinstance(self.header, list)
is_list_like is not the same as isinstance(..., list). can you revert
Good catch. Thank you
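For context, a minimal sketch of why the two checks are not interchangeable (using the public `pandas.api.types.is_list_like`; the values are illustrative):

```python
from pandas.api.types import is_list_like

# is_list_like accepts many container types, not only list instances,
# so `assert isinstance(self.header, list)` can fail in cases where
# `is_list_like(self.header)` passed.
assert is_list_like([1, 2])           # a list: both checks agree
assert is_list_like((1, 2))           # a tuple is list-like ...
assert not isinstance((1, 2), list)   # ... but it is not a list
assert is_list_like({"a", "b"})       # sets are list-like too
assert not is_list_like("abc")        # plain strings are excluded
```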
pandas/io/formats/format.py (Outdated):

    self.bold_rows = bold_rows
    self.escape = escape
    if hasattr(self, "_max_cols_fitted"):
hasattr checks don't play well with mypy.
max_cols_fitted is called from is_truncated_horizontally, which is quite prevalent.
The optimisation here may be premature, but I think avoiding hasattr checks is generally preferable for something that is called often.
    class MyClass:
        pass

    cls = MyClass()
    %timeit hasattr(cls, "_max_cols_fitted")
    # 120 ns ± 8.08 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

    class MyClass:
        _max_cols_fitted = None

    cls = MyClass()
    %timeit cls._max_cols_fitted is not None
    # 85.5 ns ± 1.94 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
what is the reason for changing max_rows_adj to a property?
The reason for making a property out of max_rows_adj/max_rows_fitted was to place the computation inside a single function. Before that it was scattered across the other methods, and it was difficult to understand how the calculation was actually carried out. The present implementation makes it clearer.
Before the hasattr check I used a try/except AttributeError check. But if I understood correctly, @jreback suggested reverting to the hasattr check, which was there originally.
having the separate function is good. maybe _calc_max_cols_fitted and just call it once and assign to attribute like before
Before the hasattr check I used a try/except AttributeError check. But if I understood correctly, @jreback suggested reverting to the hasattr check, which was there originally.
it was only called once before.
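The suggested pattern might look roughly like this (a hypothetical sketch: `_calc_max_cols_fitted` is the name proposed above, but the class and its body are placeholders, not the real format.py logic):

```python
from typing import Optional

class FormatterSketch:
    """Hypothetical sketch: a class-level None default plus a _calc_*
    helper that is called exactly once in __init__."""

    # Declared at class level so the attribute always exists and mypy
    # knows its type; no hasattr() check is needed later.
    _max_cols_fitted: Optional[int] = None

    def __init__(self, max_cols: Optional[int] = None) -> None:
        self.max_cols = max_cols
        # Compute once and assign, instead of re-deriving the value
        # in a property on every access.
        self._max_cols_fitted = self._calc_max_cols_fitted()

    def _calc_max_cols_fitted(self) -> Optional[int]:
        # Placeholder standing in for the real screen-width computation.
        return self.max_cols

fmt = FormatterSketch(max_cols=20)
assert fmt._max_cols_fitted == 20
assert FormatterSketch()._max_cols_fitted is None
```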
If performance is a concern, we can consider the cached_property decorator or its equivalent created for Python 3.7. I can do this if required.
having the separate function is good. maybe _calc_max_cols_fitted and just call it once and assign to attribute like before
Before the hasattr check I used a try/except AttributeError check. But if I understood correctly, @jreback suggested reverting to the hasattr check, which was there originally.
it was only called once before.
This is fine with me. I will change as you suggest.
If performance is a concern
it's not. just suggesting avoiding certain patterns
from python/mypy#1424 (comment)
In general i despise hasattr() checks; it's usually much better to add a
class level initialization to None and check for that.
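A toy illustration of the advice (hypothetical classes, not from the PR): with a class-level None default the attribute always exists and its type is declared, whereas with the hasattr pattern the attribute may or may not exist at any given point.

```python
from typing import Optional

class WithHasattr:
    def configure(self) -> None:
        # Attribute appears only once configure() has run; callers must
        # guard with hasattr(), which mypy cannot narrow well.
        self._value = 1

class WithNoneDefault:
    # Declared up front: always present, and the type is known to mypy.
    _value: Optional[int] = None

    def configure(self) -> None:
        self._value = 1

obj = WithNoneDefault()
assert obj._value is None   # safe to read even before configure()
obj.configure()
assert obj._value == 1      # narrowed by an `is not None` style check
```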
we can consider the cached_property decorator or its equivalent created for Python 3.7. I can do this if required.
This is not in the standard library til 3.8?
It is in the standard library starting from 3.8. Before that there wasn't one. So I was thinking it could be reasonable to implement the cached property for our needs, if the version is < 3.8.
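One possible shape for such a fallback (a sketch only; the real `functools.cached_property` handles locking and edge cases this minimal version omits):

```python
import sys

if sys.version_info >= (3, 8):
    from functools import cached_property
else:
    class cached_property:  # minimal backport sketch
        def __init__(self, func):
            self.func = func
            self.attrname = func.__name__

        def __get__(self, instance, owner=None):
            if instance is None:
                return self
            # Store the computed value in the instance __dict__; since
            # this is a non-data descriptor, later attribute lookups
            # find the cached value and bypass the descriptor entirely.
            value = self.func(instance)
            instance.__dict__[self.attrname] = value
            return value

class Example:
    @cached_property
    def answer(self) -> int:
        return 42

ex = Example()
assert ex.answer == 42
assert "answer" in ex.__dict__  # second access hits the cache
```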
This reverts commit dcb2b70.
This allows one to avoid multiple calls of the properties, which were there originally.
we already have a cached property decorator, cache_readonly, though only use it if we are actually doing something non-trivial
yeah, cache_readonly is in cython without a stub file, so using it here would undo all the typing benefits since the properties would resolve to 'Any'
lgtm. ping on green. nice improvement. happy to have the additional split into separate classes as you mentioned.
pandas/io/formats/format.py:

    """
    Render a DataFrame to a list of columns (as lists of strings).

    def _truncate_horizontally(self) -> None:
        """Remove columns, which are not to be displayed and adjust formatters.
wouldn't be against making this a non-mutating function and instead just return the new values (and set where this is called), but you did explain so ok.
thanks @ivanovmg keep em coming!
    [
        x
        for x in range(self.frame.shape[0])
        if x < row_num or x >= len(self.frame) - row_num
    ]
I suppose this is the reason for the slowdown (#36636). In the original code, we do 2 slices + a concat. What was the motivation for this (slow) for loop? To select the subset in a single go?
Right, it was supposed to be a more readable way to extract the necessary rows. But, apparently, I never considered cases with a large number of rows. Will fix it.
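A toy comparison of the two approaches (illustrative DataFrame and row_num, not the actual format.py code): both keep the first and last `row_num` rows, but the comprehension walks every row index, while the original code takes two slices and concatenates them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10_000).reshape(-1, 4))
row_num = 5

# Slow: iterate over every row index just to keep the head and the tail.
kept = [
    x
    for x in range(df.shape[0])
    if x < row_num or x >= len(df) - row_num
]
truncated_loop = df.iloc[kept]

# Fast: two slices plus a concat, as in the original code.
truncated_concat = pd.concat((df.iloc[:row_num], df.iloc[-row_num:]))

assert truncated_loop.equals(truncated_concat)
assert len(truncated_concat) == 2 * row_num
```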
pandas/io/formats/format.py:

    """
    assert self.max_rows_fitted is not None
    row_num = self.max_rows_fitted // 2
BTW, I find the "max_rows_fitted" naming instead of "max_rows_adjusted" as it was before a bit confusing, as the "fitting" seems to come from "fit on the screen", but the eventual max_rows are not necessarily what fit on the screen, that depends on your settings.
OK. Will change in the upcoming PRs.
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Partially addresses #36407
Before splitting DataFrameFormatter into more dedicated classes, I decided to refactor the class itself, to make the outstanding refactoring more manageable. As suggested by @jreback, I made this refactor small, trying to split big functions into smaller ones and to find better naming.
Yet, rather small changes are done so far, just to keep the diffs readable.
Once approved, I will move on to further refactoring.