REF: pandas/io/formats/format.py #36434


Merged: 75 commits merged into pandas-dev:master on Sep 19, 2020

Conversation

ivanovmg (Member):

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Partially addresses #36407

Before splitting DataFrameFormatter into more dedicated classes, I decided to refactor the class itself, to make the outstanding refactoring more manageable.

As suggested by @jreback, I kept this refactor small, trying to split big functions into smaller ones and find better naming.
So far only fairly small changes have been made, to keep the diffs readable.
Once approved, I will move on to further refactoring.

pep8speaks commented Sep 17, 2020:

Hello @ivanovmg! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-09-19 18:09:57 UTC

return strcols

if is_list_like(self.header):
assert isinstance(self.header, list)
Member:

is_list_like is not the same as isinstance(..., list). can you revert
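To illustrate the reviewer's point, here is a minimal sketch (the helper below is an invented, simplified stand-in for pandas.api.types.is_list_like, not pandas' actual logic) showing why the two checks are not interchangeable:

```python
from collections.abc import Iterable

def is_list_like_simplified(obj) -> bool:
    # Simplified stand-in (an assumption, not pandas' exact logic) for
    # pandas.api.types.is_list_like: any iterable except strings/bytes.
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))

# A tuple is list-like but is not a list, so asserting
# isinstance(..., list) after an is_list_like check can fail:
assert is_list_like_simplified((1, 2, 3))
assert not isinstance((1, 2, 3), list)
assert not is_list_like_simplified("abc")
```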

ivanovmg (Member Author):

Good catch. Thank you


self.bold_rows = bold_rows
self.escape = escape
if hasattr(self, "_max_cols_fitted"):
simonjayhawkins (Member), Sep 19, 2020:

hasattr checks don't play well with mypy.

max_cols_fitted is called from is_truncated_horizontally, which is quite prevalent.

Optimisation here may be premature, but I think avoiding hasattr checks is generally preferable when the attribute is accessed often.

class MyClass:
    pass

cls = MyClass()

%timeit hasattr(cls, "_max_cols_fitted")
# 120 ns ± 8.08 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

class MyClass:
    _max_cols_fitted = None

cls = MyClass()

%timeit cls._max_cols_fitted is not None
# 85.5 ns ± 1.94 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

what is the reason for changing max_rows_adj to a property?

ivanovmg (Member Author):

The reason for turning max_rows_adj/max_rows_fitted into a property was to place the computation inside a single function. Before that, it was scattered across other methods, and it was difficult to understand how the calculation was actually carried out. The present implementation makes it clearer.
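The consolidation described above can be sketched roughly like this (a hypothetical, minimal illustration with invented names, not the actual pandas code):

```python
class FrameFormatterSketch:
    # Hypothetical sketch of the idea above: the whole adjustment
    # lives in one property instead of being scattered across methods.
    def __init__(self, max_rows, n_rows):
        self.max_rows = max_rows  # user setting, may be None
        self.n_rows = n_rows      # actual number of rows in the frame

    @property
    def max_rows_fitted(self):
        # All of the calculation is in this one place.
        if self.max_rows is None:
            return self.n_rows
        return min(self.max_rows, self.n_rows)

fmt = FrameFormatterSketch(max_rows=5, n_rows=100)
assert fmt.max_rows_fitted == 5
```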

Before the hasattr check, I used a try/except AttributeError check. But if I understood correctly, @jreback suggested reverting to the hasattr check, which was there originally.

Member:

having the separate function is good. maybe _calc_max_cols_fitted and just call it once and assign to the attribute like before.

> Before the hasattr check, I used a try/except AttributeError check. But if I understood correctly, @jreback suggested reverting to the hasattr check, which was there originally.

it was only called once before.

ivanovmg (Member Author):

If performance is a concern, we can consider the cached_property decorator or its equivalent written for Python 3.7. I can do this if required.

ivanovmg (Member Author):

> having the separate function is good. maybe _calc_max_cols_fitted and just call it once and assign to the attribute like before.
>
> it was only called once before.

This is fine with me. I will change as you suggest.

Member:

> If performance is a concern

it's not. just suggesting avoiding certain patterns.

From python/mypy#1424 (comment):

> In general i despise hasattr() checks; it's usually much better to add a class level initialization to None and check for that.
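The None-sentinel pattern that quote recommends looks roughly like this (names are illustrative, not the actual pandas attributes; the placeholder return value is invented):

```python
from typing import Optional

class Formatter:
    # Class-level None sentinel instead of a hasattr check; mypy can
    # declare the attribute as Optional[int] and narrow it after the
    # `is None` test, which hasattr does not allow.
    _max_cols_fitted: Optional[int] = None

    def max_cols_fitted(self) -> int:
        if self._max_cols_fitted is None:
            self._max_cols_fitted = self._calc_max_cols_fitted()
        return self._max_cols_fitted

    def _calc_max_cols_fitted(self) -> int:
        return 20  # placeholder for the real computation

f = Formatter()
assert f.max_cols_fitted() == 20
```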

Member:

> we can consider the cached_property decorator or its equivalent written for Python 3.7. I can do this if required.

This is not in the standard library until 3.8?

ivanovmg (Member Author):

It is in the standard library starting from 3.8; before that there wasn't one. So I was thinking it could be reasonable to implement the cached property ourselves where the Python version is < 3.8.
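A backport like the one discussed can be sketched as a non-data descriptor (a minimal, assumed implementation for illustration; functools.cached_property in 3.8+ also handles locking and __set_name__, which this sketch omits):

```python
class cached_property:
    # Minimal sketch of a functools.cached_property backport for
    # Python 3.7. Being a non-data descriptor (no __set_* methods),
    # the cached value in the instance __dict__ shadows it after
    # the first access, so the function runs only once.
    def __init__(self, func):
        self.func = func
        self.attrname = func.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        value = self.func(instance)
        instance.__dict__[self.attrname] = value
        return value

class Demo:
    calls = 0

    @cached_property
    def expensive(self):
        Demo.calls += 1
        return 42

d = Demo()
assert d.expensive == 42
assert d.expensive == 42
assert Demo.calls == 1  # computed only once
```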

@ivanovmg ivanovmg requested a review from jreback September 19, 2020 18:14
This allows one to avoid the multiple property calls that were there originally.
jreback (Contributor) commented Sep 19, 2020:

we already have a cached property decorator

cache_readonly

though only use it if we are actually doing something non trivial

simonjayhawkins (Member):

yeah, cache_readonly is in cython without a stub file, so using it here would undo all the typing benefits since the properties would resolve to Any.

jreback (Contributor) left a comment:

lgtm. ping on green. nice improvement. happy to have the additional split into separate classes as you mentioned.

"""
Render a DataFrame to a list of columns (as lists of strings).
def _truncate_horizontally(self) -> None:
"""Remove columns, which are not to be displayed and adjust formatters.
Contributor:

wouldn't be against making this a non-mutating function and instead just returning the new values (and setting them where this is called), but you did explain, so ok.

@jreback jreback added this to the 1.2 milestone Sep 19, 2020
@jreback jreback merged commit 2705dd6 into pandas-dev:master Sep 19, 2020
jreback (Contributor) commented Sep 19, 2020:

thanks @ivanovmg keep em coming!

x
for x in range(self.frame.shape[0])
if x < row_num or x >= len(self.frame) - row_num
]
Member:

I suppose this is the reason for the slowdown (#36636). In the original code, we do 2 slices + a concat. What was the motivation for this (slow) for loop? To select the subset in a single go?

ivanovmg (Member Author):

Right, it was supposed to be a more readable way to extract the necessary rows. But apparently I never considered cases with a large number of rows. Will fix it.

- tr_row_num
"""
assert self.max_rows_fitted is not None
row_num = self.max_rows_fitted // 2
Member:

BTW, I find the max_rows_fitted naming (instead of max_rows_adjusted, as it was before) a bit confusing: the "fitting" seems to come from "fit on the screen", but the eventual max_rows is not necessarily what fits on the screen; that depends on your settings.

ivanovmg (Member Author):

OK. Will change in the upcoming PRs.

Labels
Output-Formatting __repr__ of pandas objects, to_string Refactor Internal refactoring of code

6 participants