Skip to content

REGR: Notebook (html) repr of DataFrame no longer follows min_rows/max_rows settings #37363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 13, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 20 additions & 9 deletions pandas/io/formats/format.py
Original file line number Diff line number Diff line change
Expand Up @@ -639,20 +639,31 @@ def _calc_max_cols_fitted(self) -> Optional[int]:

def _calc_max_rows_fitted(self) -> Optional[int]:
"""Number of rows with data fitting the screen."""
if not self._is_in_terminal():
return self.max_rows
max_rows: Optional[int]

_, height = get_terminal_size()
if self.max_rows == 0:
# rows available to fill with actual data
return height - self._get_number_of_auxillary_rows()
if self._is_in_terminal():
_, height = get_terminal_size()
if self.max_rows == 0:
# rows available to fill with actual data
return height - self._get_number_of_auxillary_rows()

max_rows: Optional[int]
if self._is_screen_short(height):
max_rows = height
if self._is_screen_short(height):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this screen_short doing? I think it is rather about the dataframe being longer than the screen, and that we want to show all rows possible

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    def _is_screen_short(self, max_height) -> bool:
        return bool(self.max_rows == 0 and len(self.frame) > max_height)

If in terminal (max_rows==0) and if screen height (terminal height) is not enough to accommodate the dataframe entirely, then the screen is short.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the logic can be improved, then please suggest. But my refactoring mainly consisted of removing comments and making methods with the same/similar names. Apparently got lost in some logic, but that was not covered by the tests properly.
I think that we can make a system test with the string representation as well, once we see that the present behavior is correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code snippet before refactoring (regarding _is_screen_short).

            # Format only rows and columns that could potentially fit the
            # screen
            if max_cols == 0 and len(self.frame.columns) > w:
                max_cols = w
            if max_rows == 0 and len(self.frame) > h:
                max_rows = h

max_rows = height
else:
max_rows = self.max_rows
else:
max_rows = self.max_rows

return self._adjust_max_rows(max_rows)

def _adjust_max_rows(self, max_rows: Optional[int]) -> Optional[int]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is worth creating a separate function for those few lines, I think it fits fine in the function above, as it was before

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that some explanation should be added why there are still some changes to max_rows and comparison with frame length. Could be a comment, but a separate function with the docstring is better, isn't it.

"""Adjust max_rows using display logic.

See description here:
https://pandas.pydata.org/docs/dev/user_guide/options.html#frequently-used-options

GH #37359
"""
if max_rows:
if (len(self.frame) > max_rows) and self.min_rows:
# if truncated, set max_rows showed to min_rows
Expand Down
29 changes: 29 additions & 0 deletions pandas/tests/io/formats/test_format.py
Original file line number Diff line number Diff line change
Expand Up @@ -2029,6 +2029,35 @@ def test_period(self):
)
assert str(df) == exp

@pytest.mark.parametrize(
"length, max_rows, min_rows, expected",
[
(10, 10, 10, 10),
(10, 10, None, 10),
(10, 8, None, 8),
(20, 30, 10, 30), # max_rows > len(frame), hence max_rows
(50, 30, 10, 10), # max_rows < len(frame), hence min_rows
(100, 60, 10, 10), # same
(60, 60, 10, 60), # edge case
(61, 60, 10, 10), # edge case
],
)
def test_max_rows_fitted(self, length, min_rows, max_rows, expected):
"""Check that display logic is correct.

GH #37359

See description here:
https://pandas.pydata.org/docs/dev/user_guide/options.html#frequently-used-options
"""
formatter = fmt.DataFrameFormatter(
DataFrame(np.random.rand(length, 3)),
max_rows=max_rows,
min_rows=min_rows,
)
result = formatter.max_rows_fitted
assert result == expected
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually already tested (see #37359 (comment), so not sure it is necessarily needed to add additional tests). But so the problem is that it was not catching the regression (and I suppose those tests above will also not have catched it?)

I will try to think about another way to test it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests do fail on master

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas\tests\io\formats\test_format.py:2059: AssertionError
============================================================== short test summary info ===============================================================
FAILED pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_max_rows_fitted[50-30-10-10] - assert 30 == 10
FAILED pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_max_rows_fitted[100-60-10-10] - assert 60 == 10
FAILED pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_max_rows_fitted[61-60-10-10] - assert 60 == 10
=========================================================== 3 failed, 206 passed in 4.35s ============================================================



def gen_series_formatting():
s1 = Series(["a"] * 100)
Expand Down