Skip to content

REGR: Notebook (html) repr of DataFrame no longer follows min_rows/max_rows settings #37363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 13, 2020

Conversation

ivanovmg
Copy link
Member

Fix display logic in number of rows,
explained here https://pandas.pydata.org/docs/dev/user_guide/options.html#frequently-used-options

@ivanovmg
Copy link
Member Author

ivanovmg commented Oct 23, 2020

@jorisvandenbossche can you please confirm it works? I am still unable to install dev environment with jupyter (even conda).

@simonjayhawkins simonjayhawkins added IO HTML read_html, to_html, Styler.apply, Styler.applymap Output-Formatting __repr__ of pandas objects, to_string Regression Functionality that used to work in a prior pandas version labels Oct 23, 2020
@simonjayhawkins simonjayhawkins changed the title FIX: REGR: Notebook (html) repr of DataFrame no longer follows min_rows/max_rows settings Oct 23, 2020
@simonjayhawkins simonjayhawkins added this to the 1.2 milestone Oct 23, 2020
@ivanovmg
Copy link
Member Author

I managed to install dev env with jupyter. Looks like everything works. But please check in your environment as well.

image

@jorisvandenbossche
Copy link
Member

I can confirm this works, thanks!

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick follow-up!

else:
max_rows = self.max_rows

return self._adjust_max_rows(max_rows)

def _adjust_max_rows(self, max_rows: Optional[int]) -> Optional[int]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is worth creating a separate function for those few lines, I think it fits fine in the function above, as it was before

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that some explanation should be added why there are still some changes to max_rows and comparison with frame length. Could be a comment, but a separate function with the docstring is better, isn't it.

return height - self._get_number_of_auxillary_rows()

max_rows: Optional[int]
if self._is_screen_short(height):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this screen_short doing? I think it is rather about the dataframe being longer than the screen, and that we want to show all rows possible

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    def _is_screen_short(self, max_height) -> bool:
        return bool(self.max_rows == 0 and len(self.frame) > max_height)

If in terminal (max_rows==0) and if screen height (terminal height) is not enough to accommodate the dataframe entirely, then the screen is short.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the logic can be improved, then please suggest. But my refactoring mainly consisted of removing comments and making methods with the same/similar names. Apparently got lost in some logic, but that was not covered by the tests properly.
I think that we can make a system test with the string representation as well, once we see that the present behavior is correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code snippet before refactoring (regarding _is_screen_short).

            # Format only rows and columns that could potentially fit the
            # screen
            if max_cols == 0 and len(self.frame.columns) > w:
                max_cols = w
            if max_rows == 0 and len(self.frame) > h:
                max_rows = h

min_rows=min_rows,
)
result = formatter.max_rows_fitted
assert result == expected
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually already tested (see #37359 (comment), so not sure it is necessarily needed to add additional tests). But so the problem is that it was not catching the regression (and I suppose those tests above will also not have catched it?)

I will try to think about another way to test it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests do fail on master

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas\tests\io\formats\test_format.py:2059: AssertionError
============================================================== short test summary info ===============================================================
FAILED pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_max_rows_fitted[50-30-10-10] - assert 30 == 10
FAILED pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_max_rows_fitted[100-60-10-10] - assert 60 == 10
FAILED pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_max_rows_fitted[61-60-10-10] - assert 60 == 10
=========================================================== 3 failed, 206 passed in 4.35s ============================================================

# rows available to fill with actual data
return height - self._get_number_of_auxillary_rows()

max_rows: Optional[int]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you type this at the top to make it less cluttering

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that splitting even further would be better.

if self._is_in_terminal():
    return self._get_max_rows_fitted_in_terminal()
else:
    return self._adjust_max_rows(self.max_rows)

Probably other PR?

@jreback
Copy link
Contributor

jreback commented Oct 31, 2020

also pls rebase

@ivanovmg ivanovmg requested a review from jreback November 1, 2020 04:05
@jreback
Copy link
Contributor

jreback commented Nov 4, 2020

@ivanovmg can you merge master.

@jorisvandenbossche if any comments.

@simonjayhawkins
Copy link
Member

@ivanovmg can you merge master one more time and this can be merged cc @jorisvandenbossche

@ivanovmg
Copy link
Member Author

@ivanovmg can you merge master one more time and this can be merged cc @jorisvandenbossche

Done, it's green.

@jreback jreback merged commit b966657 into pandas-dev:master Nov 13, 2020
@jreback
Copy link
Contributor

jreback commented Nov 13, 2020

thanks @ivanovmg

@ivanovmg ivanovmg deleted the bug_37359 branch November 13, 2020 05:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO HTML read_html, to_html, Styler.apply, Styler.applymap Output-Formatting __repr__ of pandas objects, to_string Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

REGR: Notebook (html) repr of DataFrame no longer follows min_rows/max_rows settings
4 participants