REGR: Notebook (html) repr of DataFrame no longer follows min_rows/max_rows settings #37359

Closed
jorisvandenbossche opened this issue Oct 23, 2020 · 10 comments · Fixed by #37363
Labels
Blocker Blocking issue or pull request for an upcoming release Output-Formatting __repr__ of pandas objects, to_string Regression Functionality that used to work in a prior pandas version

@jorisvandenbossche
Member

On master, the html repr in notebooks of a DataFrame is no longer following the min_rows/max_rows logic (see https://pandas.pydata.org/docs/dev/user_guide/options.html#frequently-used-options).

If you have a DataFrame with many rows, the truncated repr incorrectly shows 60 rows (max_rows) instead of 10 (min_rows).
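
For readers unfamiliar with the two options, the intended rule can be modelled in a few lines. This is only a sketch of the documented behaviour, not pandas' implementation; the function name `rows_shown` is made up for this example, and `None` is taken to mean "no limit".

```python
from typing import Optional


def rows_shown(n_rows: int, max_rows: Optional[int], min_rows: Optional[int]) -> int:
    """Toy model of the display.max_rows / display.min_rows rule."""
    if max_rows is None or n_rows <= max_rows:
        # No limit set, or the frame is short enough: show every row.
        return n_rows
    if min_rows is None:
        # Without min_rows, a truncated repr falls back to max_rows.
        return max_rows
    # Truncated repr: show min_rows, but never more than max_rows.
    return min(min_rows, max_rows)


expected = rows_shown(100_000, max_rows=60, min_rows=10)  # what 1.1.3 does
```

With the default options (max_rows=60, min_rows=10), a 100000-row frame should therefore be shown with 10 rows, which is what the regression breaks in notebooks.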

cc @ivanovmg (might be related to the refactoring you have been doing, but to be clear, I didn't check if this is actually the case!)

@jorisvandenbossche jorisvandenbossche added Output-Formatting __repr__ of pandas objects, to_string Regression Functionality that used to work in a prior pandas version Blocker Blocking issue or pull request for an upcoming release labels Oct 23, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.2 milestone Oct 23, 2020
@jorisvandenbossche
Member Author

We do have tests for this:

    with option_context("display.max_rows", 10, "display.min_rows", 4):
        # truncated after first two rows
        assert ".." in repr(df)
        assert "2 " not in repr(df)
        assert "..." in df._repr_html_()
        assert "<td>2</td>" not in df._repr_html_()

but apparently, they pass when not in a notebook environment, and _repr_html_ gives different results in a console vs in a notebook, now. This is not the case on pandas 1.1.3.

@jorisvandenbossche
Member Author

Can confirm it is a regression due to #36434

@ivanovmg
Member

Does this mean that the tests must be modified first? I can work on fixing the issue, but I would be grateful if somebody could create or fix the tests so that they fail.

@jorisvandenbossche
Member Author

I am not immediately sure how to fix the tests. A first step would probably be to understand why _repr_html_ returns something different in a console vs a notebook (which was for some reason caused by #36434). That might give a hint on how we can properly test it, but it is basically the same work as what is needed to fix the issue.

@jorisvandenbossche
Member Author

So it is _calc_max_rows_fitted that gives something different in console vs notebook:

console:

In [2]: pd.DataFrame(np.random.randn(100000, 4))._repr_html_()
> /home/joris/scipy/pandas/pandas/core/frame.py(793)_repr_html_()
-> return fmt.DataFrameRenderer(formatter).to_html(notebook=True)
(Pdb) formatter.max_rows_fitted
10
(Pdb) 

notebook:
image

I think this is related to the if not self._is_in_terminal(): check you added in #36434:

    def _calc_max_rows_fitted(self) -> Optional[int]:
        """Number of rows with data fitting the screen."""
        if not self._is_in_terminal():
            return self.max_rows

Do you remember why you added that?

@ivanovmg
Member

Originally there was complex logic in the _truncate() method,
including this check:

        if max_cols == 0 or max_rows == 0:  # assume we are in the terminal
            (w, h) = get_terminal_size()

So, instead of keeping the comment, I extracted a method.

I would like to check everything, but I am struggling to install a dev environment with Jupyter.
On Windows 10 I use Docker (with no ability to start a notebook).
And I cannot install the dev environment on Windows natively because of this:

      File "setup.py", line 94, in run
        self._run_cmake()
      File "setup.py", line 273, in _run_cmake
        raise RuntimeError('Not supported on 32-bit Windows')
    RuntimeError: Not supported on 32-bit Windows
    ----------------------------------------
    ERROR: Failed building wheel for pyarrow
    Building wheel for aiohttp (PEP 517): started
    Building wheel for aiohttp (PEP 517): finished with status 'done'
    Created wheel for aiohttp: filename=aiohttp-3.6.3-cp38-cp38-win32.whl size=601804 sha256=346f18929faa79c6bb86c777633cb97b27c86ed5d7876d418da25f82567d4996
    Stored in directory: c:\users\mivanov4\appdata\local\pip\cache\wheels\2d\6d\bb\486f8c893f1dcc917860a5b3e2f2ca286c398f7d548ffc649c
    Successfully built black blosc aiohttp
    Failed to build pyarrow
    ERROR: Could not build wheels for pyarrow which use PEP 517 and cannot be installed directly

Do I need 64-bit Python for this?

@jorisvandenbossche
Member Author

My general recommendation then is: use conda. (It is trying to build a pyarrow wheel; I am not sure why, because there should be pyarrow wheels for Windows.)

> Originally there was a complex logic in _truncate() method.
> if max_cols == 0 or max_rows == 0: # assume we are in the terminal
> ..
> So, instead of a comment I extracted method.

The main difference might be that you now directly return self.max_rows in this case, whereas before, whether or not you were in a terminal, other code (outside the if max_cols == 0 or max_rows == 0: block) was still run.
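
To make that difference concrete, here is a stripped-down sketch of the two control flows. It is not the pandas code: the function names and the `in_terminal` flag are stand-ins for this illustration, and the terminal-height handling is left out entirely.

```python
def _apply_min_rows(n_rows, max_rows, min_rows):
    """Shared tail: shrink to min_rows when the frame would be truncated."""
    if max_rows and n_rows > max_rows and min_rows:
        return min(min_rows, max_rows)
    return max_rows


def early_return(n_rows, max_rows, min_rows, in_terminal):
    """Shape of the regressed logic: non-terminal callers leave early."""
    if not in_terminal:
        return max_rows  # min_rows is never consulted on this path
    return _apply_min_rows(n_rows, max_rows, min_rows)


def fall_through(n_rows, max_rows, min_rows, in_terminal):
    """Shape of the older logic: the min_rows step always runs."""
    # (terminal-specific height adjustments omitted in this sketch)
    return _apply_min_rows(n_rows, max_rows, min_rows)


notebook_regressed = early_return(100_000, 60, 10, in_terminal=False)
notebook_expected = fall_through(100_000, 60, 10, in_terminal=False)
```

Under these assumptions the early-return variant yields 60 for a notebook-like call, while the fall-through variant yields 10, matching the symptom described at the top of this issue.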

@ivanovmg
Member

I will install it via conda then and check.

Probably this will help:

    def _calc_max_rows_fitted(self) -> Optional[int]:
        """Number of rows with data fitting the screen."""
        if not self._is_in_terminal():
            max_rows = self.max_rows
        else:
            _, height = get_terminal_size()
            if self.max_rows == 0:
                # rows available to fill with actual data
                return height - self._get_number_of_auxillary_rows()

            max_rows: Optional[int]
            if self._is_screen_short(height):
                max_rows = height
            else:
                max_rows = self.max_rows

        if max_rows:
            if (len(self.frame) > max_rows) and self.min_rows:
                # if truncated, set max_rows showed to min_rows
                max_rows = min(self.min_rows, max_rows)
        return max_rows

@jorisvandenbossche
Member Author

Yep, I think that more or less sounds correct.

For testing, we might want to check if we can "monkeypatch" pandas to think it is in a notebook, when running the repr_html tests

@jorisvandenbossche
Member Author

> For testing, we might want to check if we can "monkeypatch" pandas to think it is in a notebook, when running the repr_html tests

Another option would be to set the max_columns option to something non-zero, which is what happens in a notebook
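
Both testing ideas can be tried out on a small stand-in class. Everything here is hypothetical: `FakeFormatter` is not the pandas formatter, its `_is_in_terminal` only mirrors the `max_cols == 0 or max_rows == 0` check quoted earlier in this thread, and `calc_max_rows_fitted` deliberately reproduces the early return so that the sketch shows which setups expose it.

```python
from unittest import mock


class FakeFormatter:
    """Hypothetical stand-in for the formatter; not the pandas class."""

    def __init__(self, n_rows, max_rows=60, min_rows=10, max_cols=0):
        self.n_rows = n_rows
        self.max_rows = max_rows
        self.min_rows = min_rows
        self.max_cols = max_cols

    def _is_in_terminal(self):
        # Mirrors the check quoted in the thread.
        return self.max_cols == 0 or self.max_rows == 0

    def calc_max_rows_fitted(self):
        if not self._is_in_terminal():
            return self.max_rows  # the early return under discussion
        max_rows = self.max_rows
        if max_rows and self.n_rows > max_rows and self.min_rows:
            max_rows = min(self.min_rows, max_rows)
        return max_rows


# Console-like setup (max_cols == 0): the early return is not reached.
console = FakeFormatter(100_000).calc_max_rows_fitted()

# Option 1: patch the terminal check so the object behaves as in a notebook.
with mock.patch.object(FakeFormatter, "_is_in_terminal", return_value=False):
    patched = FakeFormatter(100_000).calc_max_rows_fitted()

# Option 2: use a non-zero max_cols, as a notebook would.
via_option = FakeFormatter(100_000, max_cols=20).calc_max_rows_fitted()
```

In this toy setup the console-like call gives 10 while both notebook-like variants give 60, which would explain why tests run in a console keep passing. A real test would assert on the rendered HTML rather than on an internal number.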
