Proposal: Shorter default Series/DataFrame repr when truncated #27000

Closed
jorisvandenbossche opened this issue Jun 22, 2019 · 8 comments · Fixed by #27095
Labels
Output-Formatting __repr__ of pandas objects, to_string
@jorisvandenbossche
Member

There have been previous attempts to reduce the length of the Series/DataFrame repr (pandas.options.display.max_rows), e.g. #20514. Related pandas-dev email: https://mail.python.org/pipermail/pandas-dev/2018-March/000732.html

In that discussion, I once made the following proposal to introduce two thresholds:

  • We have 2 thresholds instead of 1 (the current 'max_rows'): a number of
    rows to show in a truncated repr, and a max number of rows to show
    without truncating
  • For 'big' dataframes, we show a truncated repr. And then I would go even
    lower than 20 and only show first/last 5 (so like a max_rows of 10)
  • For 'small' dataframes, we show the full dataframe without truncating, up
    to the threshold.

We would still need to define those two thresholds. But for example, using the current max_rows of 60: we could show a full repr up to 60 rows, and once the number of rows > 60, we only show 10 (first/last 5).

You can then still set both thresholds at the same number (like 20, as in the linked PR above) to not get this variable behaviour.
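
As a rough illustration of the two-threshold rule described above (a sketch only: the function name `visible_row_positions` and its parameters are hypothetical and not pandas API, and it assumes min_rows <= max_rows):

```python
def visible_row_positions(n_rows, max_rows=60, min_rows=10):
    """Return the row positions a repr would show under the two-threshold rule.

    Hypothetical helper for illustration; not part of pandas.
    Assumes min_rows <= max_rows.
    """
    # 'Small' frames: at or below the upper threshold, show every row.
    if n_rows <= max_rows:
        return list(range(n_rows))
    # 'Big' frames: truncate to min_rows, split between head and tail.
    head = min_rows // 2
    tail = min_rows - head
    return list(range(head)) + list(range(n_rows - tail, n_rows))


print(len(visible_row_positions(60)))  # full repr: all 60 rows
print(visible_row_positions(61))       # truncated: first 5 and last 5 positions
```

Passing the same value for both thresholds (e.g. max_rows=20, min_rows=20) gives the non-variable behaviour: a frame with more than 20 rows is cut down to 20.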

This is actually similar to what numpy arrays do (but with a bigger threshold: e.g. np.random.randn(1000) shows all 1000 elements, while np.random.randn(1001) shows only the first/last 3).
And it is also very similar to what R tibbles do: they have "print_min" and "print_max" options with exactly this behaviour, only their "print_max" is lower (the defaults are 10 and 20, respectively):

options(tibble.print_max = n, tibble.print_min = m): if there are more than
n rows, print only the first m rows. Use options(tibble.print_max = Inf)
to always show all rows.
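
Read literally, the quoted tibble rule amounts to the following (a Python paraphrase for comparison only; `tibble_rows_printed` is a hypothetical name, and the defaults of 20 and 10 are the values mentioned above):

```python
import math


def tibble_rows_printed(n_rows, print_max=20, print_min=10):
    """Number of rows printed under the quoted print_max/print_min rule.

    Hypothetical helper for illustration; not part of tibble or pandas.
    """
    # More than print_max rows: only the first print_min rows are printed.
    if n_rows > print_max:
        return print_min
    # Otherwise every row is printed.
    return n_rows


print(tibble_rows_printed(20))                         # at the threshold: all rows
print(tibble_rows_printed(21))                         # above it: print_min rows
print(tibble_rows_printed(10**6, print_max=math.inf))  # Inf disables truncation
```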

@jorisvandenbossche jorisvandenbossche added the Output-Formatting __repr__ of pandas objects, to_string label Jun 22, 2019
@simonjayhawkins
Member

so this would add two display options pandas.options.display.min_rows and pandas.options.display.min_columns

and two arguments, to to_string and to_html(notebook=True); min_rows and min_cols?

personally, since this proposal is display related and not so relevant to IO, I would prefer not to see the additional arguments to to_string and to_html

@jorisvandenbossche
Member Author

I would personally only start with min_rows (we could always add the columns one later if there is demand for it).

And also personally for me, I am fine with only adding it as a general display option for now, and not necessarily to to_string / to_html.

@simonjayhawkins
Member

+1 in that case.

@TomAugspurger
Contributor

+1 from me as well.

@jorisvandenbossche
Member Author

cc @pandas-dev/pandas-core we might still include this in 0.25.0. Any concerns about the above proposal?

@shoyer
Member

shoyer commented Jul 3, 2019

+1 sounds great to me

@toobaz
Member

toobaz commented Jul 3, 2019

+1 for me too

@topper-123
Contributor

I'm +1.

7 participants