ENH: Expanded display of dataframe, akin to postgres \x #38827

Closed
samzhang111 opened this issue Dec 30, 2020 · 4 comments
Labels
Enhancement; Output-Formatting (__repr__ of pandas objects, to_string)

Comments

@samzhang111

Is your feature request related to a problem?

When viewing large dataframes in a terminal, it is often preferable to display the data in "expanded format" or "long format" (as opposed to "wide"), essentially the result of melting every column. This makes the entire dataframe less sensitive to the width of the terminal.

Describe the solution you'd like

Postgres handles this nicely with its expanded mode (see -x or --expanded, or \x inside psql). To take a random postgres example, the following table

 id | time  |       humanize_time             | value 
----+-------+---------------------------------+-------
  1 | 09:30 |  Early Morning - (9.30 am)      |   570
  2 | 11:30 |  Late Morning - (11.30 am)      |   690
  3 | 13:30 |  Early Afternoon - (1.30pm)     |   810
  4 | 15:30 |  Late Afternoon - (3.30 pm)     |   930
(4 rows)

is printed as

-[ RECORD 1 ]-+---------------------------
id            | 1
time          | 09:30
humanize_time | Early Morning - (9.30 am)
value         | 570
-[ RECORD 2 ]-+---------------------------
id            | 2
time          | 11:30
humanize_time | Late Morning - (11.30 am)
value         | 690
-[ RECORD 3 ]-+---------------------------
id            | 3
time          | 13:30
humanize_time | Early Afternoon - (1.30pm)
value         | 810
-[ RECORD 4 ]-+---------------------------
id            | 4
time          | 15:30
humanize_time | Late Afternoon - (3.30 pm)
value         | 930

API breaking implications

I don't see there being any.

Additional context

My naive suggestion would be to add a global option that turns this on, such as

pd.set_option('display.expanded_mode', True)

Under the hood, this can be as simple as melting the dataframe and printing each record out with a separator between them. However, I am not familiar with the intricacies of the display logic and leave this here for others' consideration.
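For illustration only, here is a rough sketch of what "print each record with a separator" could look like in user code today. The function name format_expanded and its max_records parameter are hypothetical (they are not pandas APIs), the separator only loosely imitates psql's layout, and the proposed display.expanded_mode option does not exist in pandas.

```python
from typing import Optional

import pandas as pd


def format_expanded(df: pd.DataFrame, max_records: Optional[int] = None) -> str:
    """Render df one record at a time, loosely imitating psql's expanded mode.

    Hypothetical helper for illustration; not part of pandas.
    """
    if max_records is not None:
        df = df.head(max_records)
    names = [str(col) for col in df.columns]
    name_width = max((len(name) for name in names), default=0)
    lines = []
    # itertuples keeps each column's own dtype, unlike iterrows, which
    # upcasts every row to a single common dtype.
    for i, values in enumerate(df.itertuples(index=False, name=None), start=1):
        header = f"-[ RECORD {i} ]"
        lines.append(header.ljust(name_width + 1, "-") + "+" + "-" * 20)
        for name, value in zip(names, values):
            lines.append(f"{name.ljust(name_width)} | {value}")
    return "\n".join(lines)


df = pd.DataFrame(
    {
        "id": [1, 2],
        "time": ["09:30", "11:30"],
        "humanize_time": ["Early Morning - (9.30 am)", "Late Morning - (11.30 am)"],
        "value": [570, 690],
    }
)
print(format_expanded(df))
```

Because each record is formatted as a unit, limiting the output with max_records never cuts a record in half, and the width of the output depends on the longest column name and value rather than on the number of columns.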

samzhang111 added the Enhancement and Needs Triage labels on Dec 30, 2020
@jreback
Contributor

jreback commented Dec 30, 2020

Isn't it just this?

In [168]: df = pd.DataFrame({'A': pd.date_range('20200101', periods=3), 'B': list('abc'), 'C': [1,2,3]})

In [169]: df
Out[169]:
           A  B  C
0 2020-01-01  a  1
1 2020-01-02  b  2
2 2020-01-03  c  3

In [170]: df.stack()
Out[170]:
0  A    2020-01-01 00:00:00
   B                      a
   C                      1
1  A    2020-01-02 00:00:00
   B                      b
   C                      2
2  A    2020-01-03 00:00:00
   B                      c
   C                      3
dtype: object

@jreback
Contributor

jreback commented Dec 30, 2020

Output formatting is already quite complex, so we would need a really good reason.

jreback added the Output-Formatting label and removed the Needs Triage label on Dec 30, 2020
@samzhang111
Author

Thanks, stack is definitely a better description of what I want than melt.

The output of running stack is just a Series, so it has the same limitations as printing any Series directly to the console: we may not see even one entire record before the output is truncated, the truncation can occur in the middle of a record, and individual entries are cut off with ellipses at a fairly short default maximum length.

Thus this proposal is really to add an output setting where dataframes are displayed in stacked format, but, say, showing at least K records (which is different than K rows of the series).
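As a partial workaround with existing options, something along these lines should get close. This is a sketch under assumptions: the helper name stacked_repr and the parameter k are made up for illustration. It slices whole records before stacking so that no record is split, and it temporarily lifts the display.max_rows and display.max_colwidth limits through pd.option_context.

```python
import pandas as pd


def stacked_repr(df: pd.DataFrame, k: int) -> str:
    """Return the stacked repr of the first k records of df.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Take whole records first, then stack, so a record is never cut in half.
    stacked = df.head(k).stack()
    # Lift the row limit and the per-entry width limit for this call only;
    # the global display settings are restored when the block exits.
    with pd.option_context("display.max_rows", None, "display.max_colwidth", None):
        return repr(stacked)


df = pd.DataFrame(
    {
        "A": pd.date_range("20200101", periods=3),
        "B": list("abc"),
        "C": [1, 2, 3],
    }
)
print(stacked_repr(df, k=2))
```

This still prints a Series, so the layout is the stacked one shown earlier in the thread rather than a psql-style record view, and the row and width limits are bypassed entirely rather than applied per record.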

I can't claim this to be a "really" good reason, just that it's something that would be slick and convenient. It's one of my favorite features in the postgres console! I will play with my personal settings and just assume this isn't something that will likely be worked on, though. Thanks for the reply!

@mroeschke
Member

Thanks for the report. Agreed: since stack outputs a very similar result, I think this feature would be best implemented in an external library, as pandas aims to keep its direct display APIs limited.

Closing, but happy to reopen if there is renewed interest from the other core devs and community.
