df.groupby('key').groups printed all: problem with large arrays #1135

Closed
jorisvandenbossche opened this issue Apr 26, 2012 · 7 comments · Fixed by #31388
Labels
Enhancement Groupby Output-Formatting __repr__ of pandas objects, to_string
Comments

@jorisvandenbossche
Member

jorisvandenbossche commented Apr 26, 2012

When you work with a large array, the console does not print it in its entirety, but once you group it with groupby('key'), all of the groups are printed. Would it be possible to also restrict the output of groupby('key').groups to, e.g., the first and last groups?

I was working with a rather large DataFrame (around 80000 rows). First, it took a long time to print everything, and second, the console got stuck for a while (though that could also be an issue with Spyder).
I know printing out the groups is not very useful, but I was curious to see what they would look like. That turned out not to be a good idea with such a large array.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(100000, 4), columns=list('abcd'))
>>> df['g'] = np.random.randint(0, 100, 100000)
>>> df.groupby('g').groups  # this repr should be truncated
@jreback
Contributor

jreback commented Feb 18, 2014

@jorisvandenbossche hah... this issue is from you, 2 years ago!

a good idea!

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Feb 18, 2014
@jorisvandenbossche
Member Author

ah yes :-) long time ago. I think my first one!

@benjaminarjun
Contributor

I was hoping to take a shot at this. Does anyone have a recommended approach? groups gives a dict, so the dict repr is used. I'm not sure how to override that without defining a new class and having groups return that type instead.

@jreback
Contributor

jreback commented Dec 17, 2018

I think you can use pprint: https://docs.python.org/3/library/pprint.html

use compact=True
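
For reference, a minimal sketch of what compact=True does in the standard-library pprint module. The groups mapping here is a stand-in built from plain lists (an assumption; the real .groups values are pandas index objects). Note that compact=True only packs more sequence items onto each line; it does not drop any entries, so on its own it would shorten the output but not truncate it.

```python
import pprint

# Stand-in for a groups mapping: group key -> list of row labels.
# Plain Python lists are an assumption made to keep the sketch self-contained.
groups = {key: list(range(key, 2000, 100)) for key in range(100)}

# Default formatting puts one list item per line once a list exceeds the width.
default_text = pprint.pformat(groups)

# compact=True fits as many items as possible on each line instead.
compact_text = pprint.pformat(groups, compact=True)

print(len(default_text.splitlines()), "lines by default")
print(len(compact_text.splitlines()), "lines with compact=True")
```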

@benjaminarjun
Contributor

I found pandas.io.formats.printing.pprint_thing which does something similar to dataframe truncation:

>>> from pandas.io.formats.printing import pprint_thing
>>> z = [list('abc') for _ in range(100)]
>>> print(pprint_thing(z, max_seq_items=3))
[[a, b, c], [a, b, c], [a, b, c], ...]

However, it truncates at the end rather than in the middle, which doesn't match how a DataFrame is displayed. Maybe pprint_thing could take another argument for where to truncate.
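
To make the idea of truncating in the middle concrete, here is a rough sketch in plain Python. The helper name format_middle_truncated and its max_items parameter are hypothetical and are not part of pandas or pprint_thing; the sketch only illustrates keeping the head and tail of a sequence around an ellipsis.

```python
def format_middle_truncated(seq, max_items=6):
    """Format a sequence, eliding the middle when it has more than max_items.

    Hypothetical helper for illustration only; not a pandas API.
    """
    items = list(seq)
    if len(items) <= max_items:
        shown = [repr(x) for x in items]
    else:
        head = (max_items + 1) // 2      # items kept from the start
        tail = max_items - head          # items kept from the end
        shown = [repr(x) for x in items[:head]]
        shown.append("...")
        # Slice from an explicit start index so tail == 0 yields nothing.
        shown.extend(repr(x) for x in items[len(items) - tail:])
    return "[" + ", ".join(shown) + "]"


print(format_middle_truncated(range(100), max_items=4))  # [0, 1, ..., 98, 99]
print(format_middle_truncated([1, 2, 3]))                # [1, 2, 3]
```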

Also, groups currently returns a dict. I could subclass and override the __repr__ to accomplish this, but the return type of groups would have to change. Perhaps there's a cleaner trick I'm not aware of? 😄
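
A minimal sketch of the subclassing idea, assuming a hypothetical class name TruncatedReprDict and a hypothetical max_items attribute (neither comes from pandas). Because it subclasses dict, lookups and isinstance checks keep working; only the repr changes. For simplicity it truncates at the end, like the pprint_thing output above.

```python
from itertools import islice


class TruncatedReprDict(dict):
    """Dict whose repr shows only the first few entries.

    Hypothetical illustration; the name and max_items are assumptions.
    """

    max_items = 4  # number of key/value pairs shown before eliding

    def __repr__(self):
        if len(self) <= self.max_items:
            return super().__repr__()
        # Format only the first max_items pairs, in insertion order.
        shown = [f"{k!r}: {v!r}" for k, v in islice(self.items(), self.max_items)]
        return "{" + ", ".join(shown) + ", ...}"


d = TruncatedReprDict({i: [i] for i in range(10)})
print(d)     # {0: [0], 1: [1], 2: [2], 3: [3], ...}
print(d[7])  # [7]  (all entries remain accessible)
```

The trade-off raised above still applies: code that checks type(x) is dict, rather than isinstance(x, dict), would see a different type.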

@benjaminarjun
Contributor

Would it be better to implement as I've described above and discuss further in the PR?

@benjaminarjun
Contributor

benjaminarjun commented Dec 19, 2018

@jreback actually it appears #2628 is a duplicate. A similar approach has been suggested there.
