df.groupby('key').groups printed all: problem with large arrays #1135

jorisvandenbossche · 2012-04-26T07:52:16Z

When you work with a large DataFrame, it is not printed in its entirety in the console, but after grouping it with groupby('key'), all of the groups are printed out. Would it be possible to also restrict the output of groupby('key').groups, e.g. to the first and last groups?

I was working with a rather large DataFrame (around 80000 rows): first, it took a long time to print everything, and second, the console got stuck for a while (though that could also be an issue with Spyder).
I know printing the groups is not very useful, but I was curious to see what it would look like. With such a large array, that was not a very good idea.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(100000, 4), columns=list('abcd'))
>>> df['g'] = np.random.randint(0, 100, 100000)
>>> df.groupby('g').groups  # this repr should be truncated
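To see why this output gets so large: `.groups` maps each group key to the row labels belonging to that group, so its repr contains every one of the 100000 row labels, not just the 100 keys. The sketch below uses a pure-Python stand-in (a plain dict of lists, assumed here in place of the real pandas objects) to make the arithmetic concrete:

```python
from collections import defaultdict

n_rows, n_groups = 100_000, 100

# Stand-in for df.groupby('g').groups: group key -> list of row labels.
# Deterministic keys (label % n_groups) replace np.random.randint so the
# numbers are reproducible.
groups = defaultdict(list)
for label in range(n_rows):
    groups[label % n_groups].append(label)
groups = dict(groups)

total_labels = sum(len(labels) for labels in groups.values())
print(len(groups), total_labels)  # 100 100000
print(len(repr(groups)))          # hundreds of thousands of characters
```

Only 100 keys, but the repr has to spell out all 100000 labels, which is what makes printing slow.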

The text was updated successfully, but these errors were encountered:

jreback · 2014-02-18T22:15:09Z

@jorisvandenbossche hah...this issue is from u 2 years ago!

a good idea!

jorisvandenbossche · 2014-02-18T22:57:36Z

ah yes :-) long time ago. I think my first one!

benjaminarjun · 2018-12-17T00:15:52Z

I was hoping to take a shot at this. Does anyone have a recommended approach? groups gives a dict, so the dict repr is used. I'm not sure how to override that without defining a new class and having groups return that type instead.

jreback · 2018-12-17T00:22:16Z

i think u can use pprint: https://docs.python.org/3/library/pprint.html

use compact=True
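A quick check of what the `pprint` suggestion does with a `.groups`-shaped dict (toy data, assumed for illustration). As far as the standard library docs go, `compact=True` only controls how many sequence items are packed onto each line; every key and label is still printed, so truncation would still need separate handling:

```python
import pprint

# A small dict shaped like .groups: group key -> list of row labels.
groups = {key: list(range(key * 5, key * 5 + 5)) for key in range(20)}

# compact=True packs list items onto lines up to `width` characters;
# it changes the layout only and does not drop any keys or labels.
text = pprint.pformat(groups, compact=True, width=60)
print(text)
```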

benjaminarjun · 2018-12-17T03:42:56Z

I found pandas.io.formats.printing.pprint_thing which does something similar to dataframe truncation:

>>> from pandas.io.formats.printing import pprint_thing
>>> z = [list('abc') for _ in range(100)]
>>> print(pprint_thing(z, max_seq_items=3))
[[a, b, c], [a, b, c], [a, b, c], ...]

However, it truncates at the end rather than in the middle, which doesn't match the DataFrame repr. Maybe pprint_thing could take another argument specifying where to truncate.
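The middle truncation described above (keep the first and last items and elide the rest, as the DataFrame repr does with rows) can be sketched independently of pandas. `truncate_middle` is a hypothetical helper for illustration, not an existing `pprint_thing` argument:

```python
def truncate_middle(items, max_items):
    """Format a sequence like '[a, b, ..., y, z]', showing at most max_items items.

    Hypothetical helper: splits the budget between the head and the tail
    and replaces everything in between with '...'.
    """
    items = list(items)
    if len(items) <= max_items:
        return "[" + ", ".join(map(str, items)) + "]"
    head = max_items // 2 + max_items % 2  # head gets the extra item when odd
    tail = max_items // 2
    shown = [str(x) for x in items[:head]] + ["..."]
    if tail:  # guard: items[-0:] would return the whole list
        shown += [str(x) for x in items[-tail:]]
    return "[" + ", ".join(shown) + "]"


print(truncate_middle(range(100), 4))  # [0, 1, ..., 98, 99]
print(truncate_middle(range(100), 3))  # [0, 1, ..., 99]
print(truncate_middle("abc", 5))       # [a, b, c]
```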

Also, groups currently returns a dict. I could subclass and override the __repr__ to accomplish this, but the return type of groups would have to change. Perhaps there's a cleaner trick I'm not aware of? 😄
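The subclassing idea might look like the sketch below. `TruncatedReprDict` and its `max_items` attribute are hypothetical names chosen for illustration. Because it subclasses `dict`, code that indexes or iterates over `.groups` keeps working; only the repr changes, and only the displayed entries are ever formatted:

```python
class TruncatedReprDict(dict):
    """A dict whose repr shows only the first and last few entries (hypothetical)."""

    max_items = 6  # number of entries shown before the middle is elided

    def _entry(self, key):
        return f"{key!r}: {self[key]!r}"

    def __repr__(self):
        keys = list(self)
        if len(keys) <= self.max_items:
            parts = [self._entry(k) for k in keys]
        else:
            half = self.max_items // 2
            # Only the displayed entries are formatted, so large values in the
            # elided middle are never converted to strings.
            parts = (
                [self._entry(k) for k in keys[:half]]
                + ["..."]
                + [self._entry(k) for k in keys[len(keys) - half:]]
            )
        return "{" + ", ".join(parts) + "}"


groups = TruncatedReprDict((k, [k]) for k in range(100))
print(repr(groups))  # {0: [0], 1: [1], 2: [2], ..., 97: [97], 98: [98], 99: [99]}
print(groups[42])    # [42]  (regular dict behaviour is unchanged)
```

The trade-off raised above still applies: callers checking `type(x) is dict` would see a different type, although `isinstance(x, dict)` still holds.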

benjaminarjun · 2018-12-18T16:11:55Z

Would it be better to implement as I've described above and discuss further in the PR?

benjaminarjun · 2018-12-19T07:15:42Z

@jreback actually it appears #2628 is a duplicate. A similar approach has been suggested there.

jreback modified the milestones: 0.15.0, 0.14.0 Feb 18, 2014

jreback modified the milestones: 0.16.0, 0.17.0 Jan 26, 2015

TomAugspurger added Difficulty Intermediate labels Jul 8, 2017

benjaminarjun mentioned this issue Jan 21, 2019 in Feature/groupby repr ellipses 1135 #24853 (closed)

jbrockmendel removed Difficulty Intermediate labels Oct 21, 2019

MarcoGorelli mentioned this issue Jan 28, 2020 in ENH: truncate output of Groupby.groups #31388 (merged)

jreback modified the milestones: Contributions Welcome, 1.1 Jan 31, 2020

jreback closed this as completed in #31388 Jan 31, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

df.groupby('key').groups printed all: problem with large arrays #1135

df.groupby('key').groups printed all: problem with large arrays #1135

jorisvandenbossche commented Apr 26, 2012 •

edited by TomAugspurger

Loading

jreback commented Feb 18, 2014

jorisvandenbossche commented Feb 18, 2014

benjaminarjun commented Dec 17, 2018

jreback commented Dec 17, 2018

benjaminarjun commented Dec 17, 2018

benjaminarjun commented Dec 18, 2018

benjaminarjun commented Dec 19, 2018 •

edited

Loading

df.groupby('key').groups printed all: problem with large arrays #1135

df.groupby('key').groups printed all: problem with large arrays #1135

Comments

jorisvandenbossche commented Apr 26, 2012 • edited by TomAugspurger Loading

jreback commented Feb 18, 2014

jorisvandenbossche commented Feb 18, 2014

benjaminarjun commented Dec 17, 2018

jreback commented Dec 17, 2018

benjaminarjun commented Dec 17, 2018

benjaminarjun commented Dec 18, 2018

benjaminarjun commented Dec 19, 2018 • edited Loading

jorisvandenbossche commented Apr 26, 2012 •

edited by TomAugspurger

Loading

benjaminarjun commented Dec 19, 2018 •

edited

Loading