Feature/groupby repr ellipses 1135 #24853

benjaminarjun · 2019-01-21T00:17:27Z

closes df.groupby('key').groups printed all: problem with large arrays #1135
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Currently one test is failing:

    def test_groups(self, df):
        grouped = df.groupby(['A'])
        groups = grouped.groups
        assert groups is grouped.groups  # caching works

I'm not sure what exactly this test is checking for. Is this a behavior that needs to be kept?

codecov · 2019-01-21T02:19:22Z

Codecov Report

Merging #24853 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24853      +/-   ##
==========================================
- Coverage   91.98%   91.97%   -0.01%     
==========================================
  Files         175      175              
  Lines       52372    52375       +3     
==========================================
- Hits        48172    48171       -1     
- Misses       4200     4204       +4

Flag	Coverage Δ
#multiple	`90.52% <100%> (ø)`	⬆️
#single	`40.7% <75%> (-0.14%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/groupby/groupby.py	`97.23% <ø> (ø)`	⬆️
pandas/io/formats/printing.py	`86.09% <ø> (+0.53%)`	⬆️
pandas/core/frame.py	`96.9% <ø> (-0.12%)`	⬇️
pandas/core/groupby/grouper.py	`98.18% <ø> (ø)`	⬆️
pandas/core/groupby/ops.py	`95.97% <ø> (ø)`	⬆️
pandas/core/indexes/base.py	`96.94% <100%> (ø)`	⬆️
pandas/io/gbq.py	`78.94% <0%> (-10.53%)`	⬇️
pandas/util/testing.py	`90.61% <0%> (-0.11%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 48ea04f...9742473. Read the comment docs.

codecov · 2019-01-21T02:19:23Z

Codecov Report

Merging #24853 into master will decrease coverage by 49.49%.
The diff coverage is 23.07%.

@@            Coverage Diff             @@
##           master   #24853      +/-   ##
==========================================
- Coverage   92.39%   42.89%   -49.5%     
==========================================
  Files         166      166              
  Lines       52391    52398       +7     
==========================================
- Hits        48407    22477   -25930     
- Misses       3984    29921   +25937

Flag	Coverage Δ
#multiple	`?`
#single	`42.89% <23.07%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/frame.py	`35.84% <ø> (-61.08%)`	⬇️
pandas/io/formats/printing.py	`65.15% <10%> (-20.27%)`	⬇️
pandas/core/groupby/groupby.py	`24.62% <66.66%> (-72.19%)`	⬇️
pandas/io/formats/latex.py	`0% <0%> (-100%)`	⬇️
pandas/core/categorical.py	`0% <0%> (-100%)`	⬇️
pandas/io/sas/sas_constants.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/plotting.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/converter.py	`0% <0%> (-100%)`	⬇️
pandas/io/formats/html.py	`0% <0%> (-99.35%)`	⬇️
pandas/core/groupby/categorical.py	`0% <0%> (-95.46%)`	⬇️
... and 126 more

Powered by Codecov. Last update 01e7872...d6b310a. Read the comment docs.

WillAyd · 2019-02-06T04:03:58Z

@benarthur91 can you check failures on this one?

benjaminarjun · 2019-02-06T05:35:23Z

Pushing w/ failing test commented out - I don't understand the issue well, and want to confirm this test is the reason for red from Codecov before addressing.

benjaminarjun · 2019-02-06T05:37:31Z

pandas/tests/groupby/test_grouping.py

-    def test_groups(self, df):
-        grouped = df.groupby(['A'])
-        groups = grouped.groups
-        assert groups is grouped.groups  # caching works


Failing on this line - I'm wondering what the value of this behavior is and/or whether there's interest in retaining it?

Hmm I would think so; so the current approach is instantiating a new class on every access of .groups? That seems potentially expensive and counter-intuitive.

Is there a way to get the intended behavior without a new class?

Not that I'm aware of. groups is currently a standard dict, whose __repr__ isn't abbreviated, even for large instances. Seems you'd have to override the __repr__ to get this behavior, and to do that you'd have to subclass dict. Maybe there's a better way I haven't thought of.

In response to instantiating a new class on every access, I could look into storing groups on the GroupBy object as an instance of the new class rather than a plain dict. Then .groups would just get the attribute rather than creating a new object every time it's called. I think that would resolve this case.
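The idea in the two paragraphs above can be sketched roughly as follows. This is a minimal stand-alone illustration, not the code from this PR: `AbbreviatedDict`, `FakeGroupBy`, and `max_items` are hypothetical names, and the caching uses the standard library's `functools.cached_property` (Python 3.8+) rather than whatever pandas uses internally.

```python
from functools import cached_property


class AbbreviatedDict(dict):
    """Hypothetical dict subclass whose repr shows at most `max_items` entries."""

    max_items = 2  # illustrative limit, not a pandas option

    def __repr__(self):
        shown = [f"{k!r}: {v!r}" for k, v in list(self.items())[: self.max_items]]
        if len(self) > self.max_items:
            shown.append("...")
        return "{" + ", ".join(shown) + "}"


class FakeGroupBy:
    """Stand-in for a GroupBy object; only the caching pattern matters here."""

    def __init__(self, keys):
        self._keys = list(keys)

    @cached_property
    def groups(self):
        # Built once on first access, then stored on the instance,
        # so later accesses return the very same object.
        out = {}
        for position, key in enumerate(self._keys):
            out.setdefault(key, []).append(position)
        return AbbreviatedDict(out)


grouped = FakeGroupBy(["a", "b", "a", "c"])
assert grouped.groups is grouped.groups  # identity check from test_groups holds
print(repr(grouped.groups))  # {'a': [0, 2], 'b': [1], ...}
```

Because the wrapper is created once and stored, the identity assertion in test_groups would keep passing while the repr is still abbreviated.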

WillAyd · 2019-02-12T05:31:10Z

@benarthur91 any chance you can merge master and refactor to go another route? I'm personally -1 on current implementation due to instantiation of new object on property access as I think that's just a confusing API

benjaminarjun · 2019-02-12T05:59:49Z

@WillAyd Definitely. Did you have a particular alternative in mind? I was thinking to make the underlying object an instance of the class, but if you have a thought I'd love to hear it.

WillAyd · 2019-02-12T06:04:03Z

Not particularly

pandas/core/indexes/base.py

jreback

can you also add a whatsnew note (other api changes is ok) in 0.25

jreback · 2019-03-10T22:35:36Z

pandas/core/indexes/base.py

@@ -5290,6 +5291,27 @@ def _add_logical_methods_disabled(cls):
 Index._add_comparison_methods()


+class IndexGroupbyGroups(dict):
+    def __repr__(self):


this just needs to call pandas.io.formats.printing.pprint_thing (with the option passed)

jreback · 2019-03-10T22:35:50Z

pandas/core/indexes/base.py

@@ -5290,6 +5291,27 @@ def _add_logical_methods_disabled(cls):
 Index._add_comparison_methods()


+class IndexGroupbyGroups(dict):
+    def __repr__(self):


give a class doc-string here

benjaminarjun · 2019-03-15T06:12:11Z

@jreback I've updated to use pprint_thing. I believe the desired output is something like

{1: Int64Index([0, 1, 2], dtype='int64'), ... , 3: Int64Index([5], dtype='int64')}

but with the use of this function I get:

{1: [0, 1, ...], 2: [3, 4], ...}

However this abbreviates at the end of the object rather than the middle (not consistent with DataFrame's repr) and the pprint behavior is applied recursively, so reprs of the object's values are oversimplified. I attempted to address these issues on commit 19bb9bf, however this required changes to the pprint_thing definition.
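To make the difference between the two outputs concrete, here is a small sketch of end truncation versus middle truncation for a mapping's repr. It deliberately does not call pprint_thing; `repr_truncated` and its `where` parameter are hypothetical, and plain lists stand in for the Int64Index values.

```python
def repr_truncated(mapping, max_items, where="end"):
    """Return a dict-like repr limited to `max_items` entries.

    where="end"    keeps the first entries and appends "...".
    where="middle" keeps entries from both ends, with "..." between them.
    Values are rendered with plain repr(), so they are not abbreviated.
    """
    items = [f"{k!r}: {v!r}" for k, v in mapping.items()]
    if len(items) > max_items:
        if where == "end":
            items = items[:max_items] + ["..."]
        else:
            head = (max_items + 1) // 2
            tail = max_items - head
            items = items[:head] + ["..."] + (items[-tail:] if tail else [])
    return "{" + ", ".join(items) + "}"


groups = {1: [0, 1, 2], 2: [3, 4], 3: [5]}
print(repr_truncated(groups, 2, where="end"))     # {1: [0, 1, 2], 2: [3, 4], ...}
print(repr_truncated(groups, 2, where="middle"))  # {1: [0, 1, 2], ..., 3: [5]}
```

The middle variant mirrors how a DataFrame repr elides rows, and leaving the values untouched avoids the recursive abbreviation described above.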

jreback · 2019-03-20T01:26:58Z

can you merge master and see if you can get this passing

jreback · 2019-04-28T21:06:06Z

pandas/core/indexes/base.py

@@ -5274,6 +5276,14 @@ def _add_logical_methods_disabled(cls):
 Index._add_comparison_methods()


+class IndexGroupbyGroups(dict):
+    """Dict extension to support abbreviated __repr__"""
+    from pandas.io.formats.printing import pprint_thing


can this be imported at the top?

jreback · 2019-04-28T21:09:27Z

pandas/core/indexes/base.py

@@ -5274,6 +5276,14 @@ def _add_logical_methods_disabled(cls):
 Index._add_comparison_methods()


+class IndexGroupbyGroups(dict):


this should be defined in pandas/core/groupby/groupby.py and used on the dict returning functions, mainly groups and indices

this is not a user-facing function and kind of obscures that this is from groupby

actually, maybe best to actually put this in pandas.io.format.printing and name it PrettyDict(dict), see my other notes for where to actually use this.

Sure, I'll move this. To your first point, are you saying a groups property should do something like this: return PrettyDict(self.grouper.groups)? @WillAyd had noted that he'd prefer to put this in a different place, as multiple calls to the method will create a new object every time.

yes; that would be a centralized place

The topic initially came up because instantiating in the groups method was failing this test:

    def test_groups(self, df):
        grouped = df.groupby(['A'])
        groups = grouped.groups
        assert groups is grouped.groups  # caching works

@WillAyd Thoughts?

What @jreback says here makes a lot of sense - go ahead with the move!

Thanks, I'll remove this test as part of the next commit.
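For reference, the trade-off accepted in this thread looks roughly like the sketch below. `PrettyDictSketch` and `Grouper` are illustrative stand-ins, not the pandas classes: wrapping inside the property yields a new object on each access, so an identity check fails while equality with the underlying dict is preserved.

```python
class PrettyDictSketch(dict):
    """Illustrative stand-in for the proposed PrettyDict; not the pandas code."""


class Grouper:
    """Hypothetical object exposing a `groups` property."""

    def __init__(self):
        self._groups = {"x": [0, 1], "y": [2]}

    @property
    def groups(self):
        # A fresh wrapper is created on every access.
        return PrettyDictSketch(self._groups)


g = Grouper()
first, second = g.groups, g.groups
assert first == second               # contents still compare equal
assert first == {"x": [0, 1], "y": [2]}
assert first is not second           # but identity differs between accesses
```

This is why the `is` assertion in test_groups cannot pass with this approach, whereas an equality-based check still would.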

jreback · 2019-04-28T21:10:08Z

pandas/tests/io/formats/test_format.py

+            'b': [1, 2, 3, 4, 5, 6]
+        })
+
+        with option_context('display.max_rows', 2):


can you also try a grouper like np.array(df.a) which hits a different path

jreback · 2019-04-28T21:10:48Z

pandas/tests/io/formats/test_format.py

@@ -1761,6 +1761,22 @@ def test_period(self):
        assert str(df) == exp


+class TestDataFrameGroupByFormatting(object):


this goes in pandas/tests/groupby/test_grouping.py near the other repr tests

jreback · 2019-04-28T21:11:27Z

doc/source/whatsnew/v0.25.0.rst

 - The ``arg`` argument in :meth:`pandas.core.groupby.DataFrameGroupBy.agg` has been renamed to ``func`` (:issue:`26089`)
+- :meth:`Index.groupby` and dependent methods (notably :attr:`GroupBy.groups`) now return object with abbreviated repr (:issue:`1135`)


this is not a user facing message; Index.groupby is not really a public method, while Groupby.groups is the user facing; pls make this a bit more clear.

jreback · 2019-04-28T21:13:40Z

pandas/core/indexes/base.py

@@ -5274,6 +5276,14 @@ def _add_logical_methods_disabled(cls):
 Index._add_comparison_methods()


+class IndexGroupbyGroups(dict):


actually, maybe best to actually put this in pandas.io.format.printing and name it PrettyDict(dict), see my other notes for where to actually use this.

jreback · 2019-06-08T20:27:51Z

can you merge master and update

jreback · 2019-07-11T16:09:19Z

closing as stale, but this is pretty close if you or @pandas-dev/pandas-core would like to finish up.

benjaminarjun added 2 commits January 20, 2019 16:02

Add truncatable repr for DF groupby groups

f44e671

Merge branch 'master' into feature/groupby-repr-ellipses-1135

19bb9bf

jreback requested changes Jan 21, 2019

Roll back added params to __pprint_dict. All logic now in __repr__ def. Make tests more general

d6b310a

Remove unused line of code

43dbc6b

WillAyd added Groupby Output-Formatting __repr__ of pandas objects, to_string labels Jan 21, 2019

Merge branch 'master' into feature/groupby-repr-ellipses-1135

49f1def

benjaminarjun added 2 commits February 5, 2019 20:52

Merge branch 'master' into feature/groupby-repr-ellipses-1135

85d3012

Temporarily disabling failing test

0746c3b

benjaminarjun commented Feb 6, 2019

benjaminarjun added 5 commits February 26, 2019 18:04

Merge branch 'master' into feature/groupby-repr-ellipses-1135

6a7d7df

Merge branch 'master' into feature/groupby-repr-ellipses-1135

3d4b057

Move truncated dict repr to Index.groupby()

33142cb

Merge branch 'master' into feature/groupby-repr-ellipses-1135

dbb7d12

Add correct groups object

5db6c07

WillAyd reviewed Mar 6, 2019

benjaminarjun added 2 commits March 6, 2019 19:59

A few misc items for the linter

8f30d07

Merge branch 'master' into feature/groupby-repr-ellipses-1135

2870163

jreback requested changes Mar 10, 2019

benjaminarjun added 2 commits March 14, 2019 21:24

Merge branch 'master' into feature/groupby-repr-ellipses-1135

acfa005

Use pprint_thing in IndexGroupByGroups. Add whatsnew, docstring, and a couple typo fixes

b60329c

benjaminarjun added 8 commits March 28, 2019 19:14

Merge branch 'master' into feature/groupby-repr-ellipses-1135

13b73a6

Update tests to expect pprint formatting. Use new config location. Small update in doc.

29c6263

Merge branch 'master' into feature/groupby-repr-ellipses-1135

ccb98a3

Accept isort formatting preference

c74cbba

Merge branch 'master' into feature/groupby-repr-ellipses-1135

cdb9ebc

Add nonsense to AUTHORS.md

9621669

Revert "Add nonsense to AUTHORS.md"

38ecd1a

This reverts commit 9621669.

Merge branch 'master' into feature/groupby-repr-ellipses-1135

9742473

jreback requested changes Apr 28, 2019

jreback closed this Jul 11, 2019

MarcoGorelli mentioned this pull request Jan 28, 2020

ENH: truncate output of Groupby.groups #31388

Merged

5 tasks
