-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Feature/groupby repr ellipses 1135 #24853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/groupby repr ellipses 1135 #24853
Conversation
…f. Make tests more general
Codecov Report
@@ Coverage Diff @@
## master #24853 +/- ##
==========================================
- Coverage 91.98% 91.97% -0.01%
==========================================
Files 175 175
Lines 52372 52375 +3
==========================================
- Hits 48172 48171 -1
- Misses 4200 4204 +4
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #24853 +/- ##
==========================================
- Coverage 92.39% 42.89% -49.5%
==========================================
Files 166 166
Lines 52391 52398 +7
==========================================
- Hits 48407 22477 -25930
- Misses 3984 29921 +25937
Continue to review full report at Codecov.
|
@benarthur91 can you check failures on this one? |
Pushing w/ failing test commented out - I don't understand the issue well, and want to confirm this test is the reason for red from Codecov before addressing. |
def test_groups(self, df): | ||
grouped = df.groupby(['A']) | ||
groups = grouped.groups | ||
assert groups is grouped.groups # caching works |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failing on this line - I'm wondering what the value of this behavior is and/or whether there's interest in retaining it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I would think so; so the current approach is instantiating a new class on every access of .groups
? That seems potentially expensive and counter-intuitive.
Is there a way to get the intended behavior without a new class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I'm aware of. groups
is currently a standard dict, whose __repr__
isn't abbreviated, even for large instances. Seems you'd have to override the __repr__
to get this behavior, and to do that you'd have to subclass dict
. Maybe there's a better way I haven't thought of.
In response to instantiating a new class on every access, I could look into storing groups on the GroupBy object as an instance of the new class rather than a plain dict. Then .groups
would just get the attribute rather than creating a new object every time it's called. I think that would resolve this case.
@benarthur91 any chance you can merge master and refactor to go another route? I'm personally -1 on current implementation due to instantiation of new object on property access as I think that's just a confusing API |
@WillAyd Definitely. Did you have a particular alternative in mind? I was thinking to make the underlying object an instance of the class, but if you have a thought I'd love to hear it. |
Not particularly
… On Feb 11, 2019, at 9:59 PM, benarthur91 ***@***.***> wrote:
@WillAyd <https://github.com/WillAyd> Definitely. Did you have a particular alternative in mind? I was thinking to make the underlying object an instance of the class, but if you have a thought I'd love to hear it.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#24853 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAlOUUdFj7iNYnL3C4EcXsuiqUf9ab4uks5vMlhYgaJpZM4aJ7ao>.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also add a whatsnew note (other api changes is ok) in 0.25
pandas/core/indexes/base.py
Outdated
@@ -5290,6 +5291,27 @@ def _add_logical_methods_disabled(cls): | |||
Index._add_comparison_methods() | |||
|
|||
|
|||
class IndexGroupbyGroups(dict): | |||
def __repr__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this just need to call pandas.io.formats.printing.pprint_thing (with the option passed)
pandas/core/indexes/base.py
Outdated
@@ -5290,6 +5291,27 @@ def _add_logical_methods_disabled(cls): | |||
Index._add_comparison_methods() | |||
|
|||
|
|||
class IndexGroupbyGroups(dict): | |||
def __repr__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
give a class doc-string here
@jreback I've updated to use pprint_thing. I believe the desired output is something like
but with the use of this function I get:
However this abbreviates at the end of the object rather than the middle (not consistent with DataFrame's repr) and the pprint behavior is applied recursively, so reprs of the object's values are oversimplified. I attempted to address these issues on commit 19bb9bf, however this required changes to the pprint_thing definition. |
can you merge master and see if you can get this passing |
…all update in doc.
This reverts commit 9621669.
@@ -5274,6 +5276,14 @@ def _add_logical_methods_disabled(cls): | |||
Index._add_comparison_methods() | |||
|
|||
|
|||
class IndexGroupbyGroups(dict): | |||
"""Dict extension to support abbreviated __repr__""" | |||
from pandas.io.formats.printing import pprint_thing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be imported at the top?
@@ -5274,6 +5276,14 @@ def _add_logical_methods_disabled(cls): | |||
Index._add_comparison_methods() | |||
|
|||
|
|||
class IndexGroupbyGroups(dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be defined in pandas/core/groupby/groupby.py and used on the dict returning functions, mainly groups
and indices
this is not user facing function and kind of obscures that this is from groupby
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, maybe best to actually put this in pandas.io.format.printing and name it PrettyDict(dict), see my other notes for where to actually use this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll move this. To your first point, are you saying a groups property should do something like this: return PrettyDict(self.grouper.groups)
? @WillAyd mentioned had noted that he'd prefer to put this in a different place, as multiple calls to the method will create a new object every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes; thiat would be a centralized place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The topic initially came up because instantiating in the groups
method was failing this test:
def test_groups(self, df):
grouped = df.groupby(['A'])
groups = grouped.groups
assert groups is grouped.groups # caching works
@WillAyd Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What @jreback says here makes a lot of sense - go ahead with the move!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll remove this test as part of the next commit.
'b': [1, 2, 3, 4, 5, 6] | ||
}) | ||
|
||
with option_context('display.max_rows', 2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also try a grouper like np.array(df.a)
which hits a different path
@@ -1761,6 +1761,22 @@ def test_period(self): | |||
assert str(df) == exp | |||
|
|||
|
|||
class TestDataFrameGroupByFormatting(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this goes in pandas/tests/groupby/test_grouping.py near the other repr tests
- The ``arg`` argument in :meth:`pandas.core.groupby.DataFrameGroupBy.agg` has been renamed to ``func`` (:issue:`26089`) | ||
- :meth:`Index.groupby` and dependent methods (notably :attr:`GroupBy.groups`) now return object with abbreviated repr (:issue:`1135`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a user facing message; Index.groupby is not really a public method, while Groupby.groups is the user facing; pls make this a bit more clear.
@@ -5274,6 +5276,14 @@ def _add_logical_methods_disabled(cls): | |||
Index._add_comparison_methods() | |||
|
|||
|
|||
class IndexGroupbyGroups(dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, maybe best to actually put this in pandas.io.format.printing and name it PrettyDict(dict), see my other notes for where to actually use this.
can you merge master and update |
closing as stale, but this is pretty close if you or @pandas-dev/pandas-core would like to finish up. |
git diff upstream/master -u -- "*.py" | flake8 --diff
Currently one test is failing:
I'm not sure what exactly this test is checking for. Is this a behavior that needs to be kept?