
Feature/groupby repr ellipses 1135 #24853

Closed
24 commits (the diff below shows changes from 14 commits)
f44e671
Add truncatable repr for DF groupby groups
benjaminarjun Jan 21, 2019
19bb9bf
Merge branch 'master' into feature/groupby-repr-ellipses-1135
benjaminarjun Jan 21, 2019
d6b310a
Roll back added params to __pprint_dict. All logic now in __repr__ de…
benjaminarjun Jan 21, 2019
43dbc6b
Remove unused line of code
benjaminarjun Jan 21, 2019
49f1def
Merge branch 'master' into feature/groupby-repr-ellipses-1135
benjaminarjun Jan 23, 2019
85d3012
Merge branch 'master' into feature/groupby-repr-ellipses-1135
benjaminarjun Feb 6, 2019
0746c3b
Temporarily disabling failing test
benjaminarjun Feb 6, 2019
6a7d7df
Merge branch 'master' into feature/groupby-repr-ellipses-1135
benjaminarjun Feb 27, 2019
3d4b057
Merge branch 'master' into feature/groupby-repr-ellipses-1135
benjaminarjun Mar 5, 2019
33142cb
Move truncated dict repr to Index.groupby()
benjaminarjun Mar 6, 2019
dbb7d12
Merge branch 'master' into feature/groupby-repr-ellipses-1135
benjaminarjun Mar 6, 2019
5db6c07
Add correct groups object
benjaminarjun Mar 6, 2019
8f30d07
A few misc items for the linter
benjaminarjun Mar 7, 2019
2870163
Merge branch 'master' into feature/groupby-repr-ellipses-1135
benjaminarjun Mar 7, 2019
acfa005
Merge branch 'master' into feature/groupby-repr-ellipses-1135
benjaminarjun Mar 15, 2019
b60329c
Use pprint_thing in IndexGroupByGroups. Add whatsnew, docstring, and …
benjaminarjun Mar 15, 2019
13b73a6
Merge branch 'master' into feature/groupby-repr-ellipses-1135
benjaminarjun Mar 29, 2019
29c6263
Update tests to expect pprint formatting. Use new config location. Sm…
benjaminarjun Mar 30, 2019
ccb98a3
Merge branch 'master' into feature/groupby-repr-ellipses-1135
benjaminarjun Mar 30, 2019
c74cbba
Accept isort formatting preference
benjaminarjun Mar 30, 2019
cdb9ebc
Merge branch 'master' into feature/groupby-repr-ellipses-1135
benjaminarjun Apr 10, 2019
9621669
Add nonsense to AUTHORS.md
benjaminarjun Apr 10, 2019
38ecd1a
Revert "Add nonsense to AUTHORS.md"
benjaminarjun Apr 10, 2019
9742473
Merge branch 'master' into feature/groupby-repr-ellipses-1135
benjaminarjun Apr 28, 2019
2 changes: 1 addition & 1 deletion pandas/core/frame.py
@@ -554,7 +554,7 @@ def _repr_fits_horizontal_(self, ignore_width=False):
         Check if full repr fits in horizontal boundaries imposed by the display
         options width and max_columns.

-        In case off non-interactive session, no boundaries apply.
+        In case of non-interactive session, no boundaries apply.

         `ignore_width` is here so ipnb+HTML output can behave the way
         users expect. display.max_columns remains in effect.
1 change: 0 additions & 1 deletion pandas/core/groupby/grouper.py
@@ -236,7 +236,6 @@ class Grouping(object):

     def __init__(self, index, grouper=None, obj=None, name=None, level=None,
                  sort=True, observed=False, in_axis=False):
-
         self.name = name
         self.level = level
         self.grouper = _convert_grouper(index, grouper)
3 changes: 2 additions & 1 deletion pandas/core/groupby/ops.py
@@ -235,6 +235,7 @@ def size(self):
     @cache_readonly
     def groups(self):
         """ dict {group name -> group labels} """
+
         if len(self.groupings) == 1:
             return self.groupings[0].groups
         else:
@@ -361,7 +362,7 @@ def get_group_levels(self):

     def _is_builtin_func(self, arg):
         """
-        if we define an builtin function for this argument, return it,
+        if we define a builtin function for this argument, return it,
         otherwise return the arg
         """
         return SelectionMixin._builtin_table.get(arg, arg)
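`groups` above is a `@cache_readonly` property, so the object it returns is computed once and then reused; the `assert groups is grouped.groups` check raised later in this review depends on that. A minimal sketch of the trade-off, using the standard library's `functools.cached_property` as a stand-in for pandas' `cache_readonly`, with hypothetical `PrettyDict`, `ToyGrouper`, and `UncachedWrapper` classes (none of this is pandas code): wrapping the result inside the cached property keeps identity, while wrapping it in a plain property on every access yields equal but distinct objects.

```python
from functools import cached_property  # Python 3.8+


class PrettyDict(dict):
    """Hypothetical dict wrapper; a real one would override __repr__."""


class ToyGrouper:
    """Hypothetical stand-in for a grouper with a cached ``groups``."""

    def __init__(self, labels):
        self.labels = labels
        self.build_count = 0  # number of times ``groups`` was computed

    @cached_property
    def groups(self):
        # Build {group name -> positions}, wrap it once, and let the
        # cache store the wrapped object for later accesses.
        self.build_count += 1
        result = {}
        for pos, key in enumerate(self.labels):
            result.setdefault(key, []).append(pos)
        return PrettyDict(result)


class UncachedWrapper:
    """Hypothetical: wraps on every access, so each call is a new object."""

    def __init__(self, grouper):
        self.grouper = grouper

    @property
    def groups(self):
        return PrettyDict(self.grouper.groups)


g = ToyGrouper(["a", "a", "b"])
assert g.groups is g.groups        # identity holds: built and wrapped once
assert g.build_count == 1

w = UncachedWrapper(g)
assert w.groups == w.groups        # same contents...
assert w.groups is not w.groups    # ...but a fresh object on each access
```

Creating the wrapper inside the cached computation, rather than in a property layered on top of it, is what keeps an identity-based caching test passing.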
24 changes: 23 additions & 1 deletion pandas/core/indexes/base.py
@@ -38,6 +38,7 @@
 from pandas.core.arrays import ExtensionArray
 from pandas.core.base import IndexOpsMixin, PandasObject
 import pandas.core.common as com
+from pandas.core.config import get_option
 from pandas.core.indexes.frozen import FrozenList
 import pandas.core.missing as missing
 from pandas.core.ops import get_op_result_name, make_invalid_op
@@ -4493,7 +4494,7 @@ def groupby(self, values):
         # map to the label
         result = {k: self.take(v) for k, v in compat.iteritems(result)}

-        return result
+        return IndexGroupbyGroups(result)

     def map(self, mapper, na_action=None):
         """
@@ -5290,6 +5291,27 @@ def _add_logical_methods_disabled(cls):
 Index._add_comparison_methods()


+class IndexGroupbyGroups(dict):
Contributor

this should be defined in pandas/core/groupby/groupby.py and used on the dict-returning functions, mainly groups and indices

this is not a user-facing function, and it kind of obscures that this comes from groupby

Contributor

actually, maybe it's best to put this in pandas.io.formats.printing and name it PrettyDict(dict); see my other notes for where to actually use this.

Contributor Author

Sure, I'll move this. To your first point, are you saying a groups property should do something like this: return PrettyDict(self.grouper.groups)? @WillAyd had noted that he'd prefer to put this in a different place, as multiple calls to the method would create a new object every time.

Contributor

yes; that would be a centralized place

Contributor Author

The topic initially came up because instantiating the wrapper in the groups method was failing this test:

def test_groups(self, df):
    grouped = df.groupby(['A'])
    groups = grouped.groups
    assert groups is grouped.groups  # caching works

@WillAyd Thoughts?

Member

What @jreback says here makes a lot of sense - go ahead with the move!

Contributor Author

Thanks, I'll remove this test as part of the next commit.

+    def __repr__(self):
Contributor

this just needs to call pandas.io.formats.printing.pprint_thing (with the option passed)

Contributor

add a class docstring here

+        nitems = get_option('display.max_rows') or len(self)
+
+        fmt = u("{{{things}}}")
+        pfmt = u("{key}: {val}")
+
+        pairs = []
+        for k, v in list(self.items()):
+            pairs.append(pfmt.format(key=k, val=v))
+
+        if nitems < len(self):
+            print("Truncating repr")
+            start_cnt, end_cnt = nitems - int(nitems / 2), int(nitems / 2)
+            return fmt.format(things=", ".join(pairs[:start_cnt]) +
+                              ", ... , " +
+                              ", ".join(pairs[-end_cnt:]))
+        else:
+            return fmt.format(things=", ".join(pairs))
+
+
 def ensure_index_from_sequences(sequences, names=None):
     """
     Construct an index from sequences of data.
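The `__repr__` above keeps the first `nitems - nitems // 2` pairs and the last `nitems // 2`. One edge case is worth noting: with `display.max_rows` set to 1, `end_cnt` is 0 and `pairs[-0:]` is the same as `pairs[0:]`, so the whole list is printed again after the ellipsis. A minimal pure-Python sketch of the same head-and-tail split that guards that case and drops the leftover `print("Truncating repr")` debug call; `truncated_dict_repr` and its `max_items` parameter are hypothetical, with `max_items` standing in for `get_option('display.max_rows')`.

```python
def truncated_dict_repr(d, max_items=None):
    """Render ``d`` as ``{k: v, ...}``, eliding middle pairs past max_items.

    ``max_items`` of None or 0 shows everything, mirroring the
    ``get_option('display.max_rows') or len(self)`` fallback in the diff.
    """
    pairs = ["{}: {}".format(k, v) for k, v in d.items()]
    nitems = max_items or len(pairs)
    if nitems >= len(pairs):
        return "{" + ", ".join(pairs) + "}"

    # Split the budget between head and tail; the head gets the odd item.
    end_cnt = nitems // 2
    start_cnt = nitems - end_cnt
    head = ", ".join(pairs[:start_cnt])
    if end_cnt == 0:
        # pairs[-0:] would be the whole list, so print the head only.
        return "{" + head + ", ...}"
    tail = ", ".join(pairs[-end_cnt:])
    return "{" + head + ", ... , " + tail + "}"


d = {1: "a", 2: "b", 3: "c"}
print(truncated_dict_repr(d, 2))  # {1: a, ... , 3: c}
print(truncated_dict_repr(d, 1))  # {1: a, ...}
print(truncated_dict_repr(d, 5))  # {1: a, 2: b, 3: c}
```

With three keys and `max_items=2`, as in the PR's test, the output still contains the `', ... ,'` marker that test checks for.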
5 changes: 3 additions & 2 deletions pandas/io/formats/printing.py
@@ -95,7 +95,7 @@ def _join_unicode(lines, sep=''):
 def _pprint_seq(seq, _nest_lvl=0, max_seq_items=None, **kwds):
     """
     internal. pprinter for iterables. you should probably use pprint_thing()
-    rather then calling this directly.
+    rather than calling this directly.

     bounds length of printed sequence, depending on options
     """
@@ -127,8 +127,9 @@ def _pprint_seq(seq, _nest_lvl=0, max_seq_items=None, **kwds):
 def _pprint_dict(seq, _nest_lvl=0, max_seq_items=None, **kwds):
     """
     internal. pprinter for iterables. you should probably use pprint_thing()
-    rather then calling this directly.
+    rather than calling this directly.
     """
+
     fmt = u("{{{things}}}")
     pairs = []

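The review suggests moving the wrapper into `pandas.io.formats.printing` as `PrettyDict(dict)` and having its `__repr__` simply call `pprint_thing` with the display option passed, so the formatting logic lives in one place and any dict-returning method (such as `groups` or `indices`) can reuse it. A minimal sketch of that shape, not pandas' implementation: `format_mapping` is a hypothetical central printer, `DISPLAY_OPTIONS` is a hypothetical stand-in for pandas' option registry, and for brevity it keeps only the first items plus a trailing `...` rather than the head-and-tail split used in the diff.

```python
# Hypothetical stand-in for pandas' option registry (get_option).
DISPLAY_OPTIONS = {"max_seq_items": 2}


def format_mapping(mapping, max_seq_items=None):
    """Shared printer: show at most ``max_seq_items`` pairs, then '...'.

    An explicit argument wins; otherwise fall back to the option, and
    if that is unset as well, show everything.
    """
    limit = max_seq_items or DISPLAY_OPTIONS["max_seq_items"] or len(mapping)
    items = list(mapping.items())[:limit]
    pairs = ["{!r}: {!r}".format(k, v) for k, v in items]
    if limit < len(mapping):
        pairs.append("...")
    return "{" + ", ".join(pairs) + "}"


class PrettyDict(dict):
    """Hypothetical dict whose repr is delegated to the shared printer."""

    def __repr__(self):
        return format_mapping(self)


groups = PrettyDict({"a": [0, 1], "b": [2], "c": [3]})
print(repr(groups))  # {'a': [0, 1], 'b': [2], ...}
```

Because the subclass only overrides `__repr__`, it still compares equal to a plain `dict` with the same contents, so callers that treat `groups` as an ordinary mapping are unaffected.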
16 changes: 16 additions & 0 deletions pandas/tests/io/formats/test_format.py
@@ -1797,6 +1797,22 @@ def test_period(self):
         assert str(df) == exp


+class TestDataFrameGroupByFormatting(object):
Contributor

this goes in pandas/tests/groupby/test_grouping.py near the other repr tests

+    def test_groups_repr_truncates(self):
+        df = pd.DataFrame({
+            'a': [1, 1, 1, 2, 2, 3],
+            'b': [1, 2, 3, 4, 5, 6]
+        })
+
+        with option_context('display.max_rows', 2):
Contributor

can you also try a grouper like np.array(df.a), which hits a different path?

+            x = df.groupby('a').groups
+            assert ', ... ,' in x.__repr__()
+
+        with option_context('display.max_rows', 5):
+            x = df.groupby('a').groups
+            assert ', ... ,' not in x.__repr__()
+
+
 def gen_series_formatting():
     s1 = pd.Series(['a'] * 100)
     s2 = pd.Series(['ab'] * 100)