Skip to content

Added repr string for Grouper and TimeGrouper #18203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 27, 2017

Conversation

topper-123
Copy link
Contributor

  • closes #xxxx
  • tests added / passed
  • [x ] passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

This is a new implementation of #17727. This is much simpler than the previous attempts. The outputs are now:

>>> pd.Grouper(key='A')
Grouper(key='A')
>>> pd.Grouper(key='A', freq='2T')
TimeGrouper(key='A', freq=<2 * Minutes>, sort=True, closed='left', label='left', convention='e')

Tests still need to be written. I can do them when/if this is ok.

@jreback jreback mentioned this pull request Nov 9, 2017
2 tasks
@@ -233,6 +233,8 @@ class Grouper(object):

>>> df.groupby(Grouper(level='date', freq='60s', axis=1))
"""
_attributes = (('key', None), ('level', None), ('freq', None),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again you don't need to specify the default, just list the attributes. The repr should include all the values, not just the defaults. Further this means you are defining defaults in 2 places, here and the init, not a good idea.

Copy link
Contributor Author

@topper-123 topper-123 Nov 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, changed. Also added test for TimeGrouper.__repr__. Not sure where a test for Grouper.__repr__ should go.

Should a note about this go in the 0.21.1 or 0.22 whatsnew.rst?

@jreback jreback added Groupby Output-Formatting __repr__ of pandas objects, to_string Resample resample method labels Nov 9, 2017
@pep8speaks
Copy link

pep8speaks commented Nov 9, 2017

Hello @topper-123! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on November 27, 2017 at 01:12 Hours UTC

@topper-123 topper-123 force-pushed the Grouper_repr_II branch 3 times, most recently from a603bee to df67063 Compare November 9, 2017 23:16
@topper-123
Copy link
Contributor Author

topper-123 commented Nov 9, 2017

After changes according to @jreback, the output is now:

>>> pd.Grouper(key='A')
Grouper(key='A', level=None, freq=None, axis=0, sort=False)
>>> pd.Grouper(key='A', freq='H')
TimeGrouper(key='A', level=None, freq=<Hour>, sort=True, closed='left', label='left', how='mean', nperiods=None, axis=0, fill_method=None, limit=None, loffset=None, kind=None, convention='e', base=0)

@jreback
Copy link
Contributor

jreback commented Nov 10, 2017

so what you can do is skip printing attributes that are None
makes it’s shorter w/o loss of generality

@@ -333,6 +334,15 @@ def _set_grouper(self, obj, sort=False):
def groups(self):
return self.grouper.groups

def __repr__(self):
attrs_list = []
for attr_name in self._attributes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be a list comprehension

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@@ -1025,6 +1025,9 @@ class TimeGrouper(Grouper):
Use begin, end, nperiods to generate intervals that cannot be derived
directly from the associated object
"""
_attributes = ('key', 'level', 'freq', 'sort', 'closed', 'label',
'how', 'nperiods', 'axis', 'fill_method', 'limit',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only should have public attributes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iow you don’t need any special ones for TimeGrouper (which is publicly deprecated anyhow)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A TimeGrouper object is returned when calling Grouper with a freq arg, IOW it's only the top level location of TimeGrouper that is deprecated, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes however you will never actually directly construct one
point being is the args are the same as Grouper (maybe a couple of extra which are prob not documented)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u add the missing arguments in the doc string of Grouper? eg convention, base, loffset i think are only public ones

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah of course. Changed.

Copy link
Contributor Author

@topper-123 topper-123 Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering your last comment, it's actually possible to to:

>>> pd.Grouper(key='A', freq='H', convention='s')
TimeGrouper(key='A', freq=<Hour>, axis=0, sort=True)

Note the output now misses convention='s', even though it's been set.

Shouldn't TimeGrouper actually have its own _attributes to account for this issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it should take a copy of the super (Grouper) and add to it

Copy link
Contributor Author

@topper-123 topper-123 Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added _attributes to TimeGrouper.

I don't know what base, loffset and convention do, and can't quite figure it out from the code, so can't add explanations to the doc string, sorry. I added them, but without explanations. Do you have a proposal explanation for each?

@topper-123 topper-123 force-pushed the Grouper_repr_II branch 2 times, most recently from 3add483 to 74aa59d Compare November 10, 2017 00:23
expected = ("TimeGrouper(key='A', freq=<Hour>, sort=True, "
"closed='left', label='left', how='mean', axis=0, "
"convention='e', base=0)")
assert result == expected
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a better test is to actually construct the object and compare to the original

Copy link
Contributor Author

@topper-123 topper-123 Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand you here. Do you mean compare the repr of Grouper and TimeGrouper?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, use the repr to actually construct a new object (that's the point of a repr), and compare to the original

Copy link
Contributor Author

@topper-123 topper-123 Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand. Should the expected be wrapped in a repr()?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my other comments

@@ -22,7 +22,7 @@ Other Enhancements
^^^^^^^^^^^^^^^^^^

- :meth:`Timestamp.timestamp` is now available in Python 2.7. (:issue:`17329`)
-
- :class:`Grouper` and :class:`TimeGrouper` now have a friendly repr output.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue number

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fixed.

@jreback
Copy link
Contributor

jreback commented Nov 25, 2017

can you rebase

@codecov
Copy link

codecov bot commented Nov 26, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@38f41e6). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #18203   +/-   ##
=========================================
  Coverage          ?   91.38%           
=========================================
  Files             ?      163           
  Lines             ?    50066           
  Branches          ?        0           
=========================================
  Hits              ?    45753           
  Misses            ?     4313           
  Partials          ?        0
Flag Coverage Δ
#multiple 89.19% <100%> (?)
#single 40.36% <100%> (?)
Impacted Files Coverage Δ
pandas/core/groupby.py 92.03% <100%> (ø)
pandas/core/resample.py 96.16% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 38f41e6...833883f. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 26, 2017

Codecov Report

Merging #18203 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18203      +/-   ##
==========================================
- Coverage   91.35%   91.33%   -0.02%     
==========================================
  Files         163      163              
  Lines       49796    49797       +1     
==========================================
- Hits        45490    45482       -8     
- Misses       4306     4315       +9
Flag Coverage Δ
#multiple 89.13% <100%> (ø) ⬆️
#single 40.79% <100%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/groupby.py 92.03% <100%> (ø) ⬆️
pandas/core/resample.py 96.34% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f745e52...bdc5ac2. Read the comment docs.

@@ -1025,7 +1025,10 @@ class TimeGrouper(Grouper):
Use begin, end, nperiods to generate intervals that cannot be derived
directly from the associated object
"""

_attributes = Grouper._attributes + ('closed', 'label', 'how',
'nperiods', 'fill_method', 'limit',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you leave out nperiods, also fill_method and limit which are deprecated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can actually completely remove nperiods as this is non-functional.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. some comments.

@@ -3416,3 +3416,10 @@ def test_aggregate_with_nat(self):

# if NaT is included, 'var', 'std', 'mean', 'first','last'
# and 'nth' doesn't work yet

def test_repr(self):
result = repr(TimeGrouper(key='A', freq='H'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number

def test_repr(self):
result = repr(TimeGrouper(key='A', freq='H'))
expected = ("TimeGrouper(key='A', freq=<Hour>, axis=0, sort=True, "
"closed='left', label='left', how='mean', "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a similar test for Grouper (in test_groupby)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@topper-123
Copy link
Contributor Author

green.

@jreback jreback modified the milestone: 0.22.0 Nov 26, 2017
@jreback jreback added this to the 0.21.1 milestone Nov 26, 2017
@jreback
Copy link
Contributor

jreback commented Nov 26, 2017

linting issue. otherwise lgtm. note that you can do

make lint-diff to see linting in your patch, locally

@topper-123 topper-123 force-pushed the Grouper_repr_II branch 2 times, most recently from 512c8e2 to 1d4e3d9 Compare November 26, 2017 20:16
@@ -1025,6 +1025,10 @@ class TimeGrouper(Grouper):
Use begin, end, nperiods to generate intervals that cannot be derived
directly from the associated object
"""
_attributes = Grouper._attributes + ('closed', 'label', 'how',
'nperiods', 'fill_method', 'limit',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove nperiods, fill_method, and limit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and remove nperiods from the signature entirely, its is not used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but what's the benefit of not having fill_method and limit in _attributes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because they are deprecated (see the maybe_process_deprecations)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, there's this

    Notes
    -----
    Use begin, end, nperiods to generate intervals that cannot be derived
    directly from the associated object

None of begin or end seem to exist, and if nperiods isn't used, I've just removed that note

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah remove that, looks bogus

@topper-123
Copy link
Contributor Author

All green, @jreback

@jreback jreback merged commit f7c79be into pandas-dev:master Nov 27, 2017
@jreback
Copy link
Contributor

jreback commented Nov 27, 2017

thanks!

@topper-123 topper-123 deleted the Grouper_repr_II branch November 27, 2017 13:52
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Dec 8, 2017
TomAugspurger pushed a commit that referenced this pull request Dec 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Output-Formatting __repr__ of pandas objects, to_string Resample resample method
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants