-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ENH: add GroupBy.pipe method #17871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: add GroupBy.pipe method #17871
Conversation
98949b5
to
9173deb
Compare
doc/source/groupby.rst
Outdated
allow for a cleaner, more readable syntax. To read about ``.pipe`` in general terms, | ||
see :ref:`here <basics.pipe>`. | ||
|
||
For a concrete example on combining ``.groupby`` and ``.pipe`` , imagine have a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have -> having
doc/source/groupby.rst
Outdated
|
||
.. ipython:: python | ||
|
||
from numpy.random import choice, random |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't import like like, rather import numpy as np
and write things out (e.g. np.random.choice
doc/source/groupby.rst
Outdated
(base_df.pipe(lambda x: x[x.A>3]) | ||
.groupby(['Store', 'Product']) | ||
.pipe(rapport_func) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit abstract
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've redone it with the previous df. Alternatively, I could remove this example.
pandas/core/common.py
Outdated
element of the tuple. | ||
|
||
func : callable or tuple of (callable, string) | ||
Function to apply to this GroupBy or, alternatively, a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove the GroupBy from here and make this more generic
kwargs[target] = obj | ||
return func(*args, **kwargs) | ||
else: | ||
return func(obj, *args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need a return at the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@topper-123 Thanks a lot for reviving this feature. |
a88a50a
to
956483f
Compare
Codecov Report
@@ Coverage Diff @@
## master #17871 +/- ##
==========================================
+ Coverage 91.23% 91.24% +<.01%
==========================================
Files 163 163
Lines 50105 50110 +5
==========================================
+ Hits 45715 45723 +8
+ Misses 4390 4387 -3
Continue to review full report at Codecov.
|
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -235,6 +235,9 @@ Other Enhancements | |||
- Improved the import time of pandas by about 2.25x. (:issue:`16764`) | |||
- :func:`read_json` and :func:`to_json` now accept a ``compression`` argument which allows them to transparently handle compressed files. (:issue:`17798`) | |||
- :func:`Series.reindex`, :func:`DataFrame.reindex`, :func:`Index.get_indexer` now support list-like argument for ``tolerance``. (:issue:`17367`) | |||
- ``GroupBy`` objects now have a ``pipe`` method, similar to the one on ``DataFrame`` and ``Series`` | |||
that allow for functions that take a ``GroupBy`` to be composed in a clean, readable syntax. | |||
See the :ref:`documentation <groupby.pipe>` for more. (:issue:`17871`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can move this to highlites is ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll move this to highlights and add an example in the whatsnew (if I understand you correctly?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
small change |
e450478
to
4f3e569
Compare
Had some git issue, so squashed all the commits. I'm off to sleep, I'll look into eventual comments tomorrow. |
doc/source/groupby.rst
Outdated
|
||
(df.groupby(['Store', 'Product']).pipe(rapport_func) | ||
|
||
where ``rapport_func`` take an arbitrary GroupBy object and creates a rapport |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 'report' is the correct English word? (also above in the code example)
(no native speaker, but in my mother tongue 'rapport' exists, but the english translation is 'report' :-))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's rapport in my language too. I'm changed that
doc/source/groupby.rst
Outdated
|
||
.. versionadded:: 0.21.0 | ||
|
||
Similar to the functionality provided by ``DataFrames`` and ``Series``, functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFrames -> DataFrame (or otherwise do not put it as a code object)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some minor comments
pandas/core/generic.py
Outdated
@@ -3508,7 +3510,7 @@ def sample(self, n=None, frac=None, replace=False, weights=None, | |||
----- | |||
|
|||
Use ``.pipe`` when chaining together functions that expect | |||
on Series or DataFrames. Instead of writing | |||
Series, DataFrames or GroupBys. Instead of writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this change is needed? (as the docstring of GroupBy.pipe has a separate one?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "on" was gramatically incorrect, So I changed the line. I think adding "GroupBy objects" add clarity that GroupBy objrcts can be used with piping. I didnt intend for this to only refer to series/DataFrames, but is too easy to misunderstand?
pandas/core/groupby.py
Outdated
Parameters | ||
---------- | ||
func : callable or tuple of (callable, string) | ||
Function to apply to this GroupBy or, alternatively, a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The start of "Function ..." does not need to be aligned with the "callable .. " above, but just have a single identation of 4 spaces (numpy docstring specifics ..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do "this GroupBy" -> "this GroupBy object"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, changing.
680473c
to
adfb81c
Compare
The comments from @jorisvandenbossche have been addressed. Thanks for reviewing. |
adfb81c
to
efeaf8c
Compare
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -14,6 +14,8 @@ Highlights include: | |||
categoricals independent of the data, see :ref:`here <whatsnew_0210.enhancements.categorical_dtype>`. | |||
- The behavior of ``sum`` and ``prod`` on all-NaN Series/DataFrames is now consistent and no longer depends on whether `bottleneck <http://berkeleyanalytics.com/bottleneck>`__ is installed, see :ref:`here <whatsnew_0210.api_breaking.bottleneck>` | |||
- Compatibility fixes for pypy, see :ref:`here <whatsnew_0210.pypy>`. | |||
- ``GroupBy`` objects now have a ``pipe`` method, similar to the one on ``DataFrame`` and ``Series``, | |||
that allows for functions that take a ``GroupBy`` to be composed in a clean, readable syntax. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a ref to the subsection
pandas/core/groupby.py
Outdated
def pipe(self, func, *args, **kwargs): | ||
""" Apply a function with arguments to this GroupBy object | ||
|
||
.. versionadded:: 0.21.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of mostly repeated doc strings in both places, you can template _pipe and use and Appender/Substitution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would propose to not do this here. The dosctrings are sufficiently different to just make it a lot harder to develop when trying to merge them in a single one with a lot of substituted parts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok thats fine
expected = pd.Series([-79.5160891089, -78.4839108911, None], | ||
index=index) | ||
|
||
assert_series_equal(expected, result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a tests on SeriesGroupBy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
pandas/core/groupby.py
Outdated
-------- | ||
pandas.Series.pipe | ||
pandas.DataFrame.pipe | ||
pandas.GroupBy.apply |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a short note on the difference here? (and also add a See also in apply to pipe, again with a short note on the difference)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried, but admittedly found it very hard to do in so few lines, see result. Do you have suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like: "Apply function to each group instead of to the full GroupBy object" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(IIUC) perhaps
``apply`` applies a function to each group. ``pipe`` applies a function to a ``GroupBy`` object.
would work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've uploaded new ones.
I've built the documentation, and pipe doesn't show up in the API. Do I need to add something somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, In groupby.rst line 1178 I have a link ":ref:here <basics.pipe>
".
This link doesn't show up in the docs, can anyone see why? It seems normal
Hmm, I can see the newest code changes on my github repository, but they don't show up in the PR. Any suggestions on how to proceed? for reference, the newest code her: topper-123@8614c32 EDIT: never mind, it works now. |
efeaf8c
to
8614c32
Compare
8614c32
to
702c1af
Compare
I think you need to add in api.rst as well. if docs build ok, ping on green. |
702c1af
to
608a0e4
Compare
thanks @ghl3 and @topper-123 for the revival |
Great work, @jreback and @topper-123 |
git diff upstream/master -u -- "*.py" | flake8 --diff
This PR adds a
.pipe
method to GroupBy objects like forDataFrame.pipe
andSeries .pipe
.This PR is basically #10466 written by @ghl3 with some very minor updates, because that PR somehow got stalled and subsequently was closed.
This PR says it's new in 0.21, but I'll change that if it's too late to add this.