ENH: add GroupBy.pipe method #17871

topper-123 · 2017-10-14T19:58:27Z

[x] closes API: Add pipe method to GroupBy objects #10353, see also ENH: Add .pipe to GroupBy objects #17863
[x] tests added / passed
[x] passes git diff upstream/master -u -- "*.py" | flake8 --diff
[x] whatsnew entry

This PR adds a .pipe method to GroupBy objects, analogous to DataFrame.pipe and Series.pipe.

This PR is basically #10466 written by @ghl3 with some very minor updates, because that PR somehow got stalled and subsequently was closed.

This PR says it's new in 0.21, but I'll change that if it's too late to add this.
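
For readers skimming the thread, here is a minimal sketch of what the new method allows. The data and the `price_per_unit` function are invented for illustration and are not taken from the PR; only `DataFrame.groupby` and `GroupBy.pipe` are the real pandas API.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration only.
df = pd.DataFrame({
    "Store": ["A", "A", "B", "B"],
    "Revenue": [10.0, 20.0, 36.0, 44.0],
    "Quantity": [1, 2, 2, 2],
})


def price_per_unit(grouped):
    """Take a whole GroupBy object and return revenue per unit for each group."""
    return grouped["Revenue"].sum() / grouped["Quantity"].sum()


# Without pipe, the call reads inside-out.
nested = price_per_unit(df.groupby("Store"))

# With pipe, the same computation reads left to right.
piped = df.groupby("Store").pipe(price_per_unit)

print(piped)
```

Both spellings call the same function with the same GroupBy object; `pipe` only changes how the chain reads.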

jreback · 2017-10-14T20:09:56Z

doc/source/groupby.rst

+allow for a cleaner, more readable syntax. To read about ``.pipe`` in general terms,
+see :ref:`here <basics.pipe>`.
+
+For a concrete example on combining ``.groupby`` and ``.pipe`` , imagine have a


have -> having

jreback · 2017-10-14T20:10:35Z

doc/source/groupby.rst

+
+.. ipython:: python
+
+    from numpy.random import choice, random


don't import like this; rather import numpy as np and write things out (e.g. np.random.choice)

jreback · 2017-10-14T20:11:34Z

doc/source/groupby.rst

+   (base_df.pipe(lambda x: x[x.A>3])
+           .groupby(['Store', 'Product'])
+           .pipe(rapport_func)
+


this is a bit abstract

Ok, I've redone it with the previous df. Alternatively, I could remove this example.

jreback · 2017-10-14T20:13:42Z

pandas/core/common.py

+    element of the tuple.
+
+    func : callable or tuple of (callable, string)
+           Function to apply to this GroupBy or, alternatively, a


you can remove the GroupBy from here and make this more generic

jreback · 2017-10-14T20:14:23Z

pandas/core/common.py

+        kwargs[target] = obj
+        return func(*args, **kwargs)
+    else:
+        return func(obj, *args, **kwargs)


need a return at the end
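
The diff excerpt above shows only the tail of the shared helper. The following is a rough sketch of how such a helper could dispatch on a plain callable versus a `(callable, string)` tuple. The name `pipe_sketch`, the duplicate-keyword check and the two `scale` functions are assumptions made for illustration, not the code under review.

```python
def pipe_sketch(obj, func, *args, **kwargs):
    """Call func with obj, either positionally or under a named keyword.

    Hypothetical stand-in for the helper in pandas/core/common.py.
    """
    if isinstance(func, tuple):
        # (callable, keyword) form: pass obj as that keyword argument.
        func, target = func
        if target in kwargs:
            # Assumed guard: refuse to silently overwrite a caller's keyword.
            raise ValueError(f"{target} is both the pipe target and a keyword argument")
        kwargs[target] = obj
        return func(*args, **kwargs)
    # Plain callable: obj becomes the first positional argument.
    return func(obj, *args, **kwargs)


def scale(values, factor):
    return [v * factor for v in values]


def scale_kw(factor, data):
    return [v * factor for v in data]


first = pipe_sketch([1, 2, 3], scale, 2)
second = pipe_sketch([1, 2, 3], (scale_kw, "data"), 2)
```

The tuple form is useful when the function being piped into does not take its data as the first argument.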

ghl3 · 2017-10-14T21:31:47Z

@topper-123 Thanks a lot for reviving this feature.

codecov · 2017-10-14T23:04:42Z

Codecov Report

Merging #17871 into master will increase coverage by <.01%.
The diff coverage is 50%.

@@            Coverage Diff             @@
##           master   #17871      +/-   ##
==========================================
+ Coverage   91.23%   91.24%   +<.01%     
==========================================
  Files         163      163              
  Lines       50105    50110       +5     
==========================================
+ Hits        45715    45723       +8     
+ Misses       4390     4387       -3

Flag	Coverage Δ
#multiple	`89.05% <50%> (+0.02%)`	⬆️
#single	`40.31% <16.66%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/generic.py	`92.53% <100%> (+0.33%)`	⬆️
pandas/core/groupby.py	`91.99% <100%> (ø)`	⬆️
pandas/core/common.py	`91.18% <33.33%> (-1.63%)`	⬇️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.75% <0%> (-0.1%)`	⬇️
pandas/core/indexes/datetimes.py	`95.41% <0%> (-0.1%)`	⬇️
pandas/core/dtypes/dtypes.py	`95.14% <0%> (ø)`	⬆️
pandas/plotting/_converter.py	`65.2% <0%> (+1.81%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5bf7f9a...608a0e4. Read the comment docs.

topper-123 · 2017-10-14T23:07:23Z

@jreback, I've made changes to the PR. Travis was green, and the newest force-push only fixes some linting issues.

@ghl3, well, thank you for writing most of this:-)

jreback · 2017-10-14T23:17:56Z

doc/source/whatsnew/v0.21.0.txt

@@ -235,6 +235,9 @@ Other Enhancements
 - Improved the import time of pandas by about 2.25x.  (:issue:`16764`)
 - :func:`read_json` and :func:`to_json` now accept a ``compression`` argument which allows them to transparently handle compressed files. (:issue:`17798`)
 - :func:`Series.reindex`, :func:`DataFrame.reindex`, :func:`Index.get_indexer` now support list-like argument for ``tolerance``. (:issue:`17367`)
+- ``GroupBy`` objects now have a ``pipe`` method, similar to the one on ``DataFrame`` and ``Series``
+  that allow for functions that take a ``GroupBy`` to be composed in a clean, readable syntax.
+  See the :ref:`documentation <groupby.pipe>` for more. (:issue:`17871`)


you can move this to highlights, that's ok

Ok, I'll move this to highlights and add an example in the whatsnew (if I understand you correctly?)

jreback · 2017-10-14T23:19:41Z

small change

@TomAugspurger @jorisvandenbossche @shoyer

topper-123 · 2017-10-15T00:11:23Z

Had some git issue, so squashed all the commits.

I'm off to sleep, I'll look into any comments tomorrow.

jorisvandenbossche · 2017-10-16T22:01:47Z

doc/source/groupby.rst

+
+   (df.groupby(['Store', 'Product']).pipe(rapport_func)
+
+where ``rapport_func`` take an arbitrary GroupBy object and creates a rapport


I think 'report' is the correct English word? (also above in the code example)
(not a native speaker, but in my mother tongue 'rapport' exists, and the English translation is 'report' :-))

Yeah, it's rapport in my language too. I've changed that.

jorisvandenbossche · 2017-10-16T22:01:50Z

doc/source/groupby.rst

+
+.. versionadded:: 0.21.0
+
+Similar to the functionality provided by ``DataFrames`` and ``Series``, functions


DataFrames -> DataFrame (or otherwise do not put it as a code object)

jorisvandenbossche

Added some minor comments

jorisvandenbossche · 2017-10-16T22:04:20Z

pandas/core/generic.py

@@ -3508,7 +3510,7 @@ def sample(self, n=None, frac=None, replace=False, weights=None,
        -----

        Use ``.pipe`` when chaining together functions that expect
-        on Series or DataFrames. Instead of writing
+        Series, DataFrames or GroupBys. Instead of writing


I don't think this change is needed? (as the docstring of GroupBy.pipe has a separate one?)

The "on" was grammatically incorrect, so I changed the line. I think adding "GroupBy objects" makes it clear that GroupBy objects can be used with piping. I didn't intend for this to refer only to Series/DataFrames, but is it too easy to misunderstand?

jorisvandenbossche · 2017-10-16T22:05:28Z

pandas/core/groupby.py

+        Parameters
+        ----------
+        func : callable or tuple of (callable, string)
+               Function to apply to this GroupBy or, alternatively, a


The start of "Function ..." does not need to be aligned with the "callable .. " above; it should just have a single indentation of 4 spaces (numpy docstring specifics ..)

I would do "this GroupBy" -> "this GroupBy object"

ok, changing.
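
As a small illustration of the layout requested here, this is a sketch of a numpydoc-style docstring on a hypothetical function (`apply_to` is not part of pandas). The description sits on its own line, indented four spaces relative to the parameter name.

```python
def apply_to(obj, func):
    """Apply ``func`` to ``obj`` (hypothetical helper, layout demo only).

    Parameters
    ----------
    obj : object
        The value handed to ``func``.
    func : callable
        The description is indented four spaces relative to the
        parameter name, not aligned with the type after the colon.

    Returns
    -------
    object
        Whatever ``func`` returns.
    """
    return func(obj)
```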

topper-123 · 2017-10-17T01:13:18Z

The comments from @jorisvandenbossche have been addressed. Thanks for reviewing.

jreback · 2017-10-17T10:15:58Z

doc/source/whatsnew/v0.21.0.txt

@@ -14,6 +14,8 @@ Highlights include:
  categoricals independent of the data, see :ref:`here <whatsnew_0210.enhancements.categorical_dtype>`.
 - The behavior of ``sum`` and ``prod`` on all-NaN Series/DataFrames is now consistent and no longer depends on whether `bottleneck <http://berkeleyanalytics.com/bottleneck>`__ is installed, see :ref:`here <whatsnew_0210.api_breaking.bottleneck>`
 - Compatibility fixes for pypy, see :ref:`here <whatsnew_0210.pypy>`.
+- ``GroupBy`` objects now have a ``pipe`` method, similar to the one on ``DataFrame`` and ``Series``,
+  that allows for functions that take a ``GroupBy`` to be composed in a clean, readable syntax.


add a ref to the subsection

jreback · 2017-10-17T10:18:09Z

pandas/core/groupby.py

+    def pipe(self, func, *args, **kwargs):
+        """ Apply a function with arguments to this GroupBy object
+
+        .. versionadded:: 0.21.0


instead of mostly repeated docstrings in both places, you can template _pipe and use an Appender/Substitution

I would propose not doing this here. The docstrings are sufficiently different that merging them into a single one with a lot of substituted parts would just make development a lot harder.

ok, that's fine

jreback · 2017-10-17T10:19:11Z

pandas/tests/groupby/test_groupby.py

+        expected = pd.Series([-79.5160891089, -78.4839108911, None],
+                             index=index)
+
+        assert_series_equal(expected, result)


can you add a test on SeriesGroupBy

jorisvandenbossche · 2017-10-17T11:26:22Z

pandas/core/groupby.py

+        --------
+        pandas.Series.pipe
+        pandas.DataFrame.pipe
+        pandas.GroupBy.apply


Can you add a short note on the difference here? (and also add a See also in apply to pipe, again with a short note on the difference)

I tried, but admittedly found it very hard to do in so few lines, see result. Do you have suggestions?

Something like: "Apply function to each group instead of to the full GroupBy object" ?

(IIUC) perhaps

``apply`` applies a function to each group. ``pipe`` applies a function to a ``GroupBy`` object.

would work
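
The distinction drawn in this sub-thread can be sketched as follows. The data and the two `spread_*` functions are made up for illustration; the sketch only records which type each function receives.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})
grouped = df.groupby("key")["val"]

apply_types = set()
pipe_types = set()


def spread_of_group(group):
    # apply hands over one group at a time (here a Series).
    apply_types.add(type(group).__name__)
    return group.max() - group.min()


def spread_of_groupby(gb):
    # pipe hands over the GroupBy object itself, exactly once.
    pipe_types.add(type(gb).__name__)
    return gb.max() - gb.min()


per_group = grouped.apply(spread_of_group)
whole = grouped.pipe(spread_of_groupby)

print(apply_types, pipe_types)
```

The two results hold the same values here, but the function passed to `apply` works on each group's data, while the function passed to `pipe` works with the grouped object and can use its aggregation methods.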

Ok, I've uploaded new ones.

I've built the documentation, and pipe doesn't show up in the API. Do I need to add something somewhere?

Also, in groupby.rst line 1178 I have a link ":ref:`here <basics.pipe>`".

This link doesn't show up in the docs, can anyone see why? It seems normal

topper-123 · 2017-10-17T15:43:41Z

Hmm, I can see the newest code changes on my github repository, but they don't show up in the PR. Any suggestions on how to proceed?

For reference, the newest code is here: topper-123@8614c32

EDIT: never mind, it works now.

jreback · 2017-10-17T22:38:54Z

I think you need to add it in api.rst as well. If the docs build ok, ping on green.

jreback · 2017-10-18T10:30:36Z

thanks @ghl3 and @topper-123 for the revival

ghl3 · 2017-10-18T12:25:57Z

Great work, @jreback and @topper-123

topper-123 force-pushed the groupby.pipe branch from 98949b5 to 9173deb Compare October 14, 2017 19:59

topper-123 changed the title ~~add GroupBy.pipe method~~ ENH: add GroupBy.pipe method Oct 14, 2017

jreback requested changes Oct 14, 2017

jreback added Enhancement Groupby labels Oct 14, 2017

topper-123 force-pushed the groupby.pipe branch 3 times, most recently from a88a50a to 956483f Compare October 14, 2017 23:04

jreback reviewed Oct 14, 2017

jreback added this to the 0.21.0 milestone Oct 14, 2017

jreback approved these changes Oct 14, 2017

topper-123 force-pushed the groupby.pipe branch from e450478 to 4f3e569 Compare October 15, 2017 00:07

jorisvandenbossche reviewed Oct 16, 2017

topper-123 force-pushed the groupby.pipe branch 2 times, most recently from 680473c to adfb81c Compare October 17, 2017 01:10

topper-123 force-pushed the groupby.pipe branch from adfb81c to efeaf8c Compare October 17, 2017 01:16

jreback requested changes Oct 17, 2017

jorisvandenbossche reviewed Oct 17, 2017

topper-123 force-pushed the groupby.pipe branch from efeaf8c to 8614c32 Compare October 17, 2017 15:52

topper-123 mentioned this pull request Oct 17, 2017

ENH: .pipe on Resampler #17905

Closed

topper-123 force-pushed the groupby.pipe branch from 8614c32 to 702c1af Compare October 17, 2017 21:40

Add GroupBy.pipe method

608a0e4

topper-123 force-pushed the groupby.pipe branch from 702c1af to 608a0e4 Compare October 17, 2017 23:38

jreback approved these changes Oct 18, 2017

jreback merged commit 5687f9e into pandas-dev:master Oct 18, 2017

yeemey pushed a commit to yeemey/pandas that referenced this pull request Oct 20, 2017

Add GroupBy.pipe method (pandas-dev#17871)

79993a9

topper-123 deleted the groupby.pipe branch November 6, 2017 21:46

alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017

Add GroupBy.pipe method (pandas-dev#17871)

f927c50

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

Add GroupBy.pipe method (pandas-dev#17871)

cc404d3

topper-123 mentioned this pull request Dec 26, 2017

ENH: Let Resampler objects have a pipe method #18940

Merged

4 tasks

