API: Add pipe method to GroupBy objects #10353

ghl3 · 2015-06-14T17:32:37Z

Extend the new "pipe" protocol to GroupBy objects to allow for piping of a wider class of functions. Currently, one can only create pipes that chain together objects inheriting from NDFrame. But the concept of piping is general and could be extended to other pandas objects, specifically anything inheriting from GroupBy.

The use case is to write pipes that allow one to freely transform back and forth between NDFrames and GroupBy objects. Example:

df = DataFrame({'A': [...], 'B': [...]})

def f(dfgb):
    return dfgb['B'].value_counts()

def g(srs):
    return srs * 2

grouped = df.groupby('A')

grouped.pipe(f).pipe(g)

Note that these transformations are

GroupBy -> Series
Series -> Series
and the chain seamlessly switches from a GroupBy.pipe to an NDFrame.pipe.
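For concreteness, here is a runnable version of the example above. It is only a sketch: the sample data is made up, and it assumes a pandas version that ships GroupBy.pipe (0.21.0 or later, per the milestone on this issue).

```python
import pandas as pd

# Made-up sample data; the original example elides the column contents.
df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 1, 2]})


def f(dfgb):
    # GroupBy -> Series: count the values of B within each group of A.
    return dfgb["B"].value_counts()


def g(srs):
    # Series -> Series: plain elementwise arithmetic.
    return srs * 2


# The first pipe is GroupBy.pipe, the second is Series.pipe.
result = df.groupby("A").pipe(f).pipe(g)
print(result)
```

The result is a Series indexed by (A, B) pairs, holding each count doubled.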

There are a few ways to implement this. A simple way is to break out the core functionality of "pipe" into a pure function and then call that function in any method implementation of pipe. Another way is to think of piping as a mix-in trait, put it as a method in a base class, and then mix that base class into any class that wants to implement pipe-ability. I have no strong preference between these options, and I'm open to other implementations that may be more in line with Pandas' design goals or the long-term vision of the "pipe" concept.
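A rough sketch of how the two options could fit together, using toy classes rather than pandas internals. Every name here (pipe_impl, PipeMixin, ToyFrame, ToyGroupBy, count_rows, scale) is hypothetical and exists only for illustration; this is not the strawman branch or the pandas implementation.

```python
def pipe_impl(obj, func, *args, **kwargs):
    """Option 1: the core of pipe as a pure function.

    func is either a callable that takes obj as its first argument, or a
    (callable, keyword_name) tuple naming the keyword that should receive obj.
    """
    if isinstance(func, tuple):
        func, target = func
        if target in kwargs:
            raise ValueError(f"{target!r} is both the pipe target and a keyword argument")
        kwargs[target] = obj
        return func(*args, **kwargs)
    return func(obj, *args, **kwargs)


class PipeMixin:
    """Option 2: a mix-in whose pipe method delegates to the pure function."""

    def pipe(self, func, *args, **kwargs):
        return pipe_impl(self, func, *args, **kwargs)


class ToyFrame(PipeMixin):
    """Stand-in for an NDFrame: a list of row dicts."""

    def __init__(self, rows):
        self.rows = list(rows)

    def groupby(self, key):
        return ToyGroupBy(self, key)


class ToyGroupBy(PipeMixin):
    """Stand-in for a GroupBy: rows bucketed by the value of one key."""

    def __init__(self, frame, key):
        self.groups = {}
        for row in frame.rows:
            self.groups.setdefault(row[key], []).append(row)


def count_rows(gb):
    # ToyGroupBy -> ToyFrame
    return ToyFrame({"key": k, "n": len(v)} for k, v in gb.groups.items())


def scale(frame, by):
    # ToyFrame -> ToyFrame
    return ToyFrame({**row, "n": row["n"] * by} for row in frame.rows)


frame = ToyFrame([{"A": "x"}, {"A": "x"}, {"A": "y"}])
result = frame.groupby("A").pipe(count_rows).pipe(scale, by=2)
tripled = result.pipe((scale, "frame"), by=3)  # tuple form: obj goes to the "frame" keyword
print(result.rows, tripled.rows)
```

Either way the logic lives in one place. The mix-in mainly buys a single spot to hang the method on, while per-class docstrings would still need an override or a template.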

A strawman implementation of the first suggestion can be found here:
master...ghl3:groupby-pipe

CC
@TomAugspurger
@shoyer


shoyer · 2015-06-14T19:42:16Z

Yep, this looks like a good idea to me.

I don't have strong feelings about how it's implemented, though I suspect the mixin approach is the way to go -- that could make it easier to add documentation specific to each class without a lot of duplicated text. (On the other hand, I suspect few people look at the docstrings on groupby methods.) Either way it's pretty straightforward.

jorisvandenbossche · 2015-06-14T21:11:14Z

How is this different than the already existing df.groupby(..).apply(..)? (apart from the ability to pass a tuple in pipe)

ghl3 · 2015-06-14T21:23:35Z

It's similar. df.groupby(..).apply(..) applies the function to the underlying DataFrame in each group, so the function should take a DataFrame. This pipe implementation would act on the groupby itself, so you pass it functions whose argument is a DataFrameGroupBy.

So, with this, you could do:

def f(dfgb):
    return dfgb.get_group('A')

dfgb.pipe(f)

But if you tried with apply, it would fail:

dfgb.apply(f)
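To make the distinction concrete, here is a small sketch with made-up data, assuming a pandas version that has GroupBy.pipe (0.21.0 or later). The helper names first_group and group_sum are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 2, 5]})
dfgb = df.groupby("A")


def first_group(gb):
    # Receives the whole DataFrameGroupBy, so GroupBy methods are available.
    return gb.get_group("x")


def group_sum(group_df):
    # Receives one group's DataFrame at a time.
    return group_df["B"].sum()


piped = dfgb.pipe(first_group)          # one call, on the GroupBy itself
applied = dfgb[["B"]].apply(group_sum)  # one call per group, results combined

# Handing the GroupBy-level function to apply fails, because each group is
# a plain DataFrame and has no get_group method.
try:
    dfgb[["B"]].apply(first_group)
    apply_failed = False
except Exception:
    apply_failed = True

print(piped, applied, apply_failed, sep="\n")
```

Here pipe calls first_group exactly once, while apply calls group_sum once per group and stitches the results into a Series indexed by A.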

jorisvandenbossche · 2015-06-14T22:01:30Z

Ah sorry, I misread the fact that it would act on the whole GroupBy object instead of on the DataFrames/groups.

Do you have an example of a real use case for this? (apart from the dummy example above, just curious)

ghl3 · 2015-06-14T22:20:13Z

Sure. One thing I find myself doing a lot with pandas is working on classification problems (as in machine-learning like problems), and in particular using pandas plotting as a means of exploring and diagnosing classification problems. I've found that a nice way to build helper functions for transforming and plotting data in this domain is to work with a DataFrameGroupBy, where the grouped variable is the class associated with the classification. It's simply a convenient interface to build functions around, as the class is implicit in the grouped column, and it supports multiple classes or nested classes, etc. I have many such functions whose first argument is a DataFrameGroupBy.

So, a common pattern is to start with an initial dataframe, do a lot of transformations on it, and then feed it into a function that takes a DataFrameGroupBy for the purpose of plotting or reporting. This means that the last function call in my chain of transformations is one that takes a DataFrameGroupBy. With the new piping functions, it would be nice to do something like:

df = df.pipe(f).pipe(g).pipe(h).groupby('group').pipe(generate_report)

Of course, all this is possible without adding a pipe function: one can always create a temporary variable or do this in a number of other ways. But since we already have the pipe function on the DataFrame, I think adding it to the GroupBy creates a nice symmetry.
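As an illustration of that pattern, here is a sketch with made-up data and hypothetical helpers (drop_missing, add_margin, generate_report). It assumes a pandas version where GroupBy.pipe is available (0.21.0 or later), and the report is a small summary table rather than a plot.

```python
import pandas as pd


def drop_missing(df):
    # DataFrame -> DataFrame
    return df.dropna()


def add_margin(df):
    # DataFrame -> DataFrame: distance of the score from a 0.5 threshold.
    return df.assign(margin=df["score"] - 0.5)


def generate_report(dfgb):
    # DataFrameGroupBy -> DataFrame: the class label is implicit in the grouping.
    return dfgb["margin"].agg(["mean", "count"])


raw = pd.DataFrame(
    {
        "group": ["pos", "pos", "neg", "neg"],
        "score": [0.9, 0.7, 0.2, None],
    }
)

report = (
    raw.pipe(drop_missing)
    .pipe(add_margin)
    .groupby("group")
    .pipe(generate_report)
)
print(report)
```

Without GroupBy.pipe, the last step would have to be written as generate_report(raw.pipe(...).groupby("group")), which wraps the whole chain and reads inside-out.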

jankatins · 2015-06-30T19:54:01Z

If I read the R data wrangling cheatsheet right, then in R-land df %>% group_by(...) %>% mutate(...) (or ... %>% summarise(…)) is basically df.groupby(...).apply(...) (i.e. apply the function to each group). Not sure whether pipe should then map to different semantics (e.g. piping in the complete gb object).

jreback added the API Design and Needs Discussion (requires discussion from core team before further action) labels Jun 17, 2015

ghl3 mentioned this issue Jun 28, 2015

ENH: Add pipe method to GroupBy (fixes #10353) #10466

Closed

jreback added this to the 0.17.0 milestone Jun 30, 2015

ghl3 added a commit to ghl3/pandas that referenced this issue Sep 13, 2015

ENH: Add pipe method to GroupBy (fixes pandas-dev#10353)

b3686e1

jreback modified the milestones: 0.17.0, 0.17.1 Sep 25, 2015

jreback modified the milestones: Next Major Release, 0.17.1 Nov 13, 2015

TomAugspurger mentioned this issue Oct 13, 2017

ENH: Add .pipe to GroupBy objects #17863

Closed

topper-123 mentioned this issue Oct 14, 2017

ENH: add GroupBy.pipe method #17871

Merged

jreback modified the milestones: Next Major Release, 0.21.0 Oct 14, 2017

jreback closed this as completed in #17871 Oct 18, 2017
