API: Add pipe method to GroupBy objects #10353
Yep, this looks like a good idea to me. I don't have strong feelings about how it's implemented, though I suspect the mixin approach is the way to go, since that could make it easier to add documentation specific to each class without a lot of duplicated text. (On the other hand, I suspect few people look at the docstrings on groupby methods.) Either way, it's pretty straightforward.
How is this different from the already existing
It's similar. df.groupby(..).apply(..) applies the function to the underlying DataFrame in each group, so the function should take a DataFrame. This pipe implementation would act on the groupby itself, so you pass it functions whose argument is a DataFrameGroupBy. So, with this, you could do:
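For illustration, here is what such a call looks like with the `GroupBy.pipe` method that pandas later added (in 0.21 and later). The data frame and the helper `n_classes` are hypothetical; the point is only that the piped function receives the whole `DataFrameGroupBy`.

```python
import pandas as pd

# Hypothetical data: "cls" is the grouping (class) column.
df = pd.DataFrame({"cls": ["a", "a", "b"], "x": [1, 2, 3]})

def n_classes(gb):
    # Hypothetical helper: its argument is the DataFrameGroupBy itself,
    # so GroupBy-only attributes such as ngroups are available.
    return gb.ngroups

print(df.groupby("cls").pipe(n_classes))  # 2
```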
But if you tried with apply, it would fail:
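A minimal sketch of that failure, reusing the same hypothetical `n_classes` helper: `apply` calls the function once per group with a plain `DataFrame`, which has no `ngroups` attribute, so the lookup raises `AttributeError`.

```python
import pandas as pd

df = pd.DataFrame({"cls": ["a", "a", "b"], "x": [1, 2, 3]})

def n_classes(gb):
    # Written for a DataFrameGroupBy, not for a single group's DataFrame.
    return gb.ngroups

apply_failed = False
try:
    # apply hands each group to n_classes as a DataFrame.
    df.groupby("cls").apply(n_classes)
except AttributeError as exc:
    apply_failed = True
    print("apply failed:", exc)
```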
Ah, sorry, I misread: it acts on the whole GroupBy object instead of on the individual DataFrames/groups. Do you have an example of a real use case for this (apart from the dummy example above)? Just curious.
Sure. One thing I find myself doing a lot with pandas is working on classification problems (machine-learning-style problems), and in particular using pandas plotting to explore and diagnose them. I've found that a nice way to build helper functions for transforming and plotting data in this domain is to work with a DataFrameGroupBy whose grouping variable is the class label. It's simply a convenient interface to build functions around: the class is implicit in the grouped column, and it supports multiple classes, nested classes, etc. I have many such functions whose first argument is a DataFrameGroupBy. So a common pattern is to start with an initial DataFrame, apply a lot of transformations to it, and then feed the result into a function that takes a DataFrameGroupBy for plotting or reporting. This means that the last call in my chain of transformations is one that takes a DataFrameGroupBy. With the new piping functions, it would be nice to do something like:
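A sketch of that pattern, assuming the `GroupBy.pipe` method available in pandas 0.21 and later. The column names and the reporting helper `class_report` are hypothetical stand-ins for the plotting/reporting functions described above.

```python
import pandas as pd

def class_report(gb):
    # Hypothetical helper whose first argument is a DataFrameGroupBy
    # keyed on the class label; returns per-class count and mean score.
    return gb["score"].agg(["count", "mean"])

# Hypothetical raw data with one missing score.
raw = pd.DataFrame({"label": ["a", "a", "b", "b"], "score": [1, 3, 5, None]})

report = (
    raw
    .dropna()                                   # DataFrame -> DataFrame
    .assign(score=lambda d: d["score"] * 10)    # DataFrame -> DataFrame
    .groupby("label")                           # DataFrame -> DataFrameGroupBy
    .pipe(class_report)                         # DataFrameGroupBy -> DataFrame
)
print(report)
```

No temporary variable is needed to hold the grouped object; the helper simply sits at the end of the chain.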
Of course, all of this is possible without adding a pipe method: one can always create a temporary variable or restructure the code in a number of other ways. But since DataFrame already has a pipe method, I think adding one to GroupBy creates a nice symmetry.
If I read the R data wrangling cheatsheet right, then in R-land
Extend the new "pipe" protocol to GroupBy objects to allow for piping of a wider class of functions. Currently, one can only create pipes that chain together objects inheriting from NDFrame. But the concept of piping is general and could be extended to other pandas objects, specifically anything inheriting from GroupBy.
The use case is to write pipes that let one move freely back and forth between NDFrames and GroupBy objects. Example:
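A minimal sketch of such a chain, written against the `GroupBy.pipe` method that pandas later shipped (0.21 and later). The helpers `add_ratio` and `group_means` and the data are hypothetical; the comments mark where the chain switches between `DataFrame.pipe` and `GroupBy.pipe`.

```python
import pandas as pd

def add_ratio(df):
    # Hypothetical NDFrame -> NDFrame step.
    return df.assign(ratio=df["x"] / df["y"])

def group_means(gb):
    # Hypothetical GroupBy -> NDFrame step.
    return gb.mean()

df = pd.DataFrame(
    {"key": ["a", "a", "b"], "x": [1.0, 3.0, 4.0], "y": [1.0, 1.0, 2.0]}
)

result = (
    df.pipe(add_ratio)                 # DataFrame.pipe
      .groupby("key")
      .pipe(group_means)               # GroupBy.pipe, returns a DataFrame
      .pipe(lambda d: d.round(2))      # back to DataFrame.pipe
)
print(result)
```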
Note that the chain seamlessly switches from a GroupBy.pipe to an NDFrame.pipe.
There are a few ways to implement this. A simple way is to break out the core functionality of "pipe" into a pure function and then call that function from any method implementation of pipe. Another way is to treat piping as a mix-in trait: put it as a method in a base class, and then mix that base class into any class that should be pipe-able. I have no strong preference between these options, and I'm open to other implementations that may be more in line with pandas' design goals or the long-term vision of the "pipe" concept.
A strawman implementation of the first suggestion can be found here:
master...ghl3:groupby-pipe
CC @TomAugspurger @shoyer