Named aggregations with multiple columns #29268
Comments
Yes please. I would also be interested in this feature. I posted a feature request a few months ago but good to see I am not alone. #28190 If I am not mistaken, it seems it may be easier to implement now with the named aggregates functionality too. |
take |
Is it out now? |
@SpectrumWings are you still working on this? Else I would like to give it a go. |
take |
Hi all. Just wanted to say I would love to see this feature developed. It's a routine very commonly needed in scientific data analysis. dplyr &c support it; would be fantastic to see in pandas. |
Hi there, is there any update on when we can expect this feature? |
@SanderLam pandas is all-volunteer; features happen when the community opens pull requests. You are welcome to do that, and the core team can provide review. |
I'm very interested in this feature as well |
Looking forward to this one |
Very interested in this, too. I keep getting bummed out that pandas isn't quite as elegant as R when it comes to groupby > aggregate logic, but this would be a great addition! |
Interestingly, Polars does that natively! So if this is badly needed, you can convert the DataFrame to Polars and do it there. I genuinely believe that pandas should adopt that as well. |
take |
@samukweku it would be something like: |
Hello,
It would be something like:
import polars as pl

# Assumes df is an existing polars DataFrame with columns col0, col1, and col2.
df.group_by("col0").agg(
    sum_all_under_200=pl.col("col1").filter(pl.col("col2") > 200).sum()
)
In reply to Samuel Oranyeli (Sunday, June 30, 2024): "@tawfikharoun can you share an example of how polars does this? Thanks"
|
Since pandas 0.25.0 we have named aggregations, which work fine if you do aggregations on single columns. But what if you want to apply aggregations over multiple columns?
example:
But what if we want to calculate a.max() - b.max() while aggregating? That does not seem to work. For example, something like this would make sense:
So is it possible to do named aggregations on multiple columns? If not, is this in the pipeline for future releases?
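For readers landing here, a minimal sketch of the situation described above and two common workarounds. This is not the requested feature and not necessarily the original poster's example. The data and the names `group`, `a`, `b`, `a_max`, `b_max`, and `max_diff` are made up for illustration.

```python
import pandas as pd

# Hypothetical data; column names are illustrative only.
df = pd.DataFrame(
    {
        "group": ["x", "x", "y", "y"],
        "a": [1, 5, 10, 7],
        "b": [2, 3, 4, 9],
    }
)

# Named aggregation: each output column is computed from ONE input column.
single = df.groupby("group").agg(a_max=("a", "max"), b_max=("b", "max"))

# Workaround 1: aggregate each column first, then combine the results.
single["max_diff"] = single["a_max"] - single["b_max"]

# Workaround 2: groupby.apply receives each group's sub-DataFrame,
# so the function can use several columns at once.
via_apply = df.groupby("group")[["a", "b"]].apply(
    lambda g: g["a"].max() - g["b"].max()
)

print(single)
print(via_apply)
```

The first workaround stays on the fast built-in aggregation path. The `apply` version is more flexible but calls a Python function once per group, so it can be noticeably slower on large data.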