Named aggregations with multiple columns #29268

Open
erfannariman opened this issue Oct 29, 2019 · 15 comments
Labels
Apply Apply, Aggregate, Transform, Map Enhancement Groupby

Comments

@erfannariman
Member

erfannariman commented Oct 29, 2019

Since pandas 0.25.0 we have named aggregations.

They work fine if you aggregate single columns. But what if you want to apply an aggregation over multiple columns?

Example:

# example dataframe
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 4), columns=list('abcd'))
df['group'] = [0, 0, 1, 1]

          a         b         c         d  group
0  0.751462  0.572576  0.192957  0.921723      0
1  0.070777  0.801548  0.601678  0.344633      0
2  0.112964  0.361984  0.416241  0.785764      1
3  0.380045  0.486494  0.000594  0.608759      1

# aggregations on single columns
df.groupby('group').agg(
    a_sum=('a', 'sum'),
    a_mean=('a', 'mean'),
    b_mean=('b', 'mean'),
    c_sum=('c', 'sum'),
    d_range=('d', lambda x: x.max() - x.min()),
)

          a_sum    a_mean    b_mean     c_sum   d_range
group                                                  
0      0.947337  0.473668  0.871939  0.838150  0.320543
1      0.604149  0.302074  0.656902  0.542985  0.057681

But what if we want to calculate a.max() - b.max() while aggregating? That does not seem to be possible. For example, something like this would make sense:

df.groupby('group').agg(
    diff_a_b=(['a', 'b'], lambda x: x['a'].max() - x['b'].max())
)

So is it possible to do named aggregations on multiple columns? If not, is this in the pipeline for future releases?
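
Until something like the syntax above exists, the same result can be reached in two steps. The following is a minimal sketch of two possible workarounds, not an official multi-column named-aggregation API: the seeded random data and the names `per_col`, `diff_a_b`, and `via_apply` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data in the same shape as the example above (seeded for repeatability).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 4)), columns=list("abcd"))
df["group"] = [0, 0, 1, 1]

# Workaround 1: use named aggregation per column, then combine the results.
per_col = df.groupby("group").agg(a_max=("a", "max"), b_max=("b", "max"))
diff_a_b = (per_col["a_max"] - per_col["b_max"]).rename("diff_a_b")

# Workaround 2: apply a function that receives the whole sub-frame of each group.
via_apply = (
    df.groupby("group")[["a", "b"]]
    .apply(lambda g: g["a"].max() - g["b"].max())
    .rename("diff_a_b")
)

print(diff_a_b)
print(via_apply)
```

The first variant stays on the built-in aggregation path and is usually the faster of the two; the second is more flexible but calls a Python function once per group.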

@jbrockmendel jbrockmendel added the Apply Apply, Aggregate, Transform, Map label Oct 30, 2019
@delica1

delica1 commented Feb 6, 2020

Yes please. I would also be interested in this feature. I posted a feature request a few months ago (#28190), but it is good to see I am not alone.

If I am not mistaken, it seems it may be easier to implement now with the named aggregates functionality too.

@SpectrumWings

take

@theSuiGenerisAakash

Is it out now?

@erfannariman
Member Author

@SpectrumWings are you still working on this? Otherwise, I would like to give it a go.

@erfannariman
Member Author

take

@JasonAHendry

Hi all. Just wanted to say I would love to see this feature developed. It's a routine very commonly needed in scientific data analysis. dplyr and others support it; it would be fantastic to see it in pandas.

@SanderLam

Hi there, is there any update on when we can expect this feature?

@jreback
Contributor

jreback commented Mar 25, 2022

@SanderLam pandas is all-volunteer.

Features happen when the community opens pull requests; you are welcome to do that.

Core can provide review.

@Mondonauta

I'm very interested in this feature as well

@nick-konovalchuk

Looking forward to this one

@alink-volpe

Very interested in this, too. I keep getting bummed out that pandas isn't quite as elegant as R when it comes to groupby > aggregate logic, but this would be a great addition!

@tawfikharoun

Interestingly, Polars supports that natively! So if this is badly needed, you can import the DataFrame into Polars and do it there. I genuinely believe that pandas should adopt that as well.

@samukweku
Contributor

take

@tawfikharoun

@samukweku it would be something like:

import polars as pl

# df is assumed to be a Polars DataFrame with columns col0, col1, col2
df.group_by("col0").agg(
    sum_all_under_200=pl.col("col1").filter(pl.col("col2") > 200).sum()
)

@tawfikharoun

tawfikharoun commented Jun 30, 2024 via email

@samukweku samukweku removed their assignment Jun 30, 2024