Skip to content

Named aggregations with multiple columns #29268

Open
@erfannariman

Description

@erfannariman

Since pandas 0.25.0 we have named aggregations.

Which works fine if you do aggregations on single columns. But what if you want to apply aggregations over multiple columns:

example:

# example dataframe
df = pd.DataFrame(np.random.rand(4,4), columns=list('abcd'))
df['group'] = [0, 0, 1, 1]

          a         b         c         d  group
0  0.751462  0.572576  0.192957  0.921723      0
1  0.070777  0.801548  0.601678  0.344633      0
2  0.112964  0.361984  0.416241  0.785764      1
3  0.380045  0.486494  0.000594  0.608759      1

# aggregations on single columns
df.groupby('group').agg(
             a_sum=('a', 'sum'),
             a_mean=('a', 'mean'),
             b_mean=('b', 'mean'),
             c_sum=('c', 'sum'),
             d_range=('d', lambda x: x.max() - x.min())
)

          a_sum    a_mean    b_mean     c_sum   d_range
group                                                  
0      0.947337  0.473668  0.871939  0.838150  0.320543
1      0.604149  0.302074  0.656902  0.542985  0.057681

But what if we want to calculate the a.max() - b.max() while aggregating. That does not seem to work. For example, something like this would make sense:

df.groupby('group').agg(
    diff_a_b=(['a', 'b'], lambda x: x['a'].max() - x['b'].max())
)

So is it possible to do named aggregations on multiple columns? If not, is this in the pipeline for future releases?

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions