Since pandas 0.25.0 we have named aggregations, which work fine if you aggregate single columns. But what if you want to apply an aggregation over multiple columns?
Example:
import numpy as np
import pandas as pd
# example dataframe (random values, so the numbers differ on every run)
df = pd.DataFrame(np.random.rand(4, 4), columns=list('abcd'))
df['group'] = [0, 0, 1, 1]
          a         b         c         d  group
0  0.751462  0.572576  0.192957  0.921723      0
1  0.070777  0.801548  0.601678  0.344633      0
2  0.112964  0.361984  0.416241  0.785764      1
3  0.380045  0.486494  0.000594  0.608759      1
# aggregations on single columns
df.groupby('group').agg(
    a_sum=('a', 'sum'),
    a_mean=('a', 'mean'),
    b_mean=('b', 'mean'),
    c_sum=('c', 'sum'),
    d_range=('d', lambda x: x.max() - x.min())
)
          a_sum    a_mean    b_mean     c_sum   d_range
group
0      0.947337  0.473668  0.871939  0.838150  0.320543
1      0.604149  0.302074  0.656902  0.542985  0.057681
But what if we want to calculate a.max() - b.max() per group while aggregating? That does not seem to work. For example, something like this would make sense:
df.groupby('group').agg(
    diff_a_b=(['a', 'b'], lambda x: x['a'].max() - x['b'].max())
)
So is it possible to do named aggregations on multiple columns? If not, is this in the pipeline for future releases?
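For comparison, here is a minimal sketch of two workarounds that seem to give the desired result with the existing API. It is not the named-aggregation syntax proposed above. The seeded generator and the names `maxes`, `diff_two_step` and `diff_apply` are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Seeded data so the sketch is reproducible (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 4)), columns=list("abcd"))
df["group"] = [0, 0, 1, 1]

# Workaround 1: named aggregation on each column, then combine the results.
maxes = df.groupby("group").agg(a_max=("a", "max"), b_max=("b", "max"))
diff_two_step = (maxes["a_max"] - maxes["b_max"]).rename("diff_a_b")

# Workaround 2: apply a function that sees several columns of each group.
diff_apply = (
    df.groupby("group")[["a", "b"]]
    .apply(lambda g: g["a"].max() - g["b"].max())
    .rename("diff_a_b")
)

print(diff_two_step)
print(diff_apply)
```

The first variant stays within named aggregation but needs a second step. The second is closer to the requested behaviour, although `apply` calls the function once per group and is typically slower than the built-in aggregations.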