@@ -568,6 +568,67 @@ For a grouped ``DataFrame``, you can rename in a similar manner:
                             'mean': 'bar',
                             'std': 'baz'}))

+.. _groupby.aggregate.named:
+
+Named Aggregation
+~~~~~~~~~~~~~~~~~
+
+.. versionadded:: 0.25.0
+
+To support column-specific aggregation *with control over the output column names*, pandas
+accepts a special syntax in :meth:`GroupBy.agg`, known as "named aggregation", where
+
+- The keywords are the *output* column names
+- The values are tuples whose first element is the column to select
+  and the second element is the aggregation to apply to that column. Pandas
+  provides the ``pandas.NamedAgg`` namedtuple with the fields ``['column', 'aggfunc']``
+  to make it clearer what the arguments are. As usual, the aggregation can
+  be a callable or a string alias.
+
+.. ipython:: python
+
+   animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
+                           'height': [9.1, 6.0, 9.5, 34.0],
+                           'weight': [7.9, 7.5, 9.9, 198.0]})
+   animals
+
+   animals.groupby("kind").agg(
+       min_height=pd.NamedAgg(column='height', aggfunc='min'),
+       max_height=pd.NamedAgg(column='height', aggfunc='max'),
+       average_weight=pd.NamedAgg(column='weight', aggfunc=np.mean),
+   )
+
+
+``pandas.NamedAgg`` is just a ``namedtuple``. Plain tuples are allowed as well.
+
+.. ipython:: python
+
+   animals.groupby("kind").agg(
+       min_height=('height', 'min'),
+       max_height=('height', 'max'),
+       average_weight=('weight', np.mean),
+   )
+
+
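As a quick check of the claim that plain tuples are accepted in place of ``pandas.NamedAgg``, the sketch below builds the same aggregation both ways and compares the results. It assumes pandas 0.25 or later; the variable names ``by_namedagg`` and ``by_tuple`` are illustrative only and do not appear in the patch.

```python
import pandas as pd

animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                        'height': [9.1, 6.0, 9.5, 34.0],
                        'weight': [7.9, 7.5, 9.9, 198.0]})

# Spelled out with pd.NamedAgg so the roles of the two fields are explicit
by_namedagg = animals.groupby('kind').agg(
    min_height=pd.NamedAgg(column='height', aggfunc='min'),
    average_weight=pd.NamedAgg(column='weight', aggfunc='mean'),
)

# The same request written with plain (column, aggfunc) tuples
by_tuple = animals.groupby('kind').agg(
    min_height=('height', 'min'),
    average_weight=('weight', 'mean'),
)

# Both spellings should yield identical frames
print(by_namedagg.equals(by_tuple))
```

The ``NamedAgg`` spelling is mainly a readability aid; which form to prefer is a matter of taste.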
+If your desired output column names are not valid Python identifiers, construct a dictionary
+and unpack the keyword arguments.
+
+.. ipython:: python
+
+   animals.groupby("kind").agg(**{
+       'total weight': pd.NamedAgg(column='weight', aggfunc=sum),
+   })
+
+Additional keyword arguments are not passed through to the aggregation functions. Only pairs
+of ``(column, aggfunc)`` should be passed as ``**kwargs``. If your aggregation functions
+require additional arguments, partially apply them with :meth:`functools.partial`.
+
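A minimal sketch of the partial-application advice above, assuming pandas 0.25 or later. The helper ``count_above`` and its ``threshold`` parameter are hypothetical names invented for this illustration; they are not part of pandas or of the patch.

```python
import functools

import pandas as pd

animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                        'height': [9.1, 6.0, 9.5, 34.0],
                        'weight': [7.9, 7.5, 9.9, 198.0]})


def count_above(series, threshold):
    """Hypothetical helper: count values strictly greater than threshold."""
    return int((series > threshold).sum())


# Bind the extra argument up front; agg() then calls the partial
# with each group's Series as the only argument.
tall = functools.partial(count_above, threshold=9.0)

result = animals.groupby('kind').agg(
    tall_count=('height', tall),
)
print(result)
```

Binding the argument with ``functools.partial`` keeps every keyword passed to ``agg`` free to be read as an output column name.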
+.. note::
+
+   For Python 3.5 and earlier, the order of ``**kwargs`` in a function was not
+   preserved. This means that the output column ordering would not be
+   consistent. To ensure consistent ordering, the keys (and so output columns)
+   will always be sorted for Python 3.5.
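For contrast with the Python 3.5 behaviour described in the note, the sketch below assumes Python 3.6 or later, where keyword-argument order is preserved. The keywords are deliberately given out of alphabetical order to show that the output columns follow the order in which they were written.

```python
import pandas as pd

animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                        'height': [9.1, 6.0, 9.5, 34.0],
                        'weight': [7.9, 7.5, 9.9, 198.0]})

# 'max_weight' sorts after 'average_height' alphabetically,
# but it is listed first here.
result = animals.groupby('kind').agg(
    max_weight=('weight', 'max'),
    average_height=('height', 'mean'),
)

# On Python 3.6+, the columns keep the keyword order
print(list(result.columns))
```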


 Applying different functions to DataFrame columns
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -588,19 +649,6 @@ must be either implemented on GroupBy or available via :ref:`dispatching

    grouped.agg({'C': 'sum', 'D': 'std'})

-.. note::
-
-   If you pass a dict to ``aggregate``, the ordering of the output columns is
-   non-deterministic. If you want to be sure the output columns will be in a specific
-   order, you can use an ``OrderedDict``. Compare the output of the following two commands:
-
-.. ipython:: python
-
-   from collections import OrderedDict
-
-   grouped.agg({'D': 'std', 'C': 'mean'})
-   grouped.agg(OrderedDict([('D', 'std'), ('C', 'mean')]))
-

 .. _groupby.aggregate.cython:

 Cython-optimized aggregation functions