Commit ecc1339

ENH: add Series & DataFrame .agg/.aggregate to provide convenient
function application that mimics the groupby(..).agg/.aggregate interface

.apply is now a synonym for .agg, and will accept dict/list-likes for aggregations

CLN: rename .name attr -> ._selection_name from SeriesGroupby for compat
(didn't exist on DataFrameGroupBy); resolves conflicts w.r.t. setting .name on a groupby object

closes #1623
closes #14464
custom .describe
closes #14483
closes #7014
1 parent 1678f14 commit ecc1339

15 files changed: +884 -57 lines

doc/source/api.rst (+4)

@@ -306,6 +306,8 @@ Function application, GroupBy & Window
    :toctree: generated/

    Series.apply
+   Series.aggregate
+   Series.transform
    Series.map
    Series.groupby
    Series.rolling
@@ -825,6 +827,8 @@ Function application, GroupBy & Window

    DataFrame.apply
    DataFrame.applymap
+   DataFrame.aggregate
+   DataFrame.transform
    DataFrame.groupby
    DataFrame.rolling
    DataFrame.expanding
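The ``aggregate`` and ``transform`` entries listed above differ mainly in output shape: ``aggregate`` (also reachable as ``agg``) reduces each column, while ``transform`` keeps the input's index and columns. A minimal sketch, assuming a recent pandas release and a small hypothetical frame:

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({'A': [1.0, -2.0, 3.0], 'B': [-4.0, 5.0, -6.0]})

# aggregate/agg reduce each column to a single value
reduced = df.aggregate('sum')
same_result = df.agg('sum')

# transform returns an object with the same index and columns as df
same_shape = df.transform('abs')

print(reduced)
print(same_shape)
```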

doc/source/basics.rst (+212 -8)

@@ -702,7 +702,8 @@ on an entire ``DataFrame`` or ``Series``, row- or column-wise, or elementwise.

 1. `Tablewise Function Application`_: :meth:`~DataFrame.pipe`
 2. `Row or Column-wise Function Application`_: :meth:`~DataFrame.apply`
-3. Elementwise_ function application: :meth:`~DataFrame.applymap`
+3. `Aggregation API`_: :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform`
+4. `Applying Elementwise Functions`_: :meth:`~DataFrame.applymap`

 .. _basics.pipe:

@@ -778,6 +779,13 @@ statistics methods, take an optional ``axis`` argument:
    df.apply(np.cumsum)
    df.apply(np.exp)

+``.apply()`` will also dispatch on a string method name.
+
+.. ipython:: python
+
+   df.apply('mean')
+   df.apply('mean', axis=1)
+
 Depending on the return type of the function passed to :meth:`~DataFrame.apply`,
 the result will either be of lower dimension or the same dimension.

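The string dispatch added to ``.apply()`` above can be checked with a short sketch. It assumes a recent pandas release and uses a small made-up frame rather than the ``df`` from the docs; it compares the string form against calling the method directly.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({'one': [1.0, 2.0, 3.0], 'two': [4.0, 6.0, 8.0]})

# A method name passed as a string is dispatched to the DataFrame method
col_means = df.apply('mean')          # same as df.mean()
row_means = df.apply('mean', axis=1)  # same as df.mean(axis=1)

print(col_means)
print(row_means)
```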
@@ -827,16 +835,212 @@ set to True, the passed function will instead receive an ndarray object, which
 has positive performance implications if you do not need the indexing
 functionality.

-.. seealso::
+.. _basics.aggregate:
+
+Aggregation API
+~~~~~~~~~~~~~~~
+
+.. versionadded:: 0.20.0
+
+The aggregation API allows one to express possibly multiple aggregation operations in a single concise way.
+This API is similar across pandas objects, see :ref:`groupby aggregates <groupby.aggregate>`,
+:ref:`window functions <stats.aggregate>`, and the :ref:`resample API <timeseries.aggregate>`.
+
+We will use a starting frame similar to the one above.
+
+.. ipython:: python
+
+   tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
+                       index=pd.date_range('1/1/2000', periods=10))
+   tsdf.iloc[3:7] = np.nan
+   tsdf
+
+Using a single function is equivalent to ``.apply``. You can also pass named methods as strings.
+These return a Series of the output.
+
+.. ipython:: python
+
+   tsdf.agg(np.sum)
+
+   tsdf.agg('sum')
+
+   # these are equivalent to a ``.sum()`` because we are aggregating on a single function
+   tsdf.sum()
+
+On a Series, this will result in a scalar value.
+
+.. ipython:: python
+
+   tsdf.A.agg('sum')
+
+
+Aggregating multiple functions at once
+++++++++++++++++++++++++++++++++++++++
+
+You can pass multiple aggregation arguments as a list. The results of each of the passed functions will be a row in the resultant DataFrame.
+These are naturally named from the aggregation function.
+
+.. ipython:: python
+
+   tsdf.agg(['sum'])
+
+Multiple functions yield multiple rows.
+
+.. ipython:: python
+
+   tsdf.agg(['sum', 'mean'])
+
+On a Series, multiple functions return a Series.
+
+.. ipython:: python
+
+   tsdf.A.agg(['sum', 'mean'])
+
+
+Aggregating with a dict of functions
+++++++++++++++++++++++++++++++++++++
+
+Passing a dictionary of column names to a function or list of functions to ``DataFrame.agg``
+allows you to customize which functions are applied to which columns.
+
+.. ipython:: python
+
+   tsdf.agg({'A': 'mean', 'B': 'sum'})
+
+Passing a list-like will generate a DataFrame output. You will get a matrix-like output
+of all of the aggregators; entries for functions not applied to a column will be missing values.
+
+.. ipython:: python
+
+   tsdf.agg({'A': ['mean', 'min'], 'B': 'sum'})
+
+For a Series, you can pass a dict; the keys will set the name of the column.
+
+.. ipython:: python
+
+   tsdf.A.agg({'foo' : ['sum', 'mean']})
+
+Alternatively, by passing a dict of single functions, you can rename the aggregated elements with the dict keys.
+
+.. ipython:: python
+
+   tsdf.A.agg({'foo' : 'sum', 'bar':'mean'})
+
+Multiple keys will yield multiple columns.
+
+.. ipython:: python
+
+   tsdf.A.agg({'foo' : ['sum', 'mean'], 'bar': ['min', 'max', lambda x: x.sum()+1]})
+
+.. _basics.custom_describe:
+
+Custom describe
++++++++++++++++
+
+With ``.agg()`` it is possible to easily create a custom describe function, similar
+to the built-in :ref:`describe function <basics.describe>`.
+
+.. ipython:: python
+
+   from functools import partial
+
+   q_25 = partial(pd.Series.quantile, q=0.25)
+   q_25.__name__ = '25%'
+   q_75 = partial(pd.Series.quantile, q=0.75)
+   q_75.__name__ = '75%'
+
+   tsdf.agg(['count', 'mean', 'std', 'min', q_25, 'median', q_75, 'max'])
+
+.. _basics.transform:
+
+Transform API
+~~~~~~~~~~~~~
+
+.. versionadded:: 0.20.0
+
+The ``transform`` method returns an object that is indexed the same (same size)
+as the original. This API allows you to provide *multiple* operations at the same
+time rather than one-by-one. Its API is quite similar to the ``.agg`` API.
+
+We use a frame similar to the one in the above sections.

-   The section on :ref:`GroupBy <groupby>` demonstrates related, flexible
-   functionality for grouping by some criterion, applying, and combining the
-   results into a Series, DataFrame, etc.
+.. ipython:: python
+
+   tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
+                       index=pd.date_range('1/1/2000', periods=10))
+   tsdf.iloc[3:7] = np.nan
+   tsdf
+
+Transform the entire frame. ``.transform()`` accepts a NumPy function, a string
+function name, or a user-defined function.
+
+.. ipython:: python

-.. _Elementwise:
+   tsdf.transform(np.abs)
+   tsdf.transform('abs')
+   tsdf.transform(lambda x: x.abs())
+
+Since this is a single function, this is equivalent to a ufunc application.
+
+.. ipython:: python
+
+   np.abs(tsdf)
+
+Passing a single function to ``.transform()`` with a Series will yield a single Series in return.
+
+.. ipython:: python

-Applying elementwise Python functions
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+   tsdf.A.transform(np.abs)
+
+
+Transform with multiple functions
++++++++++++++++++++++++++++++++++
+
+Passing multiple functions will yield a column multi-indexed DataFrame.
+The first level will be the original frame column names; the second level
+will be the names of the transforming functions.
+
+.. ipython:: python
+
+   tsdf.transform([np.abs, lambda x: x+1])
+
+Passing multiple functions to a Series will yield a DataFrame. The
+resulting column names will be the transforming functions.
+
+.. ipython:: python
+
+   tsdf.A.transform([np.abs, lambda x: x+1])
+
+
+Transforming with a dict of functions
++++++++++++++++++++++++++++++++++++++
+
+
+Passing a dict of functions will allow selective transforming per column.
+
+.. ipython:: python
+
+   tsdf.transform({'A': np.abs, 'B': lambda x: x+1})
+
+Passing a dict of lists will generate a multi-indexed DataFrame with these
+selective transforms.
+
+.. ipython:: python
+
+   tsdf.transform({'A': np.abs, 'B': [lambda x: x+1, 'sqrt']})
+
+On a Series, passing a dict allows renaming as in ``.agg()``.
+
+.. ipython:: python
+
+   tsdf.A.transform({'foo': np.abs})
+   tsdf.A.transform({'foo': np.abs, 'bar': [lambda x: x+1, 'sqrt']})
+
+
+.. _basics.elementwise:
+
+Applying Elementwise Functions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Since not all functions can be vectorized (accept NumPy arrays and return
 another array or value), the methods :meth:`~DataFrame.applymap` on DataFrame
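The aggregation walkthrough above uses a random ``tsdf``; the sketch below repeats the single-function, list, dict and custom-describe forms on a small fixed frame so the results can be checked by hand. It assumes a recent pandas release, and the data is made up; ``q_25`` mirrors the custom-describe example.

```python
from functools import partial

import numpy as np
import pandas as pd

# Hypothetical frame with missing values, standing in for ``tsdf`` above
tsdf = pd.DataFrame(
    {'A': [1.0, 2.0, np.nan, 4.0],
     'B': [10.0, np.nan, 30.0, 40.0]},
    index=pd.date_range('2000-01-01', periods=4),
)

# A single function name reduces each column to one value (NaN is skipped)
total = tsdf.agg('sum')

# A list of functions yields one row per function
summary = tsdf.agg(['sum', 'mean'])

# A dict selects functions per column; cells that were not requested are NaN
per_col = tsdf.agg({'A': ['mean', 'min'], 'B': 'sum'})

# Custom describe: a partial named via __name__ becomes the row label
q_25 = partial(pd.Series.quantile, q=0.25)
q_25.__name__ = '25%'
desc = tsdf.agg(['count', 'median', q_25])

print(total, summary, per_col, desc, sep='\n\n')
```

The dict form is what makes the per-column choice possible; the list form always applies every function to every column.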

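The Transform API described above keeps the input's shape, unlike ``.agg``. A minimal sketch on a hypothetical fixed frame, assuming a recent pandas release, showing the single-function, list and dict forms:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a missing value, standing in for ``tsdf``
tsdf = pd.DataFrame({'A': [-1.0, 2.0, np.nan], 'B': [3.0, -4.0, 5.0]})

# A single function keeps the original index and columns
absolute = tsdf.transform('abs')

# A list yields MultiIndex columns: (original column, function name)
multi = tsdf.transform([np.abs, lambda x: x + 1])

# A dict transforms each column with its own function
selective = tsdf.transform({'A': np.abs, 'B': lambda x: x + 1})

print(multi)
```

NumPy's ``np.abs`` reports its name as ``absolute``, which is the label that appears in the second column level.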
doc/source/computation.rst (+3 -1)

@@ -565,7 +565,9 @@ Aggregation
 -----------

 Once the ``Rolling``, ``Expanding`` or ``EWM`` objects have been created, several methods are available to
-perform multiple computations on the data. This is very similar to a ``.groupby(...).agg`` seen :ref:`here <groupby.aggregate>`.
+perform multiple computations on the data. These operations are similar to the :ref:`aggregating API <basics.aggregate>`,
+:ref:`groupby aggregates <groupby.aggregate>`, and :ref:`resample API <timeseries.aggregate>`.
+

 .. ipython:: python

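The window-aggregation paragraph above follows the same list/dict conventions. A short sketch, assuming a recent pandas release and hypothetical data, with a two-row rolling window:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0],
                   'B': [10.0, 20.0, 30.0, 40.0]})

# Each window holds two rows; the first row has too few observations (NaN)
r = df.rolling(window=2)

# A list of functions gives (column, function) MultiIndex columns
multi = r.agg(['sum', 'mean'])

# A dict selects a function per column
per_col = r.agg({'A': 'sum', 'B': 'mean'})

print(multi)
print(per_col)
```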
doc/source/groupby.rst (+3 -1)

@@ -439,7 +439,9 @@ Aggregation
 -----------

 Once the GroupBy object has been created, several methods are available to
-perform a computation on the grouped data.
+perform a computation on the grouped data. These operations are similar to the
+:ref:`aggregating API <basics.aggregate>`, :ref:`window functions <stats.aggregate>`,
+and :ref:`resample API <timeseries.aggregate>`.

 An obvious one is aggregation via the ``aggregate`` or equivalently ``agg`` method:

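The GroupBy aggregation described above accepts the same list and dict forms as ``DataFrame.agg``. A minimal sketch, assuming a recent pandas release and a hypothetical two-group frame:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    'key': ['x', 'x', 'y', 'y'],
    'A': [1, 2, 3, 4],
    'B': [10.0, 20.0, 30.0, 40.0],
})
grouped = df.groupby('key')

# A list applies each function to every non-grouping column
both = grouped.agg(['sum', 'mean'])

# A dict chooses functions per column; mixing a string and a list
# yields (column, function) MultiIndex columns
per_col = grouped.agg({'A': 'sum', 'B': ['min', 'max']})

print(both)
print(per_col)
```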
doc/source/timeseries.rst (+4 -2)

@@ -1470,11 +1470,13 @@ We can instead only resample those groups where we have points as follows:

    ts.groupby(partial(round, freq='3T')).sum()

+.. _timeseries.aggregate:
+
 Aggregation
 ~~~~~~~~~~~

-Similar to :ref:`groupby aggregates <groupby.aggregate>` and the :ref:`window functions <stats.aggregate>`, a ``Resampler`` can be selectively
-resampled.
+Similar to the :ref:`aggregating API <basics.aggregate>`, :ref:`groupby aggregates <groupby.aggregate>`, and :ref:`window functions <stats.aggregate>`,
+a ``Resampler`` can be selectively resampled.

 Resampling a ``DataFrame``, the default will be to act on all columns with the same function.

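The ``Resampler`` aggregation described above takes the same list and dict forms. A short sketch, assuming a recent pandas release and six days of hypothetical data split into two three-day bins:

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: six days, so '3D' bins hold three rows each
idx = pd.date_range('2000-01-01', periods=6, freq='D')
ts_df = pd.DataFrame({'A': np.arange(6.0), 'B': np.arange(6.0) * 10},
                     index=idx)

r = ts_df.resample('3D')

# The same functions on all columns
both = r.agg(['sum', 'mean'])

# Different functions per column
per_col = r.agg({'A': 'sum', 'B': 'mean'})

print(both)
print(per_col)
```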
doc/source/whatsnew/v0.20.0.txt (+55)

@@ -9,6 +9,7 @@ users upgrade to this version.

 Highlights include:

+- new ``.agg()`` API for Series/DataFrame similar to the groupby-rolling-resample APIs, see :ref:`here <whatsnew_0200.enhancements.agg>`
 - Building pandas for development now requires ``cython >= 0.23`` (:issue:`14831`)

 Check the :ref:`API Changes <whatsnew_0200.api_breaking>` and :ref:`deprecations <whatsnew_0200.deprecations>` before updating.
@@ -22,6 +23,60 @@ Check the :ref:`API Changes <whatsnew_0200.api_breaking>` and :ref:`deprecations
 New features
 ~~~~~~~~~~~~

+.. _whatsnew_0200.enhancements.agg:
+
+``agg`` API
+^^^^^^^^^^^
+
+Series & DataFrame have been enhanced to support the aggregation API. This is an already familiar API that
+is supported for groupby, window operations, and resampling. This allows one to express possibly multiple
+aggregation operations in a single concise way by using ``.agg()`` and ``.transform()``. The
+full documentation is :ref:`here <basics.aggregate>` (:issue:`1623`)
+
+Here is a sample:
+
+.. ipython:: python
+
+   df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
+                     index=pd.date_range('1/1/2000', periods=10))
+   df.iloc[3:7] = np.nan
+   df
+
+One can operate using string function names, callables, lists, or dictionaries of these.
+
+Using a single function is equivalent to ``.apply``.
+
+.. ipython:: python
+
+   df.agg('sum')
+
+Multiple functions can be passed in a list.
+
+.. ipython:: python
+
+   df.agg(['sum', 'min'])
+
+Dictionaries provide the ability to select which calculations are applied to which columns.
+
+.. ipython:: python
+
+   df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
+
+When operating on a Series, passing a dictionary allows one to rename multiple
+function aggregates; this will return a DataFrame.
+
+.. ipython:: python
+
+   df.A.agg({'foo':['sum', 'min'], 'bar' : ['count','max']})
+
+The API also supports a ``.transform()`` function to provide for broadcasting results.
+
+.. ipython:: python
+
+   df.transform(['abs', lambda x: x-x.min()])
+
+
+.. _whatsnew_0200.enhancements.dtype:

 ``dtype`` keyword for data IO
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
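The whatsnew entry contrasts ``.agg()``, which reduces, with ``.transform()``, which broadcasts results back to the input's shape. A small sketch of that difference, assuming a recent pandas release and a hypothetical fixed frame instead of the random ``df`` above:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'A': [3.0, -1.0, 2.0], 'B': [0.0, 5.0, -2.0]})

# agg: one row per function
reduced = df.agg(['sum', 'min'])

# transform: same index as df, one column per (column, function) pair
broadcast = df.transform(['abs', lambda x: x - x.min()])

print(reduced)
print(broadcast)
```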
