Commit b17034b

ENH: add Series & DataFrame .agg/.aggregate to provide convenient
function application that mimics the groupby(..).agg/.aggregate interface

.apply is now a synonym for .agg, and will accept dict/list-likes for aggregations

CLN: rename .name attr -> ._selection_name from SeriesGroupby for compat (didn't exist on DataFrameGroupBy)

resolves conflicts w.r.t. setting .name on a groupby object

closes #1623
closes #14464

custom .describe
closes #14483
closes #7014
1 parent 2466ecb commit b17034b

14 files changed, +809 -57 lines changed

doc/source/api.rst

+4
@@ -305,6 +305,8 @@ Function application, GroupBy & Window
305305
:toctree: generated/
306306

307307
Series.apply
308+
Series.aggregate
309+
Series.transform
308310
Series.map
309311
Series.groupby
310312
Series.rolling
@@ -824,6 +826,8 @@ Function application, GroupBy & Window
824826

825827
DataFrame.apply
826828
DataFrame.applymap
829+
DataFrame.aggregate
830+
DataFrame.transform
827831
DataFrame.groupby
828832
DataFrame.rolling
829833
DataFrame.expanding

doc/source/basics.rst

+204-8
@@ -700,7 +700,8 @@ on an entire ``DataFrame`` or ``Series``, row- or column-wise, or elementwise.
700700

701701
1. `Tablewise Function Application`_: :meth:`~DataFrame.pipe`
702702
2. `Row or Column-wise Function Application`_: :meth:`~DataFrame.apply`
703-
3. Elementwise_ function application: :meth:`~DataFrame.applymap`
703+
3. `Aggregation API`_: :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform`
704+
4. `Applying Elementwise Functions`_: :meth:`~DataFrame.applymap`
704705

705706
.. _basics.pipe:
706707

@@ -776,6 +777,13 @@ statistics methods, take an optional ``axis`` argument:
776777
df.apply(np.cumsum)
777778
df.apply(np.exp)
778779
780+
``.apply()`` will also dispatch on a string method name.
781+
782+
.. ipython:: python
783+
784+
df.apply('mean')
785+
df.apply('mean', axis=1)
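
As a small illustration of the string dispatch described above, the sketch below compares the string form with the resulting values. The frame ``demo`` and its contents are made up for this example and are not part of the commit.

```python
import pandas as pd

# Hypothetical frame, used only for illustration.
demo = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]})

# A string is looked up as a method name on the object.
col_means = demo.apply('mean')          # same values as demo.mean()
row_means = demo.apply('mean', axis=1)  # same values as demo.mean(axis=1)

print(col_means.tolist())
print(row_means.tolist())
```

The string form is mostly a convenience: it lets the same call site accept either a callable or the name of an existing method.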
786+
779787
Depending on the return type of the function passed to :meth:`~DataFrame.apply`,
780788
the result will either be of lower dimension or the same dimension.
781789

@@ -825,16 +833,204 @@ set to True, the passed function will instead receive an ndarray object, which
825833
has positive performance implications if you do not need the indexing
826834
functionality.
827835

828-
.. seealso::
836+
.. _basics.aggregate:
837+
838+
Aggregation API
839+
~~~~~~~~~~~~~~~
840+
841+
.. versionadded:: 0.20.0
842+
843+
The aggregation API allows one to express possibly multiple aggregation operations in a single concise way.
844+
This API is similar across pandas objects, :ref:`groupby aggregates <groupby.aggregate>`,
845+
:ref:`window functions <stats.aggregate>`, and the :ref:`resample API <timeseries.aggregate>`.
846+
847+
We will use a starting frame similar to the one above.
848+
849+
.. ipython:: python
850+
851+
tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
852+
index=pd.date_range('1/1/2000', periods=10))
853+
tsdf.iloc[3:7] = np.nan
854+
tsdf
855+
856+
Using a single function is equivalent to ``.apply``; you can also pass named methods as strings.
857+
This will return a Series of the output.
858+
859+
.. ipython:: python
860+
861+
tsdf.agg(np.sum)
862+
863+
tsdf.agg('sum')
864+
865+
On a Series this will result in a scalar value.
866+
867+
.. ipython:: python
868+
869+
tsdf.A.agg('sum')
870+
871+
872+
Aggregating multiple functions at once
873+
++++++++++++++++++++++++++++++++++++++
874+
875+
You can pass multiple functions as a list. The result of each passed function will be a row in the resulting DataFrame.
876+
The rows are named after the aggregation functions.
877+
878+
.. ipython:: python
879+
880+
tsdf.agg(['sum'])
881+
882+
Multiple functions yield multiple rows.
883+
884+
.. ipython:: python
885+
886+
tsdf.agg(['sum', 'mean'])
887+
888+
On a Series, multiple functions return a Series.
889+
890+
.. ipython:: python
891+
892+
tsdf.A.agg(['sum', 'mean'])
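
The shapes described in this section can be checked on a small frame. This is a minimal sketch with made-up data (``demo`` is not part of the commit). It assumes the default NaN-skipping behaviour of ``sum`` and ``mean``.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in column A.
demo = pd.DataFrame({'A': [1.0, 2.0, np.nan, 4.0],
                     'B': [10.0, 20.0, 30.0, 40.0]})

totals = demo.agg('sum')                  # Series indexed by column name
table = demo.agg(['sum', 'mean'])         # DataFrame, one row per function
a_stats = demo['A'].agg(['sum', 'mean'])  # Series indexed by function name

print(totals)
print(table)
print(a_stats)
```

A single function reduces each column to one value, while a list adds an axis labelled by the function names.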
893+
894+
895+
Aggregating with a dict of functions
896+
++++++++++++++++++++++++++++++++++++
897+
898+
Passing a dictionary mapping column names to a function or a list of functions to ``DataFrame.agg``
899+
allows you to customize which functions are applied to which columns.
900+
901+
.. ipython:: python
902+
903+
tsdf.agg({'A': 'mean', 'B': 'sum'})
904+
905+
Passing a list-like will generate a DataFrame output. You will get a matrix-like output
906+
of all of the aggregators; entries for column/function pairs that were not requested will be missing values.
907+
908+
.. ipython:: python
909+
910+
tsdf.agg({'A': ['mean', 'min'], 'B': 'sum'})
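
To make the "matrix-like output" concrete, the sketch below uses a small made-up frame (``demo`` is hypothetical) and passes a list for every key, so only the column/function pairs that were requested are filled in.

```python
import pandas as pd

# Hypothetical frame, used only for illustration.
demo = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]})

# Different functions per column; rows are the union of all function names.
result = demo.agg({'A': ['mean', 'min'], 'B': ['sum']})

print(result)
# 'sum' was not requested for A, and 'mean'/'min' were not requested for B,
# so those cells hold NaN.
```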
911+
912+
For a Series, you can pass a dict; the keys will set the names of the resulting columns.
913+
914+
.. ipython:: python
915+
916+
tsdf.A.agg({'foo' : ['sum', 'mean']})
917+
918+
Multiple keys will yield multiple columns.
919+
920+
.. ipython:: python
921+
922+
tsdf.A.agg({'foo' : ['sum', 'mean'], 'bar': ['min', 'max', lambda x: x.sum()+1]})
923+
924+
.. _basics.custom_describe:
829925

830-
The section on :ref:`GroupBy <groupby>` demonstrates related, flexible
831-
functionality for grouping by some criterion, applying, and combining the
832-
results into a Series, DataFrame, etc.
926+
Custom describe
927+
+++++++++++++++
833928

834-
.. _Elementwise:
929+
With ``.agg()`` it is possible to easily create a custom describe function, similar
930+
to the built-in :ref:`describe function <basics.describe>`.
835931

836-
Applying elementwise Python functions
837-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
932+
.. ipython:: python
933+
934+
from functools import partial
935+
936+
q_25 = partial(pd.Series.quantile, q=0.25)
937+
q_25.__name__ = '25%'
938+
q_75 = partial(pd.Series.quantile, q=0.75)
939+
q_75.__name__ = '75%'
940+
941+
tsdf.agg(['count', 'mean', 'std', 'min', q_25, 'median', q_75, 'max'])
942+
943+
.. _basics.transform:
944+
945+
Transform API
946+
~~~~~~~~~~~~~
947+
948+
.. versionadded:: 0.20.0
949+
950+
The ``transform`` method returns an object that is indexed the same (same size)
951+
as the original. This API allows you to provide *multiple* operations at the same
952+
time rather than one-by-one. Its API is quite similar to the ``.agg`` API.
953+
954+
We use a frame similar to the one in the sections above.
955+
956+
.. ipython:: python
957+
958+
tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
959+
index=pd.date_range('1/1/2000', periods=10))
960+
tsdf.iloc[3:7] = np.nan
961+
tsdf
962+
963+
Transform the entire frame. ``.transform()`` accepts a NumPy function, a string
964+
function name, or a user-defined function.
965+
966+
.. ipython:: python
967+
968+
tsdf.transform(np.abs)
969+
tsdf.transform('abs')
970+
tsdf.transform(lambda x: x.abs())
971+
972+
``.transform()`` with a single function is equivalent to applying a function across the
973+
columns.
974+
975+
.. ipython:: python
976+
977+
tsdf.apply(np.abs, axis=1)
978+
979+
Passing a single function to ``.transform()`` with a Series will yield a single Series in return.
980+
981+
.. ipython:: python
982+
983+
tsdf.A.transform(np.abs)
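
The difference between ``.transform()`` and ``.agg()`` is easiest to see side by side. This is a minimal sketch on a made-up Series (``s`` is hypothetical): the transform keeps the original index, while the aggregation reduces to a scalar.

```python
import numpy as np
import pandas as pd

# Hypothetical Series, used only for illustration.
s = pd.Series([-1.5, 2.0, -3.0], index=['x', 'y', 'z'])

same_shape = s.transform(np.abs)  # one output value per input value
reduced = s.agg('sum')            # a single scalar

print(same_shape)
print(reduced)
```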
984+
985+
986+
Transform with multiple functions
987+
+++++++++++++++++++++++++++++++++
988+
989+
Passing multiple functions will yield a column multi-indexed DataFrame.
990+
The first level will be the original frame column names; the second level
991+
will be the names of the transforming functions.
992+
993+
.. ipython:: python
994+
995+
tsdf.transform([np.abs, lambda x: x+1])
996+
997+
Passing multiple functions to a Series will yield a DataFrame. The
998+
resulting column names will be the transforming functions.
999+
1000+
.. ipython:: python
1001+
1002+
tsdf.A.transform([np.abs, lambda x: x+1])
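
The column layout produced by a list of functions can be inspected directly. In this sketch, ``demo`` and the helper ``plus_one`` are made up for illustration. A named function is used instead of a lambda so that the second column level has a predictable label.

```python
import numpy as np
import pandas as pd


def plus_one(x):
    """Hypothetical helper: add one to a scalar or a Series."""
    return x + 1


# Hypothetical frame, used only for illustration.
demo = pd.DataFrame({'A': [-1.0, 2.0], 'B': [3.0, -4.0]})

wide = demo.transform([np.abs, plus_one])

# Level 0 holds the original column names, level 1 the function names.
print(wide.columns.tolist())
print(wide[('A', 'plus_one')].tolist())
```

Each original column expands into one output column per function, so two columns and two functions give four output columns.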
1003+
1004+
1005+
Transforming with a dict of functions
1006+
+++++++++++++++++++++++++++++++++++++
1007+
1008+
1009+
Passing a dict of functions will allow selective transforming per column.
1010+
1011+
.. ipython:: python
1012+
1013+
tsdf.transform({'A': np.abs, 'B': lambda x: x+1})
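
A quick sketch of selective transforming, again on made-up data (``demo`` and ``plus_one`` are hypothetical names). Only the columns named in the dict appear in the result.

```python
import numpy as np
import pandas as pd


def plus_one(x):
    """Hypothetical helper: add one to a scalar or a Series."""
    return x + 1


# Hypothetical frame; column C is deliberately left out of the dict below.
demo = pd.DataFrame({'A': [-1.0, 2.0], 'B': [3.0, -4.0], 'C': [0.0, 0.0]})

selected = demo.transform({'A': np.abs, 'B': plus_one})

print(selected)
```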
1014+
1015+
Passing a dict of lists will generate a multi-indexed DataFrame with these
1016+
selective transforms.
1017+
1018+
.. ipython:: python
1019+
1020+
tsdf.transform({'A': np.abs, 'B': [lambda x: x+1, 'sqrt']})
1021+
1022+
On a Series, passing a dict allows renaming as in ``.agg()``.
1023+
1024+
.. ipython:: python
1025+
1026+
tsdf.A.transform({'foo': np.abs})
1027+
tsdf.A.transform({'foo': np.abs, 'bar': [lambda x: x+1, 'sqrt']})
1028+
1029+
1030+
.. _basics.elementwise:
1031+
1032+
Applying Elementwise Functions
1033+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
8381034

8391035
Since not all functions can be vectorized (accept NumPy arrays and return
8401036
another array or value), the methods :meth:`~DataFrame.applymap` on DataFrame

doc/source/computation.rst

+3-1
@@ -560,7 +560,9 @@ Aggregation
560560
-----------
561561

562562
Once the ``Rolling``, ``Expanding`` or ``EWM`` objects have been created, several methods are available to
563-
perform multiple computations on the data. This is very similar to a ``.groupby(...).agg`` seen :ref:`here <groupby.aggregate>`.
563+
perform multiple computations on the data. These operations are similar to the :ref:`aggregating API <basics.aggregate>`,
564+
:ref:`groupby aggregates <groupby.aggregate>`, and :ref:`resample API <timeseries.aggregate>`.
565+
564566

565567
.. ipython:: python
566568

doc/source/groupby.rst

+3-1
@@ -380,7 +380,9 @@ Aggregation
380380
-----------
381381

382382
Once the GroupBy object has been created, several methods are available to
383-
perform a computation on the grouped data.
383+
perform a computation on the grouped data. These operations are similar to the
384+
:ref:`aggregating API <basics.aggregate>`, :ref:`window functions <stats.aggregate>`,
385+
and :ref:`resample API <timeseries.aggregate>`.
384386

385387
An obvious one is aggregation via the ``aggregate`` or equivalently ``agg`` method:
386388

doc/source/timeseries.rst

+4-2
@@ -1404,11 +1404,13 @@ We can instead only resample those groups where we have points as follows:
14041404
14051405
ts.groupby(partial(round, freq='3T')).sum()
14061406
1407+
.. _timeseries.aggregate:
1408+
14071409
Aggregation
14081410
~~~~~~~~~~~
14091411

1410-
Similar to :ref:`groupby aggregates <groupby.aggregate>` and the :ref:`window functions <stats.aggregate>`, a ``Resampler`` can be selectively
1411-
resampled.
1412+
Similar to the :ref:`aggregating API <basics.aggregate>`, :ref:`groupby aggregates <groupby.aggregate>`, and :ref:`window functions <stats.aggregate>`,
1413+
a ``Resampler`` can be selectively resampled.
14121414

14131415
Resampling a ``DataFrame``, the default will be to act on all columns with the same function.
14141416
