Commit 8b40453

ENH: add Series & DataFrame .agg/.aggregate (#14668)

* ENH: add Series & DataFrame .agg/.aggregate to provide convenient function
  application that mimics the groupby(..).agg/.aggregate interface

  .apply is now a synonym for .agg, and will accept dict/list-likes for aggregations

  CLN: rename .name attr -> ._selection_name from SeriesGroupby for compat
  (didn't exist on DataFrameGroupBy); resolves conflicts w.r.t. setting .name
  on a groupby object

  closes #1623
  closes #14464 custom .describe
  closes #14483
  closes #15015
  closes #7014

* DOC/TST: test for deprecation in .agg

  additional doc updates

* whatsnew fixes
1 parent 9991579 commit 8b40453

14 files changed (+877, -45 lines changed)

doc/source/api.rst

+4
@@ -314,6 +314,8 @@ Function application, GroupBy & Window
    :toctree: generated/

    Series.apply
+   Series.aggregate
+   Series.transform
    Series.map
    Series.groupby
    Series.rolling
@@ -831,6 +833,8 @@ Function application, GroupBy & Window

    DataFrame.apply
    DataFrame.applymap
+   DataFrame.aggregate
+   DataFrame.transform
    DataFrame.groupby
    DataFrame.rolling
    DataFrame.expanding

doc/source/basics.rst

+223-8
@@ -702,7 +702,8 @@ on an entire ``DataFrame`` or ``Series``, row- or column-wise, or elementwise.

 1. `Tablewise Function Application`_: :meth:`~DataFrame.pipe`
 2. `Row or Column-wise Function Application`_: :meth:`~DataFrame.apply`
-3. Elementwise_ function application: :meth:`~DataFrame.applymap`
+3. `Aggregation API`_: :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform`
+4. `Applying Elementwise Functions`_: :meth:`~DataFrame.applymap`

 .. _basics.pipe:
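The numbered list in this hunk distinguishes four granularities of function application. As a rough plain-Python model (assumption: a dict of lists stands in for a ``DataFrame``; ``pipe``, ``apply_columns``, ``transform_columns`` and ``applymap`` below are hypothetical helpers, not pandas API), they differ in what the function receives and what shape comes back:

```python
# Hypothetical model: a "frame" is a dict mapping column name -> list of values.
# None of these helpers are pandas functions; they only illustrate shapes.


def pipe(frame, func):
    """Tablewise: func receives the whole frame and may return anything."""
    return func(frame)


def apply_columns(frame, func):
    """Column-wise: func receives one column; a reducer yields one value per column."""
    return {name: func(values) for name, values in frame.items()}


def transform_columns(frame, func):
    """Column-wise, but each result must keep the column's original length."""
    out = {}
    for name, values in frame.items():
        result = list(func(values))
        if len(result) != len(values):
            raise ValueError(f"transforming column {name!r} changed its length")
        out[name] = result
    return out


def applymap(frame, func):
    """Elementwise: func receives one scalar at a time."""
    return {name: [func(v) for v in values] for name, values in frame.items()}


frame = {"A": [1, -2, 3], "B": [-4, 5, -6]}
print(pipe(frame, len))            # 2 (the number of columns)
print(apply_columns(frame, sum))   # {'A': 2, 'B': -5}
print(transform_columns(frame, lambda col: [v * 10 for v in col]))
# {'A': [10, -20, 30], 'B': [-40, 50, -60]}
print(applymap(frame, abs))        # {'A': [1, 2, 3], 'B': [4, 5, 6]}
```

The length check in ``transform_columns`` mirrors the "same size as the original" contract described for the Transform API; pandas' own validation may differ in detail.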

@@ -778,6 +779,13 @@ statistics methods, take an optional ``axis`` argument:
    df.apply(np.cumsum)
    df.apply(np.exp)

+``.apply()`` will also dispatch on a string method name.
+
+.. ipython:: python
+
+   df.apply('mean')
+   df.apply('mean', axis=1)
+
 Depending on the return type of the function passed to :meth:`~DataFrame.apply`,
 the result will either be of lower dimension or the same dimension.
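Dispatching on a string amounts to looking the name up as a method of the object and calling it. The sketch below models that with a hypothetical ``Column`` class (not pandas code) and ``getattr``; pandas' actual name-resolution rules may differ in detail:

```python
import statistics


class Column:
    """Hypothetical stand-in for a Series: a named list of numbers."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def sum(self):
        # Inside the method body, `sum` is still the builtin, not this method.
        return sum(self.values)

    def mean(self):
        return statistics.mean(self.values)

    def apply(self, func):
        # A string is treated as the name of a method on this object;
        # anything else is assumed to be a callable taking the raw values.
        if isinstance(func, str):
            method = getattr(self, func, None)
            if method is None or not callable(method):
                raise AttributeError(f"{func!r} is not a method of Column")
            return method()
        return func(self.values)


col = Column("A", [1, 2, 3, 4])
print(col.apply("mean"))  # 2.5
print(col.apply("sum"))   # 10
print(col.apply(max))     # 4
```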

@@ -827,16 +835,223 @@ set to True, the passed function will instead receive an ndarray object, which
 has positive performance implications if you do not need the indexing
 functionality.

-.. seealso::
+.. _basics.aggregate:
+
+Aggregation API
+~~~~~~~~~~~~~~~
+
+.. versionadded:: 0.20.0
+
+The aggregation API allows one to express possibly multiple aggregation operations in a single concise way.
+This API is similar across pandas objects; see the :ref:`groupby API <groupby.aggregate>`, the
+:ref:`window functions API <stats.aggregate>`, and the :ref:`resample API <timeseries.aggregate>`.
+The entry point for aggregation is the method :meth:`~DataFrame.aggregate`, or the alias :meth:`~DataFrame.agg`.
+
+We will use a starting frame similar to the one above:
+
+.. ipython:: python
+
+   tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
+                       index=pd.date_range('1/1/2000', periods=10))
+   tsdf.iloc[3:7] = np.nan
+   tsdf
+
+Using a single function is equivalent to :meth:`~DataFrame.apply`; you can also pass named methods as strings.
+These will return a ``Series`` of the aggregated output:
+
+.. ipython:: python
+
+   tsdf.agg(np.sum)
+
+   tsdf.agg('sum')
+
+   # these are equivalent to .sum() because we are aggregating with a single function
+   tsdf.sum()
+
+Single aggregations on a ``Series`` will result in a scalar value:
+
+.. ipython:: python
+
+   tsdf.A.agg('sum')
+
+
+Aggregating with multiple functions
++++++++++++++++++++++++++++++++++++
+
+You can pass multiple aggregation arguments as a list.
+The result of each passed function will be a row in the resulting ``DataFrame``.
+Each row is named after its aggregation function.
+
+.. ipython:: python
+
+   tsdf.agg(['sum'])
+
+Multiple functions yield multiple rows:
+
+.. ipython:: python
+
+   tsdf.agg(['sum', 'mean'])
+
+On a ``Series``, multiple functions return a ``Series``, indexed by the function names:
+
+.. ipython:: python
+
+   tsdf.A.agg(['sum', 'mean'])
+
+Passing a ``lambda`` function will yield a row named ``<lambda>``:
+
+.. ipython:: python
+
+   tsdf.A.agg(['sum', lambda x: x.mean()])
+
+Passing a named function will yield that name for the row:
+
+.. ipython:: python
+
+   def mymean(x):
+       return x.mean()
+
+   tsdf.A.agg(['sum', mymean])
+
+Aggregating with a dict
++++++++++++++++++++++++
+
+Passing a dictionary that maps column names to a single aggregation or a list of aggregations to ``DataFrame.agg``
+allows you to customize which functions are applied to which columns.
+
+.. ipython:: python
+
+   tsdf.agg({'A': 'mean', 'B': 'sum'})
+
+Passing a list-like as a dict value will generate a ``DataFrame`` output: a matrix-like output
+of all of the aggregators, with one row per unique function. Functions that were
+not requested for a particular column will be ``NaN``:
+
+.. ipython:: python
+
+   tsdf.agg({'A': ['mean', 'min'], 'B': 'sum'})
+
+.. _basics.aggregation.mixed_dtypes:
+
+Mixed Dtypes
+++++++++++++

-The section on :ref:`GroupBy <groupby>` demonstrates related, flexible
-functionality for grouping by some criterion, applying, and combining the
-results into a Series, DataFrame, etc.
+When presented with mixed dtypes that cannot all be aggregated, ``.agg`` will only take the valid
+aggregations. This is similar to how groupby ``.agg`` works.

-.. _Elementwise:
+.. ipython:: python
+
+   mdf = pd.DataFrame({'A': [1, 2, 3],
+                       'B': [1., 2., 3.],
+                       'C': ['foo', 'bar', 'baz'],
+                       'D': pd.date_range('20130101', periods=3)})
+   mdf.dtypes
+
+.. ipython:: python
+
+   mdf.agg(['min', 'sum'])
+
+.. _basics.aggregation.custom_describe:
+
+Custom describe
++++++++++++++++
+
+With ``.agg()`` it is possible to easily create a custom describe function, similar
+to the built-in :ref:`describe function <basics.describe>`.
+
+.. ipython:: python
+
+   from functools import partial
+
+   q_25 = partial(pd.Series.quantile, q=0.25)
+   q_25.__name__ = '25%'
+   q_75 = partial(pd.Series.quantile, q=0.75)
+   q_75.__name__ = '75%'
+
+   tsdf.agg(['count', 'mean', 'std', 'min', q_25, 'median', q_75, 'max'])
+
+.. _basics.transform:
+
+Transform API
+~~~~~~~~~~~~~

-Applying elementwise Python functions
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. versionadded:: 0.20.0
+
+The :meth:`~DataFrame.transform` method returns an object that is indexed the same (same size)
+as the original. This API allows you to provide *multiple* operations at the same
+time rather than one-by-one. Its API is quite similar to the ``.agg`` API.
+
+We use a frame similar to the one in the sections above.
+
+.. ipython:: python
+
+   tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
+                       index=pd.date_range('1/1/2000', periods=10))
+   tsdf.iloc[3:7] = np.nan
+   tsdf
+
+Transform the entire frame. ``.transform()`` accepts a NumPy function, a string
+function name, or a user-defined function as input.
+
+.. ipython:: python
+
+   tsdf.transform(np.abs)
+   tsdf.transform('abs')
+   tsdf.transform(lambda x: x.abs())
+
+Since this is a single function, this is equivalent to a ufunc application:
+
+.. ipython:: python
+
+   np.abs(tsdf)
+
+Passing a single function to ``.transform()`` with a ``Series`` will yield a single ``Series`` in return.
+
+.. ipython:: python
+
+   tsdf.A.transform(np.abs)
+
+
+Transform with multiple functions
++++++++++++++++++++++++++++++++++
+
+Passing multiple functions will yield a column multi-indexed DataFrame.
+The first level will be the original frame column names; the second level
+will be the names of the transforming functions.
+
+.. ipython:: python
+
+   tsdf.transform([np.abs, lambda x: x+1])
+
+Passing multiple functions to a Series will yield a DataFrame. The
+resulting column names will be the transforming functions.
+
+.. ipython:: python
+
+   tsdf.A.transform([np.abs, lambda x: x+1])
+
+
+Transforming with a dict
+++++++++++++++++++++++++
+
+
+Passing a dict of functions will allow selective transforming per column.
+
+.. ipython:: python
+
+   tsdf.transform({'A': np.abs, 'B': lambda x: x+1})
+
+Passing a dict of lists will generate a multi-indexed DataFrame with these
+selective transforms.
+
+.. ipython:: python
+
+   tsdf.transform({'A': np.abs, 'B': [lambda x: x+1, 'sqrt']})
+
+.. _basics.elementwise:
+
+Applying Elementwise Functions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Since not all functions can be vectorized (accept NumPy arrays and return
 another array or value), the methods :meth:`~DataFrame.applymap` on DataFrame
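The list and dict rules added in this hunk (one row per function, rows labelled by the function's name or ``<lambda>``, ``NaN`` for function/column pairs that were not requested, and a ``partial`` given a ``__name__`` for custom-describe labels) can be modelled in plain Python. This is a hypothetical sketch, not how pandas implements ``.agg``: a dict of lists stands in for a ``DataFrame``, ``float('nan')`` for ``NaN``, and ``NAMED``, ``aggregate`` and ``nearest_rank`` are illustrative names only.

```python
import math
import statistics
from functools import partial

# Hypothetical lookup for string names; pandas resolves names on the object itself.
NAMED = {"sum": sum, "mean": statistics.mean, "min": min, "max": max}


def label_of(func):
    """Row label: the string itself, else the callable's __name__ ('<lambda>' for lambdas)."""
    if isinstance(func, str):
        return func
    return getattr(func, "__name__", repr(func))


def resolve(func):
    """Turn a string name into a callable; pass callables through."""
    return NAMED[func] if isinstance(func, str) else func


def aggregate(frame, spec):
    """Model of frame.agg(spec) for a dict-of-lists frame.

    spec is a function, a list of functions, or a dict of column -> function(s).
    Returns {row_label: {column: value}}; pairs that were not requested get NaN.
    Simplification: where pandas would return a Series (a single function, or a
    dict of single functions), this model still returns one-row results.
    """
    if isinstance(spec, dict):
        plan = {col: f if isinstance(f, list) else [f] for col, f in spec.items()}
    else:
        funcs = spec if isinstance(spec, list) else [spec]
        plan = {col: funcs for col in frame}

    labels = []  # every distinct function label, in first-seen order
    for funcs in plan.values():
        for f in funcs:
            if label_of(f) not in labels:
                labels.append(label_of(f))

    result = {label: {col: math.nan for col in plan} for label in labels}
    for col, funcs in plan.items():
        for f in funcs:
            result[label_of(f)][col] = resolve(f)(frame[col])
    return result


def nearest_rank(values, q):
    """Hypothetical nearest-rank quantile (pandas' quantile interpolates instead)."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(q * len(ordered)) - 1)]


frame = {"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]}
print(aggregate(frame, ["sum", "mean"]))
print(aggregate(frame, {"A": ["mean", "min"], "B": "sum"}))  # NaN where not requested

# Custom-describe style: give a partial a __name__ so it gets a readable row label.
q_75 = partial(nearest_rank, q=0.75)
q_75.__name__ = "75%"
print(aggregate(frame, ["min", q_75, lambda v: max(v) - min(v)]))
```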

doc/source/computation.rst

+7-5
@@ -617,7 +617,9 @@ Aggregation
 -----------

 Once the ``Rolling``, ``Expanding`` or ``EWM`` objects have been created, several methods are available to
-perform multiple computations on the data. This is very similar to a ``.groupby(...).agg`` seen :ref:`here <groupby.aggregate>`.
+perform multiple computations on the data. These operations are similar to the :ref:`aggregating API <basics.aggregate>`,
+:ref:`groupby aggregates <groupby.aggregate>`, and :ref:`resample API <timeseries.aggregate>`.
+

 .. ipython:: python

@@ -642,10 +644,10 @@ columns if none are selected.

 .. _stats.aggregate.multifunc:

-Applying multiple functions at once
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Applying multiple functions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~

-With windowed Series you can also pass a list or dict of functions to do
+With windowed ``Series`` you can also pass a list of functions to do
 aggregation with, outputting a DataFrame:

 .. ipython:: python
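Conceptually, passing a list of functions to a windowed ``Series`` evaluates every function over each window and lays the results out as one column per function. Below is a plain-Python sketch of a fixed-size trailing window; ``rolling_agg`` is a hypothetical helper, and it emits ``None`` where pandas would emit ``NaN`` for positions that do not yet have a full window:

```python
import statistics


def rolling_agg(values, window, funcs):
    """Apply each function to every trailing window of `window` items.

    Returns {function_name: [result_per_position]}; positions without a
    full window get None.
    """
    out = {f.__name__: [] for f in funcs}
    for i in range(len(values)):
        if i + 1 < window:
            for f in funcs:
                out[f.__name__].append(None)
            continue
        chunk = values[i + 1 - window : i + 1]
        for f in funcs:
            out[f.__name__].append(f(chunk))
    return out


series = [1, 2, 3, 4, 5]
print(rolling_agg(series, 3, [sum, statistics.mean, max]))
# {'sum': [None, None, 6, 9, 12], 'mean': [None, None, 2, 3, 4], 'max': [None, None, 3, 4, 5]}
```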
@@ -666,7 +668,7 @@ Applying different functions to DataFrame columns
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 By passing a dict to ``aggregate`` you can apply a different aggregation to the
-columns of a DataFrame:
+columns of a ``DataFrame``:

 .. ipython:: python
    :okexcept:
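The dict form pairs each column with its own aggregation. A plain-Python sketch over a dict-of-lists frame follows, assuming a fixed trailing window; ``trailing_windows`` and ``rolling_agg_by_column`` are hypothetical helpers, ``None`` stands in for ``NaN``, and columns not named in the dict are omitted from the result:

```python
import statistics


def trailing_windows(values, window):
    """Yield each trailing window, or None while the window is still incomplete."""
    for i in range(len(values)):
        yield values[i + 1 - window : i + 1] if i + 1 >= window else None


def rolling_agg_by_column(frame, window, spec):
    """spec maps column name -> one function; columns missing from spec are dropped."""
    out = {}
    for col, func in spec.items():
        out[col] = [None if w is None else func(w)
                    for w in trailing_windows(frame[col], window)]
    return out


frame = {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "C": [0, 0, 0, 0]}
print(rolling_agg_by_column(frame, 2, {"A": sum, "B": statistics.mean}))
# {'A': [None, 3, 5, 7], 'B': [None, 3.5, 2.5, 1.5]}
```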

doc/source/groupby.rst

+3-1
@@ -439,7 +439,9 @@ Aggregation
 -----------

 Once the GroupBy object has been created, several methods are available to
-perform a computation on the grouped data.
+perform a computation on the grouped data. These operations are similar to the
+:ref:`aggregating API <basics.aggregate>`, :ref:`window functions API <stats.aggregate>`,
+and :ref:`resample API <timeseries.aggregate>`.

 An obvious one is aggregation via the ``aggregate`` or equivalently ``agg`` method:
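Group-wise aggregation follows split-apply-combine: split rows by key, apply each column's function within every group, and combine the results. A minimal plain-Python sketch with a dict of per-column functions, assuming rows are dicts and group keys are sortable; ``groupby_agg`` is a hypothetical helper, and it sorts group keys as pandas does by default:

```python
import statistics
from collections import defaultdict


def groupby_agg(rows, key, spec):
    """Group `rows` (a list of dicts) by `key`, then aggregate per column.

    spec maps column name -> function applied to that column's values
    within each group. Returns {group_key: {column: result}}.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)  # split
    return {
        # apply each column's function, then combine into one dict per group
        g: {col: func([r[col] for r in members]) for col, func in spec.items()}
        for g, members in sorted(groups.items())
    }


rows = [
    {"key": "a", "x": 1, "y": 10},
    {"key": "b", "x": 2, "y": 20},
    {"key": "a", "x": 3, "y": 30},
]
print(groupby_agg(rows, "key", {"x": sum, "y": statistics.mean}))
# {'a': {'x': 4, 'y': 20}, 'b': {'x': 2, 'y': 20}}
```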

doc/source/timeseries.rst

+4-2
@@ -1519,11 +1519,13 @@ We can instead only resample those groups where we have points as follows:

    ts.groupby(partial(round, freq='3T')).sum()

+.. _timeseries.aggregate:
+
 Aggregation
 ~~~~~~~~~~~

-Similar to :ref:`groupby aggregates <groupby.aggregate>` and the :ref:`window functions <stats.aggregate>`, a ``Resampler`` can be selectively
-resampled.
+Similar to the :ref:`aggregating API <basics.aggregate>`, :ref:`groupby aggregates API <groupby.aggregate>`, and the :ref:`window functions API <stats.aggregate>`,
+a ``Resampler`` can be selectively resampled.

 Resampling a ``DataFrame``, the default will be to act on all columns with the same function.
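Resampling buckets timestamps into fixed-width bins and then aggregates each column per bin, either with one function for all columns (the default described above) or selectively through a dict. The plain-Python sketch below assumes bins are counted from each day's midnight; ``floor_time`` and ``resample_agg`` are hypothetical helpers, and, unlike pandas, empty bins are simply skipped:

```python
from datetime import datetime, timedelta


def floor_time(ts, width):
    """Floor a datetime to a multiple of `width` counted from that day's midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // width) * width


def resample_agg(index, frame, width, spec):
    """Bucket rows into fixed-width time bins, then aggregate each column.

    `spec` is either one function, applied to every column, or a dict of
    column -> function for selective aggregation.
    """
    plan = spec if isinstance(spec, dict) else {col: spec for col in frame}
    bins = {}
    for i, ts in enumerate(index):
        bins.setdefault(floor_time(ts, width), []).append(i)
    return {
        bin_start: {col: func([frame[col][i] for i in rows]) for col, func in plan.items()}
        for bin_start, rows in sorted(bins.items())
    }


start = datetime(2000, 1, 1)
index = [start + timedelta(minutes=m) for m in range(6)]
frame = {"A": [1, 2, 3, 4, 5, 6], "B": [6, 5, 4, 3, 2, 1]}
three_min = timedelta(minutes=3)

print(resample_agg(index, frame, three_min, sum))                   # same function for A and B
print(resample_agg(index, frame, three_min, {"A": sum, "B": max}))  # per-column functions
```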
