Commit 41636b0

ENH: add Series & DataFrame .agg/.aggregate to provide convenient
function application that mimics the groupby(..).agg/.aggregate interface

.apply is now a synonym for .agg, and will accept dict/list-likes
for aggregations

CLN: rename .name attr -> ._selection_name from SeriesGroupby for compat (didn't exist on DataFrameGroupBy)
resolves conflicts w.r.t. setting .name on a groupby object

closes pandas-dev#1623
closes pandas-dev#14464

custom .describe
closes pandas-dev#14483
closes pandas-dev#15015
closes pandas-dev#7014
1 parent 7ee73ff commit 41636b0

File tree

11 files changed

+850
-27
lines changed

doc/source/api.rst

+4
@@ -314,6 +314,8 @@ Function application, GroupBy & Window
314314
:toctree: generated/
315315

316316
Series.apply
317+
Series.aggregate
318+
Series.transform
317319
Series.map
318320
Series.groupby
319321
Series.rolling
@@ -831,6 +833,8 @@ Function application, GroupBy & Window
831833

832834
DataFrame.apply
833835
DataFrame.applymap
836+
DataFrame.aggregate
837+
DataFrame.transform
834838
DataFrame.groupby
835839
DataFrame.rolling
836840
DataFrame.expanding

doc/source/basics.rst

+234-8
@@ -702,7 +702,8 @@ on an entire ``DataFrame`` or ``Series``, row- or column-wise, or elementwise.
702702

703703
1. `Tablewise Function Application`_: :meth:`~DataFrame.pipe`
704704
2. `Row or Column-wise Function Application`_: :meth:`~DataFrame.apply`
705-
3. Elementwise_ function application: :meth:`~DataFrame.applymap`
705+
3. `Aggregation API`_: :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform`
706+
4. `Applying Elementwise Functions`_: :meth:`~DataFrame.applymap`
706707

707708
.. _basics.pipe:
708709

@@ -778,6 +779,13 @@ statistics methods, take an optional ``axis`` argument:
778779
df.apply(np.cumsum)
779780
df.apply(np.exp)
780781
782+
``.apply()`` will also dispatch on a string method name.
783+
784+
.. ipython:: python
785+
786+
df.apply('mean')
787+
df.apply('mean', axis=1)
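
The string dispatch described above can be checked with a minimal sketch. The frame ``df_small`` and its values are illustrative assumptions, not data from this commit:

```python
import pandas as pd

# Small deterministic frame (hypothetical data, for illustration only).
df_small = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]})

# A string is resolved to the method of the same name on the object.
col_means = df_small.apply('mean')          # same result as df_small.mean()
row_means = df_small.apply('mean', axis=1)  # same result as df_small.mean(axis=1)

print(col_means)
print(row_means)
```

Passing the name as a string avoids wrapping the method in a lambda when only the reduction itself is needed.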
788+
781789
Depending on the return type of the function passed to :meth:`~DataFrame.apply`,
782790
the result will either be of lower dimension or the same dimension.
783791

@@ -827,16 +835,234 @@ set to True, the passed function will instead receive an ndarray object, which
827835
has positive performance implications if you do not need the indexing
828836
functionality.
829837

830-
.. seealso::
838+
.. _basics.aggregate:
839+
840+
Aggregation API
841+
~~~~~~~~~~~~~~~
842+
843+
.. versionadded:: 0.20.0
844+
845+
The aggregation API allows one to express possibly multiple aggregation operations in a single concise way.
846+
This API is similar across pandas objects, :ref:`groupby aggregates <groupby.aggregate>`,
847+
:ref:`window functions <stats.aggregate>`, and the :ref:`resample API <timeseries.aggregate>`.
848+
849+
We will use a starting frame similar to the one above.
850+
851+
.. ipython:: python
852+
853+
tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
854+
index=pd.date_range('1/1/2000', periods=10))
855+
tsdf.iloc[3:7] = np.nan
856+
tsdf
857+
858+
Using a single function is equivalent to ``.apply``; you can also pass named methods as strings.
859+
This will return a Series of the output.
860+
861+
.. ipython:: python
862+
863+
tsdf.agg(np.sum)
864+
865+
tsdf.agg('sum')
866+
867+
# these are equivalent to a ``.sum()`` because we are aggregating on a single function
868+
tsdf.sum()
869+
870+
On a Series, this will result in a scalar value.
871+
872+
.. ipython:: python
873+
874+
tsdf.A.agg('sum')
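
As a rough sketch of the single-function case, assuming a small hand-made frame (``frame`` and its values are hypothetical), the result types can be compared directly:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value, for illustration only.
frame = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [10.0, 20.0, 30.0]})

by_name = frame.agg('sum')       # DataFrame + one function -> Series
direct = frame.sum()             # the equivalent direct call
scalar = frame['A'].agg('sum')   # Series + one function -> scalar

print(by_name)
print(scalar)
```

Missing values are skipped here in the same way as in the direct ``.sum()`` call.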
875+
876+
877+
Aggregating multiple functions at once
878+
++++++++++++++++++++++++++++++++++++++
879+
880+
You can pass multiple aggregation functions as a list. The result of each passed function will be a row in the resultant DataFrame.
881+
These are naturally named from the aggregation function.
882+
883+
.. ipython:: python
884+
885+
tsdf.agg(['sum'])
886+
887+
Multiple functions yield multiple rows.
831888

832-
The section on :ref:`GroupBy <groupby>` demonstrates related, flexible
833-
functionality for grouping by some criterion, applying, and combining the
834-
results into a Series, DataFrame, etc.
889+
.. ipython:: python
890+
891+
tsdf.agg(['sum', 'mean'])
892+
893+
On a Series, multiple functions return a Series, indexed by the function names.
894+
895+
.. ipython:: python
896+
897+
tsdf.A.agg(['sum', 'mean'])
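
The shapes described above can be illustrated with a short sketch. The frame ``frame`` is an assumption made for this example; only string function names are used:

```python
import pandas as pd

# Hypothetical data, for illustration only.
frame = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]})

# DataFrame + list of functions -> DataFrame with one row per function.
table = frame.agg(['sum', 'mean'])

# Series + list of functions -> Series indexed by the function names.
series_result = frame['A'].agg(['sum', 'mean'])

print(table)
print(series_result)
```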
898+
899+
900+
Aggregating with a dict of functions
901+
++++++++++++++++++++++++++++++++++++
902+
903+
Passing a dictionary that maps column names to a function or a list of functions to ``DataFrame.agg``
904+
allows you to customize which functions are applied to which columns.
905+
906+
.. ipython:: python
907+
908+
tsdf.agg({'A': 'mean', 'B': 'sum'})
909+
910+
Passing a list-like will generate a DataFrame output. You will get a matrix-like output
911+
of all of the aggregators; some may be missing values.
912+
913+
.. ipython:: python
914+
915+
tsdf.agg({'A': ['mean', 'min'], 'B': 'sum'})
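
A minimal sketch of both dict forms, assuming a small illustrative frame (``frame`` is hypothetical), shows where the missing values in the matrix-like output come from:

```python
import pandas as pd

# Hypothetical data, for illustration only.
frame = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]})

# One function per column -> Series indexed by column name.
per_column = frame.agg({'A': 'mean', 'B': 'sum'})

# A list for at least one column -> DataFrame; combinations that were
# not requested (for example 'sum' of column A) are filled with NaN.
mixed = frame.agg({'A': ['mean', 'min'], 'B': 'sum'})

print(per_column)
print(mixed)
```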
835916
836-
.. _Elementwise:
917+
For a Series, you can pass a dict. You will get back a MultiIndex Series; the outer level will
918+
be the keys, the inner the name of the functions.
919+
920+
.. ipython:: python
921+
922+
tsdf.A.agg({'foo': ['sum', 'mean']})
923+
924+
Alternatively, using a dictionary with multiple keys, you can rename the elements of the aggregation.
925+
926+
.. ipython:: python
927+
928+
tsdf.A.agg({'foo': 'sum', 'bar': 'mean'})
929+
930+
Multiple keys will yield a MultiIndex Series. The outer level will be the keys, the inner
931+
the names of the functions.
932+
933+
.. ipython:: python
934+
935+
tsdf.A.agg({'foo': ['sum', 'mean'], 'bar': ['min', 'max', lambda x: x.sum()+1]})
936+
937+
.. _basics.aggregation.mixed_dtypes:
938+
939+
Mixed Dtypes
940+
++++++++++++
941+
942+
When presented with mixed dtypes that cannot aggregate, ``.agg`` will only take the valid
943+
aggregations. This is similar to how groupby ``.agg`` works.
944+
945+
.. ipython:: python
837946
838-
Applying elementwise Python functions
839-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
947+
mdf = pd.DataFrame({'A': [1, 2, 3],
948+
'B': [1., 2., 3.],
949+
'C': ['foo', 'bar', 'baz'],
950+
'D': pd.date_range('20130101', periods=3)})
951+
mdf.dtypes
952+
953+
.. ipython:: python
954+
955+
mdf.agg(['min', 'sum'])
956+
957+
.. _basics.aggregation.custom_describe:
958+
959+
Custom describe
960+
+++++++++++++++
961+
962+
With ``.agg()`` it is possible to easily create a custom describe function, similar
963+
to the built in :ref:`describe function <basics.describe>`.
964+
965+
.. ipython:: python
966+
967+
from functools import partial
968+
969+
q_25 = partial(pd.Series.quantile, q=0.25)
970+
q_25.__name__ = '25%'
971+
q_75 = partial(pd.Series.quantile, q=0.75)
972+
q_75.__name__ = '75%'
973+
974+
tsdf.agg(['count', 'mean', 'std', 'min', q_25, 'median', q_75, 'max'])
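
An alternative sketch of the same idea uses plain named functions instead of ``functools.partial``, so the row labels come from each function's ``__name__``. The helpers ``q_25`` and ``q_75`` and the frame ``frame`` are hypothetical names chosen for this example, not part of the commit:

```python
import pandas as pd


def q_25(x):
    """25th percentile of a Series (hypothetical helper)."""
    return x.quantile(0.25)


def q_75(x):
    """75th percentile of a Series (hypothetical helper)."""
    return x.quantile(0.75)


# Hypothetical data, for illustration only.
frame = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0, 5.0],
                      'B': [10.0, 20.0, 30.0, 40.0, 50.0]})

# Mix string method names and callables; each becomes one row.
summary = frame.agg(['count', 'mean', q_25, 'median', q_75, 'max'])
print(summary)
```

Named functions give labels such as ``q_25`` rather than ``25%``; the ``partial`` approach above is what allows arbitrary labels.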
975+
976+
.. _basics.transform:
977+
978+
Transform API
979+
~~~~~~~~~~~~~
980+
981+
.. versionadded:: 0.20.0
982+
983+
The ``transform`` method returns an object that is indexed the same (same size)
984+
as the original. This API allows you to provide *multiple* operations at the same
985+
time rather than one-by-one. Its API is quite similar to the ``.agg`` API.
986+
987+
Use a similar frame to the above sections.
988+
989+
.. ipython:: python
990+
991+
tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
992+
index=pd.date_range('1/1/2000', periods=10))
993+
tsdf.iloc[3:7] = np.nan
994+
tsdf
995+
996+
Transform the entire frame. ``.transform()`` accepts a NumPy function, a string
997+
function name, or a user-defined function.
998+
999+
.. ipython:: python
1000+
1001+
tsdf.transform(np.abs)
1002+
tsdf.transform('abs')
1003+
tsdf.transform(lambda x: x.abs())
1004+
1005+
Since this is a single function, this is equivalent to a ufunc application.
1006+
1007+
.. ipython:: python
1008+
1009+
np.abs(tsdf)
1010+
1011+
Passing a single function to ``.transform()`` with a Series will yield a single Series in return.
1012+
1013+
.. ipython:: python
1014+
1015+
tsdf.A.transform(np.abs)
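
The "same index, same size" property can be sketched as follows, assuming a small illustrative frame (``frame`` is hypothetical):

```python
import pandas as pd

# Hypothetical data, for illustration only.
frame = pd.DataFrame({'A': [-1.0, 2.0, -3.0], 'B': [4.0, -5.0, 6.0]})

by_name = frame.transform('abs')                 # string method name
by_lambda = frame.transform(lambda x: x.abs())   # user-defined function
single = frame['A'].transform('abs')             # Series in, Series out

print(by_name)
print(single)
```

Unlike an aggregation, the result keeps one output row per input row.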
1016+
1017+
1018+
Transform with multiple functions
1019+
+++++++++++++++++++++++++++++++++
1020+
1021+
Passing multiple functions will yield a column multi-indexed DataFrame.
1022+
The first level will be the original frame column names; the second level
1023+
will be the names of the transforming functions.
1024+
1025+
.. ipython:: python
1026+
1027+
tsdf.transform([np.abs, lambda x: x+1])
1028+
1029+
Passing multiple functions to a Series will yield a DataFrame. The
1030+
resulting column names will be the transforming functions.
1031+
1032+
.. ipython:: python
1033+
1034+
tsdf.A.transform([np.abs, lambda x: x+1])
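
A short sketch of the resulting column layout, using a string name and a named function so that the labels are predictable. The helper ``plus_one`` and the frame ``frame`` are assumptions made for this example:

```python
import pandas as pd


def plus_one(x):
    """Add one to the input (hypothetical helper)."""
    return x + 1


# Hypothetical data, for illustration only.
frame = pd.DataFrame({'A': [-1.0, 2.0], 'B': [3.0, -4.0]})

# DataFrame + list -> columns are (original column, function name) pairs.
wide = frame.transform(['abs', plus_one])

# Series + list -> one column per function.
from_series = frame['A'].transform(['abs', plus_one])

print(wide)
print(from_series)
```

Lambdas work as well, but they all share the name ``<lambda>``, which makes the resulting columns harder to select.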
1035+
1036+
1037+
Transforming with a dict of functions
1038+
+++++++++++++++++++++++++++++++++++++
1039+
1040+
1041+
Passing a dict of functions will allow selective transforming per column.
1042+
1043+
.. ipython:: python
1044+
1045+
tsdf.transform({'A': np.abs, 'B': lambda x: x+1})
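
Per-column selection can be sketched like this, assuming a small illustrative frame (``frame`` and ``plus_one`` are hypothetical names):

```python
import pandas as pd


def plus_one(x):
    """Add one to the input (hypothetical helper)."""
    return x + 1


# Hypothetical data, for illustration only.
frame = pd.DataFrame({'A': [-1.0, 2.0], 'B': [3.0, -4.0], 'C': [0.0, 0.0]})

# Only the columns named in the dict appear in the result.
selected = frame.transform({'A': 'abs', 'B': plus_one})
print(selected)
```

Column ``C`` is not listed in the dict, so it is absent from the output.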
1046+
1047+
Passing a dict of lists will generate a multi-indexed DataFrame with these
1048+
selective transforms.
1049+
1050+
.. ipython:: python
1051+
1052+
tsdf.transform({'A': np.abs, 'B': [lambda x: x+1, 'sqrt']})
1053+
1054+
On a Series, passing a dict allows renaming as in ``.agg()``.
1055+
1056+
.. ipython:: python
1057+
1058+
tsdf.A.transform({'foo': np.abs})
1059+
tsdf.A.transform({'foo': np.abs, 'bar': [lambda x: x+1, 'sqrt']})
1060+
1061+
1062+
.. _basics.elementwise:
1063+
1064+
Applying Elementwise Functions
1065+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
8401066

8411067
Since not all functions can be vectorized (accept NumPy arrays and return
8421068
another array or value), the methods :meth:`~DataFrame.applymap` on DataFrame

doc/source/computation.rst

+3-1
@@ -617,7 +617,9 @@ Aggregation
617617
-----------
618618

619619
Once the ``Rolling``, ``Expanding`` or ``EWM`` objects have been created, several methods are available to
620-
perform multiple computations on the data. This is very similar to a ``.groupby(...).agg`` seen :ref:`here <groupby.aggregate>`.
620+
perform multiple computations on the data. These operations are similar to the :ref:`aggregating API <basics.aggregate>`,
621+
:ref:`groupby aggregates <groupby.aggregate>`, and :ref:`resample API <timeseries.aggregate>`.
622+
621623

622624
.. ipython:: python
623625

doc/source/groupby.rst

+3-1
@@ -439,7 +439,9 @@ Aggregation
439439
-----------
440440

441441
Once the GroupBy object has been created, several methods are available to
442-
perform a computation on the grouped data.
442+
perform a computation on the grouped data. These operations are similar to the
443+
:ref:`aggregating API <basics.aggregate>`, :ref:`window functions <stats.aggregate>`,
444+
and :ref:`resample API <timeseries.aggregate>`.
443445

444446
An obvious one is aggregation via the ``aggregate`` or equivalently ``agg`` method:
445447

doc/source/timeseries.rst

+4-2
@@ -1519,11 +1519,13 @@ We can instead only resample those groups where we have points as follows:
15191519
15201520
ts.groupby(partial(round, freq='3T')).sum()
15211521
1522+
.. _timeseries.aggregate:
1523+
15221524
Aggregation
15231525
~~~~~~~~~~~
15241526

1525-
Similar to :ref:`groupby aggregates <groupby.aggregate>` and the :ref:`window functions <stats.aggregate>`, a ``Resampler`` can be selectively
1526-
resampled.
1527+
Similar to the :ref:`aggregating API <basics.aggregate>`, :ref:`groupby aggregates <groupby.aggregate>`, and :ref:`window functions <stats.aggregate>`,
1528+
a ``Resampler`` can be selectively resampled.
15271529

15281530
Resampling a ``DataFrame``, the default will be to act on all columns with the same function.
15291531
