
ENH: add Series & DataFrame .agg/.aggregate #14668


Merged: 3 commits merged on Apr 14, 2017
4 changes: 4 additions & 0 deletions doc/source/api.rst
@@ -314,6 +314,8 @@ Function application, GroupBy & Window
   :toctree: generated/

   Series.apply
   Series.aggregate
   Series.transform
   Series.map
   Series.groupby
   Series.rolling
@@ -831,6 +833,8 @@ Function application, GroupBy & Window

   DataFrame.apply
   DataFrame.applymap
   DataFrame.aggregate
   DataFrame.transform
   DataFrame.groupby
   DataFrame.rolling
   DataFrame.expanding

231 changes: 223 additions & 8 deletions doc/source/basics.rst
@@ -702,7 +702,8 @@ on an entire ``DataFrame`` or ``Series``, row- or column-wise, or elementwise.

1. `Tablewise Function Application`_: :meth:`~DataFrame.pipe`
2. `Row or Column-wise Function Application`_: :meth:`~DataFrame.apply`
3. Elementwise_ function application: :meth:`~DataFrame.applymap`
3. `Aggregation API`_: :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform`
4. `Applying Elementwise Functions`_: :meth:`~DataFrame.applymap`

.. _basics.pipe:

@@ -778,6 +779,13 @@ statistics methods, take an optional ``axis`` argument:
   df.apply(np.cumsum)
   df.apply(np.exp)

``.apply()`` will also dispatch on a string method name.

.. ipython:: python

   df.apply('mean')
   df.apply('mean', axis=1)
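
As a quick check of the string dispatch, the sketch below compares ``.apply('mean')`` with the direct method calls. A hypothetical deterministic frame ``df`` stands in for the random one above, and the sketch assumes the dispatch simply calls the named ``DataFrame`` method with the given ``axis``.

```python
import pandas as pd

# Small deterministic frame (hypothetical data for illustration)
df = pd.DataFrame({'one': [1.0, 2.0, 3.0], 'two': [4.0, 5.0, 6.0]})

# The string is looked up as a DataFrame method and called
col_means = df.apply('mean')          # column-wise, like df.mean()
row_means = df.apply('mean', axis=1)  # row-wise, like df.mean(axis=1)

print(col_means)
print(row_means)
```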

Depending on the return type of the function passed to :meth:`~DataFrame.apply`,
the result will either be of lower dimension or the same dimension.

@@ -827,16 +835,223 @@ set to True, the passed function will instead receive an ndarray object, which
has positive performance implications if you do not need the indexing
functionality.

.. seealso::
.. _basics.aggregate:

Aggregation API
~~~~~~~~~~~~~~~

.. versionadded:: 0.20.0

The aggregation API lets you express one or more aggregation operations in a single, concise call.
This API is similar across pandas objects; see the :ref:`groupby API <groupby.aggregate>`, the
:ref:`window functions API <stats.aggregate>`, and the :ref:`resample API <timeseries.aggregate>`.
The entry point for aggregation is the method :meth:`~DataFrame.aggregate`, or its alias :meth:`~DataFrame.agg`.

We will use a starting frame similar to the one above:

.. ipython:: python

   tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
                       index=pd.date_range('1/1/2000', periods=10))
   tsdf.iloc[3:7] = np.nan
   tsdf

Using a single function is equivalent to :meth:`~DataFrame.apply`. You can also pass named methods as strings.
These return a ``Series`` of the aggregated output:

.. ipython:: python

   tsdf.agg(np.sum)

   tsdf.agg('sum')

Review comment (Member): maybe show here that this is the same as tsdf.sum() ?

   # these are equivalent to a ``.sum()`` because we are aggregating on a single function
   tsdf.sum()
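
A minimal sketch of the equivalence, assuming a small deterministic ``tsdf`` in place of the random frame above so the totals can be checked by hand (``NaN`` values are skipped, as with ``.sum()``):

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for ``tsdf`` (hypothetical values, some missing)
tsdf = pd.DataFrame(
    {'A': [1.0, 2.0, np.nan, 4.0],
     'B': [0.5, np.nan, 1.5, 2.0],
     'C': [3.0, 3.0, 3.0, 3.0]},
    index=pd.date_range('1/1/2000', periods=4),
)

by_name = tsdf.agg('sum')  # reduction named by a string
direct = tsdf.sum()        # the equivalent direct call

print(by_name)
```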

A single aggregation on a ``Series`` returns a scalar value:

.. ipython:: python

   tsdf.A.agg('sum')
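
For instance, with a small hypothetical ``Series`` ``s`` (values chosen so the results are easy to verify):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], name='A')

# A single aggregation reduces the Series to a scalar
total = s.agg('sum')
average = s.agg('mean')

print(total, average)
```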


Aggregating with multiple functions
+++++++++++++++++++++++++++++++++++

You can pass multiple aggregation functions as a list.
The result of each passed function becomes a row in the resulting ``DataFrame``,
and each row is named after its aggregation function.

.. ipython:: python

   tsdf.agg(['sum'])

Multiple functions yield multiple rows:

.. ipython:: python

   tsdf.agg(['sum', 'mean'])
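
A sketch with a deterministic two-column frame (hypothetical data, with one missing value in ``B``) showing that each function becomes a row label, in the order given:

```python
import numpy as np
import pandas as pd

tsdf = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, np.nan, 6.0]})

# One row per function, one column per original column
result = tsdf.agg(['sum', 'mean'])

print(result)
```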

On a ``Series``, multiple functions return a ``Series``, indexed by the function names:

.. ipython:: python

   tsdf.A.agg(['sum', 'mean'])
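
The same idea on a small hypothetical ``Series``: the result is itself a ``Series`` whose index holds the function names.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], name='A')

# A Series indexed by the function names
result = s.agg(['sum', 'mean'])

print(result)
```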

Passing a ``lambda`` function yields a row named ``<lambda>``:

.. ipython:: python

   tsdf.A.agg(['sum', lambda x: x.mean()])

Passing a named function will yield that name for the row:

.. ipython:: python

   def mymean(x):
       return x.mean()

   tsdf.A.agg(['sum', mymean])
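
To see how the labels are derived, the sketch below mixes a string, a ``lambda`` and a named function on a hypothetical ``Series`` ``s``; the row labels should be ``'sum'``, ``'<lambda>'`` and ``'mymean'`` respectively.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], name='A')

def mymean(x):
    # Named function: its __name__ becomes the row label
    return x.mean()

result = s.agg(['sum', lambda x: x.mean(), mymean])

print(result)
```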

Aggregating with a dict
+++++++++++++++++++++++

Passing a dictionary that maps column names to a scalar or a list of scalars to ``DataFrame.agg``
allows you to customize which functions are applied to which columns.

.. ipython:: python

   tsdf.agg({'A': 'mean', 'B': 'sum'})
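
A minimal sketch, assuming a three-column frame with made-up values: column ``C`` is absent from the dict, so it is left out of the result entirely.

```python
import pandas as pd

tsdf = pd.DataFrame({'A': [1.0, 2.0, 3.0],
                     'B': [4.0, 5.0, 6.0],
                     'C': [7.0, 8.0, 9.0]})

# Only the listed columns are aggregated, each with its own function
result = tsdf.agg({'A': 'mean', 'B': 'sum'})

print(result)
```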

Passing a list-like value for any column generates ``DataFrame`` output: a matrix-like result
of all of the aggregators, with one row per unique function. Aggregators that are
not specified for a particular column are ``NaN``:

.. ipython:: python

   tsdf.agg({'A': ['mean', 'min'], 'B': 'sum'})
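
A sketch with made-up values showing where the ``NaN`` entries come from: ``'sum'`` was only requested for ``B`` and ``'mean'``/``'min'`` only for ``A``. Row order of the result may vary between pandas versions, so the values are looked up by label.

```python
import pandas as pd

tsdf = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]})

result = tsdf.agg({'A': ['mean', 'min'], 'B': 'sum'})

print(result)
```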

.. _basics.aggregation.mixed_dtypes:

Mixed Dtypes
++++++++++++

The section on :ref:`GroupBy <groupby>` demonstrates related, flexible
functionality for grouping by some criterion, applying, and combining the
results into a Series, DataFrame, etc.
When presented with mixed dtypes that cannot be aggregated, ``.agg`` will only take the valid
aggregations. This is similar to how groupby ``.agg`` works.

.. _Elementwise:
.. ipython:: python

   mdf = pd.DataFrame({'A': [1, 2, 3],
                       'B': [1., 2., 3.],
                       'C': ['foo', 'bar', 'baz'],
                       'D': pd.date_range('20130101', periods=3)})
   mdf.dtypes

.. ipython:: python

   mdf.agg(['min', 'sum'])

.. _basics.aggregation.custom_describe:

Custom describe
+++++++++++++++

With ``.agg()`` it is possible to easily create a custom describe function, similar
to the built-in :ref:`describe function <basics.describe>`.

.. ipython:: python

   from functools import partial

   q_25 = partial(pd.Series.quantile, q=0.25)
   q_25.__name__ = '25%'
   q_75 = partial(pd.Series.quantile, q=0.75)
   q_75.__name__ = '75%'

   tsdf.agg(['count', 'mean', 'std', 'min', q_25, 'median', q_75, 'max'])
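
One way to sanity-check the custom describe is to compare it with :meth:`~DataFrame.describe` on the same data. The sketch below assumes a small hand-written frame and that both use the default linear interpolation for quantiles, so the ``25%``, ``median`` (``50%``) and ``std`` rows should agree.

```python
from functools import partial

import numpy as np
import pandas as pd

# Hand-written data (hypothetical), with one missing value in B
tsdf = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0, 5.0],
                     'B': [2.0, 4.0, np.nan, 8.0, 10.0]})

q_25 = partial(pd.Series.quantile, q=0.25)
q_25.__name__ = '25%'  # the assigned name becomes the row label
q_75 = partial(pd.Series.quantile, q=0.75)
q_75.__name__ = '75%'

custom = tsdf.agg(['count', 'mean', 'std', 'min', q_25, 'median', q_75, 'max'])
builtin = tsdf.describe()

print(custom)
```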

.. _basics.transform:

Transform API
~~~~~~~~~~~~~

Applying elementwise Python functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. versionadded:: 0.20.0

The :meth:`~DataFrame.transform` method returns an object that is indexed the same way (and has the same size)
as the original. This API allows you to provide *multiple* operations at the same
time rather than one by one. Its API is quite similar to the ``.agg`` API.

We use a frame similar to the one in the sections above.

.. ipython:: python

   tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
                       index=pd.date_range('1/1/2000', periods=10))
   tsdf.iloc[3:7] = np.nan
   tsdf

Transform the entire frame. ``.transform()`` accepts a NumPy function, a string
function name, or a user-defined function.

.. ipython:: python

   tsdf.transform(np.abs)
   tsdf.transform('abs')

Review comment (Member): 'abs' does actually not seem to work for groupby.transform
Reply (Contributor Author): yeah abs is not generally defined (maybe create a separate issue for that)

   tsdf.transform(lambda x: x.abs())

Review comment (Member): same as above: maybe show here that this is the same as np.abs in this case


Since this is a single function, this is equivalent to a ufunc application:

.. ipython:: python

   np.abs(tsdf)
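
A minimal sketch, assuming a small frame with mixed signs and a missing value, checking that the three spellings of the transformation all match the plain ufunc result and keep the original shape:

```python
import numpy as np
import pandas as pd

tsdf = pd.DataFrame({'A': [-1.0, 2.0, np.nan], 'B': [3.0, -4.0, 5.0]})

expected = np.abs(tsdf)  # plain ufunc application

# NumPy function, string method name, and user-defined function
via_numpy = tsdf.transform(np.abs)
via_string = tsdf.transform('abs')
via_lambda = tsdf.transform(lambda x: x.abs())

print(via_numpy)
```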

Passing a single function to ``.transform()`` on a ``Series`` returns a single ``Series``.

.. ipython:: python

   tsdf.A.transform(np.abs)
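
For example, on a small hypothetical ``Series`` the output keeps the original index and name:

```python
import numpy as np
import pandas as pd

s = pd.Series([-1.5, 0.0, 2.5], name='A')

# Same-shaped output: one value per input value
result = s.transform(np.abs)

print(result)
```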


Transform with multiple functions
+++++++++++++++++++++++++++++++++

Passing multiple functions yields a ``DataFrame`` with MultiIndexed columns.
The first level will be the original frame column names; the second level
will be the names of the transforming functions.

.. ipython:: python

   tsdf.transform([np.abs, lambda x: x + 1])
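
A sketch with a two-column frame of made-up values that inspects the column levels. ``np.abs`` is labeled ``'absolute'`` here because that is the ``__name__`` of the NumPy ufunc, and the ``lambda`` is labeled ``'<lambda>'``.

```python
import numpy as np
import pandas as pd

tsdf = pd.DataFrame({'A': [-1.0, 2.0], 'B': [3.0, -4.0]})

result = tsdf.transform([np.abs, lambda x: x + 1])

# Level 0: original column names; level 1: function names
print(result.columns.tolist())
```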

Passing multiple functions to a ``Series`` yields a ``DataFrame``. The
resulting column names are the names of the transforming functions.

.. ipython:: python

   tsdf.A.transform([np.abs, lambda x: x + 1])
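
The ``Series`` case, sketched with made-up values; the column labels again come from the functions' names:

```python
import numpy as np
import pandas as pd

s = pd.Series([-1.0, 2.0], name='A')

# One output column per transforming function
result = s.transform([np.abs, lambda x: x + 1])

print(result)
```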


Transforming with a dict
++++++++++++++++++++++++


Passing a dict of functions allows selective transforming per column.

.. ipython:: python

   tsdf.transform({'A': np.abs, 'B': lambda x: x + 1})
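
A minimal sketch, assuming a three-column frame: only the columns named in the dict appear in the output, so ``C`` is dropped.

```python
import numpy as np
import pandas as pd

tsdf = pd.DataFrame({'A': [-1.0, 2.0], 'B': [3.0, -4.0], 'C': [5.0, 6.0]})

# Each listed column gets its own transformation
result = tsdf.transform({'A': np.abs, 'B': lambda x: x + 1})

print(result)
```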

Passing a dict of lists will generate a multi-indexed DataFrame with these
selective transforms.

.. ipython:: python

   tsdf.transform({'A': np.abs, 'B': [lambda x: x + 1, 'sqrt']})
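
A sketch with non-negative, made-up values (so ``'sqrt'`` produces no ``NaN``). Because one entry is a list, the single function for ``A`` is expected to be treated as a one-element list, giving an ``('A', 'absolute')`` column.

```python
import numpy as np
import pandas as pd

tsdf = pd.DataFrame({'A': [-1.0, 4.0], 'B': [1.0, 9.0]})

result = tsdf.transform({'A': np.abs, 'B': [lambda x: x + 1, 'sqrt']})

print(result.columns.tolist())
```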

.. _basics.elementwise:

Applying Elementwise Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since not all functions can be vectorized (accept NumPy arrays and return
another array or value), the methods :meth:`~DataFrame.applymap` on DataFrame

12 changes: 7 additions & 5 deletions doc/source/computation.rst
@@ -617,7 +617,9 @@ Aggregation
-----------

Once the ``Rolling``, ``Expanding`` or ``EWM`` objects have been created, several methods are available to
perform multiple computations on the data. This is very similar to a ``.groupby(...).agg`` seen :ref:`here <groupby.aggregate>`.
perform multiple computations on the data. These operations are similar to the :ref:`aggregating API <basics.aggregate>`,
:ref:`groupby aggregates <groupby.aggregate>`, and :ref:`resample API <timeseries.aggregate>`.
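
As an illustration of the shared API, the sketch below applies a list of functions to a rolling window over a hypothetical five-element ``Series``; the first two rows are ``NaN`` because a window of size 3 is not yet full.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
r = s.rolling(window=3)

# One output column per function, as with DataFrame.agg
result = r.agg(['sum', 'mean'])

print(result)
```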


.. ipython:: python

@@ -642,10 +644,10 @@ columns if none are selected.

.. _stats.aggregate.multifunc:

Applying multiple functions at once
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Applying multiple functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~

With windowed Series you can also pass a list or dict of functions to do
With windowed ``Series`` you can also pass a list of functions to do
aggregation with, outputting a DataFrame:

.. ipython:: python
@@ -666,7 +668,7 @@ Applying different functions to DataFrame columns
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By passing a dict to ``aggregate`` you can apply a different aggregation to the
columns of a DataFrame:
columns of a ``DataFrame``:

.. ipython:: python
:okexcept:

4 changes: 3 additions & 1 deletion doc/source/groupby.rst
@@ -439,7 +439,9 @@ Aggregation
-----------

Once the GroupBy object has been created, several methods are available to
perform a computation on the grouped data.
perform a computation on the grouped data. These operations are similar to the
:ref:`aggregating API <basics.aggregate>`, :ref:`window functions API <stats.aggregate>`,
and :ref:`resample API <timeseries.aggregate>`.
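
For comparison with the ``DataFrame.agg`` list form, a sketch on a hypothetical two-group frame: each group becomes a row and each function a column.

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'a', 'b'],
                   'val': [1.0, 2.0, 3.0, 4.0]})

grouped = df.groupby('key')['val']

# One row per group, one column per function
result = grouped.agg(['sum', 'mean'])

print(result)
```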

An obvious one is aggregation via the ``aggregate`` or equivalently ``agg`` method:


6 changes: 4 additions & 2 deletions doc/source/timeseries.rst
@@ -1519,11 +1519,13 @@ We can instead only resample those groups where we have points as follows:

   ts.groupby(partial(round, freq='3T')).sum()

.. _timeseries.aggregate:

Aggregation
~~~~~~~~~~~

Similar to :ref:`groupby aggregates <groupby.aggregate>` and the :ref:`window functions <stats.aggregate>`, a ``Resampler`` can be selectively
resampled.
Similar to the :ref:`aggregating API <basics.aggregate>`, :ref:`groupby aggregates API <groupby.aggregate>`, and the :ref:`window functions API <stats.aggregate>`,
a ``Resampler`` can be selectively resampled.

When resampling a ``DataFrame``, the default is to act on all columns with the same function.
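
A minimal sketch, assuming daily data binned into two-day periods (``'2D'``) with made-up values: a plain ``.sum()`` acts on every column, while a dict selects a function per column.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1.0, 2.0, 3.0, 4.0], 'B': [10.0, 20.0, 30.0, 40.0]},
    index=pd.date_range('2000-01-01', periods=4, freq='D'),
)

r = df.resample('2D')

all_cols = r.sum()                          # same function on every column
per_col = r.agg({'A': 'sum', 'B': 'mean'})  # one function per column

print(per_col)
```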
