Commit e4f0850

ENH: add Series & DataFrame .agg/.aggregate to provide convenient
function application that mimics the groupby(..).agg/.aggregate interface

.apply is now a synonym for .agg, and will accept dict/list-likes for aggregations

CLN: rename .name attr -> ._selection_name from SeriesGroupBy for compat
(didn't exist on DataFrameGroupBy); resolves conflicts w.r.t. setting .name
on a groupby object

closes pandas-dev#1623
closes pandas-dev#14464

custom .describe
closes pandas-dev#14483
closes pandas-dev#7014
1 parent b1d9599 commit e4f0850

13 files changed, +787 -57 lines changed


doc/source/basics.rst

+186-8
Original file line number | Diff line number | Diff line change
@@ -700,7 +700,8 @@ on an entire ``DataFrame`` or ``Series``, row- or column-wise, or elementwise.
700700

701701
1. `Tablewise Function Application`_: :meth:`~DataFrame.pipe`
702702
2. `Row or Column-wise Function Application`_: :meth:`~DataFrame.apply`
703-
3. Elementwise_ function application: :meth:`~DataFrame.applymap`
703+
3. `Aggregation API`_: :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform`
704+
4. `Applying Elementwise Functions`_: :meth:`~DataFrame.applymap`
704705

705706
.. _basics.pipe:
706707

@@ -776,6 +777,13 @@ statistics methods, take an optional ``axis`` argument:
776777
df.apply(np.cumsum)
777778
df.apply(np.exp)
778779
780+
``.apply()`` will also dispatch on a string method name.
781+
782+
.. ipython:: python
783+
784+
df.apply('mean')
785+
df.apply('mean', axis=1)
786+
779787
Depending on the return type of the function passed to :meth:`~DataFrame.apply`,
780788
the result will either be of lower dimension or the same dimension.
781789

@@ -825,16 +833,186 @@ set to True, the passed function will instead receive an ndarray object, which
825833
has positive performance implications if you do not need the indexing
826834
functionality.
827835

828-
.. seealso::
836+
.. _basics.aggregate:
837+
838+
Aggregation API
839+
~~~~~~~~~~~~~~~
840+
841+
.. versionadded:: 0.20.0
842+
843+
The aggregation API allows one to express possibly multiple aggregation operations in a single concise way.
844+
This API is similar across pandas objects, :ref:`groupby aggregates <groupby.aggregate>`,
845+
:ref:`window functions <stats.aggregate>`, and the :ref:`resample API <timeseries.aggregate>`.
846+
847+
We will use a starting frame similar to the one above.
848+
849+
.. ipython:: python
850+
851+
tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
852+
index=pd.date_range('1/1/2000', periods=10))
853+
tsdf.iloc[3:7] = np.nan
854+
tsdf
855+
856+
Using a single function is equivalent to ``.apply``; you can also pass named methods as strings.
857+
This will return a Series of the output.
858+
859+
.. ipython:: python
860+
861+
tsdf.agg(np.sum)
862+
863+
tsdf.agg('sum')
864+
865+
On a Series, this will result in a scalar value.
866+
867+
.. ipython:: python
868+
869+
tsdf.A.agg('sum')
870+
871+
872+
Aggregating multiple functions at once
873+
++++++++++++++++++++++++++++++++++++++
874+
875+
You can pass multiple aggregation functions as a list. The result of each passed function will be a row in the resultant DataFrame.
876+
These are naturally named from the aggregation function.
877+
878+
.. ipython:: python
879+
880+
tsdf.agg(['sum'])
881+
882+
Multiple functions yield multiple rows.
883+
884+
.. ipython:: python
885+
886+
tsdf.agg(['sum', 'mean'])
887+
888+
On a Series, multiple functions return a Series.
889+
890+
.. ipython:: python
891+
892+
tsdf.A.agg(['sum', 'mean'])
893+
894+
895+
Aggregating with a dict of functions
896+
++++++++++++++++++++++++++++++++++++
897+
898+
Passing a dictionary that maps column names to a function or a list of functions to ``DataFrame.agg``
899+
allows you to customize which functions are applied to which columns.
900+
901+
.. ipython:: python
902+
903+
tsdf.agg({'A': 'mean', 'B': 'sum'})
904+
905+
Passing a list-like will generate a DataFrame output. You will get a matrix-like output
906+
of all of the aggregators; some may be missing values.
907+
908+
.. ipython:: python
909+
910+
tsdf.agg({'A': ['mean', 'min'], 'B': 'sum'})
911+
912+
For a Series, you can pass a dict; the keys will set the name of the column.
913+
914+
.. ipython:: python
915+
916+
tsdf.A.agg({'foo' : ['sum', 'mean']})
917+
918+
Multiple keys will yield multiple columns.
919+
920+
.. ipython:: python
921+
922+
tsdf.A.agg({'foo' : ['sum', 'mean'], 'bar': ['min', 'max', lambda x: x.sum()+1]})
923+
924+
925+
.. _basics.transform:
926+
927+
Transform API
928+
~~~~~~~~~~~~~
929+
930+
.. versionadded:: 0.20.0
931+
932+
The ``transform`` method returns an object that is indexed the same (same size)
933+
as the original. This API allows you to provide *multiple* operations at the same
934+
time rather than one-by-one. Its API is quite similar to the ``.agg`` API.
935+
936+
Use a similar frame to the above sections.
937+
938+
.. ipython:: python
939+
940+
tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
941+
index=pd.date_range('1/1/2000', periods=10))
942+
tsdf.iloc[3:7] = np.nan
943+
tsdf
944+
945+
Transform the entire frame. ``.transform()`` accepts a NumPy function, a string
946+
function name, or a user-defined function.
947+
948+
.. ipython:: python
949+
950+
tsdf.transform(np.abs)
951+
tsdf.transform('abs')
952+
tsdf.transform(lambda x: x.abs())
953+
954+
``.transform()`` with a single function is equivalent to applying a function across the
955+
columns.
956+
957+
.. ipython:: python
958+
959+
tsdf.apply(np.abs, axis=1)
960+
961+
Passing a single function to ``.transform()`` with a Series will yield a single Series in return.
962+
963+
.. ipython:: python
964+
965+
tsdf.A.transform(np.abs)
829966
830-
The section on :ref:`GroupBy <groupby>` demonstrates related, flexible
831-
functionality for grouping by some criterion, applying, and combining the
832-
results into a Series, DataFrame, etc.
833967
834-
.. _Elementwise:
968+
Transform with multiple functions
969+
+++++++++++++++++++++++++++++++++
835970

836-
Applying elementwise Python functions
837-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
971+
Passing multiple functions will yield a column multi-indexed DataFrame.
972+
The first level will be the original frame column names; the second level
973+
will be the names of the transforming functions.
974+
975+
.. ipython:: python
976+
977+
tsdf.transform([np.abs, lambda x: x+1])
978+
979+
Passing multiple functions to a Series will yield a DataFrame. The
980+
resulting column names will be the transforming functions.
981+
982+
.. ipython:: python
983+
984+
tsdf.A.transform([np.abs, lambda x: x+1])
985+
986+
987+
Transforming with a dict of functions
988+
+++++++++++++++++++++++++++++++++++++
989+
990+
991+
Passing a dict of functions will allow selective transforming per column.
992+
993+
.. ipython:: python
994+
995+
tsdf.transform({'A': np.abs, 'B': lambda x: x+1})
996+
997+
Passing a dict of lists will generate a multi-indexed DataFrame with these
998+
selective transforms.
999+
1000+
.. ipython:: python
1001+
1002+
tsdf.transform({'A': np.abs, 'B': [lambda x: x+1, 'sqrt']})
1003+
1004+
On a Series, passing a dict allows renaming as in ``.agg()``.
1005+
1006+
.. ipython:: python
1007+
1008+
tsdf.A.transform({'foo': np.abs})
1009+
tsdf.A.transform({'foo': np.abs, 'bar': [lambda x: x+1, 'sqrt']})
1010+
1011+
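Reviewer note: as a rough mental model of the multiple-function ``transform`` case documented above, each (column, function name) pair becomes one output column, and every output keeps the length of its input. The sketch below is plain Python, not pandas code. The ``transform`` helper, ``absolute``, ``plus_one``, and the dict-of-lists table are hypothetical stand-ins, and the explicit length check is an assumption of this model.

```python
def absolute(xs):
    """Return the absolute value of every element."""
    return [abs(x) for x in xs]


def plus_one(xs):
    """Return every element incremented by one."""
    return [x + 1 for x in xs]


def transform(table, funcs):
    """Apply every function to every column of a dict-of-lists table.

    Keys of the result are (column, function name) pairs, loosely
    mirroring the two-level column index described in the docs.
    """
    out = {}
    for col, vals in table.items():
        for f in funcs:
            result = list(f(vals))
            # Assumption of this model: a transform must not change the length.
            if len(result) != len(vals):
                raise ValueError("{} did not preserve length".format(f.__name__))
            out[(col, f.__name__)] = result
    return out


table = {"A": [-1, 2], "B": [3, -4]}
wide = transform(table, [absolute, plus_one])
```

A reducing function such as ``lambda xs: [sum(xs)]`` is rejected by this model because its output is shorter than its input.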
1012+
.. _basics.elementwise:
1013+
1014+
Applying Elementwise Functions
1015+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
8381016

8391017
Since not all functions can be vectorized (accept NumPy arrays and return
8401018
another array or value), the methods :meth:`~DataFrame.applymap` on DataFrame

doc/source/computation.rst

+3-1
@@ -560,7 +560,9 @@ Aggregation
560560
-----------
561561

562562
Once the ``Rolling``, ``Expanding`` or ``EWM`` objects have been created, several methods are available to
563-
perform multiple computations on the data. This is very similar to a ``.groupby(...).agg`` seen :ref:`here <groupby.aggregate>`.
563+
perform multiple computations on the data. These operations are similar to the :ref:`aggregating API <basics.aggregate>`,
564+
:ref:`groupby aggregates <groupby.aggregate>`, and :ref:`resample API <timeseries.aggregate>`.
565+
564566

565567
.. ipython:: python
566568

doc/source/groupby.rst

+3-1
@@ -380,7 +380,9 @@ Aggregation
380380
-----------
381381

382382
Once the GroupBy object has been created, several methods are available to
383-
perform a computation on the grouped data.
383+
perform a computation on the grouped data. These operations are similar to the
384+
:ref:`aggregating API <basics.aggregate>`, :ref:`window functions <stats.aggregate>`,
385+
and :ref:`resample API <timeseries.aggregate>`.
384386

385387
An obvious one is aggregation via the ``aggregate`` or equivalently ``agg`` method:
386388

doc/source/timeseries.rst

+4-2
@@ -1404,11 +1404,13 @@ We can instead only resample those groups where we have points as follows:
14041404
14051405
ts.groupby(partial(round, freq='3T')).sum()
14061406
1407+
.. _timeseries.aggregate:
1408+
14071409
Aggregation
14081410
~~~~~~~~~~~
14091411

1410-
Similar to :ref:`groupby aggregates <groupby.aggregate>` and the :ref:`window functions <stats.aggregate>`, a ``Resampler`` can be selectively
1411-
resampled.
1412+
Similar to the :ref:`aggregating API <basics.aggregate>`, :ref:`groupby aggregates <groupby.aggregate>`, and :ref:`window functions <stats.aggregate>`,
1413+
a ``Resampler`` can be selectively resampled.
14121414

14131415
Resampling a ``DataFrame``, the default will be to act on all columns with the same function.
14141416

pandas/core/base.py

+57-13
@@ -289,7 +289,9 @@ class SelectionMixin(object):
289289
}
290290

291291
@property
292-
def name(self):
292+
def _selection_name(self):
293+
""" return a name for myself; this would ideally be the 'name' property, but
294+
we cannot conflict with the Series.name property which can be set """
293295
if self._selection is None:
294296
return None # 'result'
295297
else:
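Reviewer note: the docstring on ``_selection_name`` says the property cannot be called ``name`` because ``Series.name`` must remain settable. One way such a conflict can arise is sketched below, assuming a read-only property. The two classes are hypothetical and are not the pandas mixin.

```python
class WithNameProperty:
    """Old layout: a read-only ``name`` property shadows plain assignment."""

    def __init__(self, selection=None):
        self._selection = selection

    @property
    def name(self):
        return self._selection


class WithSelectionName:
    """New layout: the property is renamed, so ``name`` is free again."""

    def __init__(self, selection=None):
        self._selection = selection

    @property
    def _selection_name(self):
        return self._selection


old = WithNameProperty("A")
try:
    old.name = "renamed"      # a property without a setter rejects assignment
    old_assignable = True
except AttributeError:
    old_assignable = False

new = WithSelectionName("A")
new.name = "renamed"          # ordinary attribute assignment now works
```

After the rename, the selection is still readable through ``_selection_name`` while ``name`` behaves like a normal attribute.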
@@ -404,6 +406,26 @@ def aggregate(self, func, *args, **kwargs):
404406

405407
agg = aggregate
406408

409+
def _try_aggregate_string_function(self, arg, *args, **kwargs):
410+
"""
411+
if arg is a string, then try to operate on it:
412+
- try to find a function on ourselves
413+
- try to find a numpy function
414+
- raise
415+
416+
"""
417+
assert isinstance(arg, compat.string_types)
418+
419+
f = getattr(self, arg, None)
420+
if f is not None:
421+
return f(*args, **kwargs)
422+
423+
f = getattr(np, arg, None)
424+
if f is not None:
425+
return f(self, *args, **kwargs)
426+
427+
raise ValueError("{} is an unknown string function".format(arg))
428+
407429
def _aggregate(self, arg, *args, **kwargs):
408430
"""
409431
provide an implementation for the aggregators
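Reviewer note: the lookup order in ``_try_aggregate_string_function`` (a method on the object first, then a module-level function, then an error) can be tried out in isolation. The sketch below uses the standard-library ``statistics`` module as a stand-in for NumPy; ``Column`` and ``try_string_function`` are hypothetical names, not pandas API.

```python
import statistics


class Column:
    """Minimal iterable container with one method of its own."""

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)

    def sum(self):
        return sum(self._values)


def try_string_function(obj, name, *args, **kwargs):
    # 1. Prefer an attribute found on the object itself.
    f = getattr(obj, name, None)
    if f is not None:
        return f(*args, **kwargs)
    # 2. Fall back to a module-level function that receives the object.
    f = getattr(statistics, name, None)
    if f is not None:
        return f(obj, *args, **kwargs)
    # 3. Nothing matched: report the unknown name.
    raise ValueError("{} is an unknown string function".format(name))


col = Column([3, 1, 2])
own_method = try_string_function(col, "sum")       # found on Column
fallback = try_string_function(col, "median")      # found in statistics
```

Because the object is searched first, a method defined on the object takes precedence over a module-level function with the same name.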
@@ -427,14 +449,19 @@ def _aggregate(self, arg, *args, **kwargs):
427449
is_aggregator = lambda x: isinstance(x, (list, tuple, dict))
428450
is_nested_renamer = False
429451

452+
_axis = kwargs.pop('_axis', None)
453+
if _axis is None:
454+
_axis = getattr(self, 'axis', 0)
430455
_level = kwargs.pop('_level', None)
456+
431457
if isinstance(arg, compat.string_types):
432-
return getattr(self, arg)(*args, **kwargs), None
458+
return self._try_aggregate_string_function(arg, *args,
459+
**kwargs), None
433460

434461
if isinstance(arg, dict):
435462

436463
# aggregate based on the passed dict
437-
if self.axis != 0: # pragma: no cover
464+
if _axis != 0: # pragma: no cover
438465
raise ValueError('Can only pass dict with axis=0')
439466

440467
obj = self._selected_obj
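Reviewer note: the ``_axis`` handling added in this hunk follows a common pattern: pop a private keyword so it is not forwarded to the user function, and fall back to an attribute on the object, then to ``0``. A minimal standalone sketch, with hypothetical names ``resolve_axis``, ``HasAxis``, and ``NoAxis``:

```python
def resolve_axis(obj, kwargs):
    """Pop '_axis' from kwargs; fall back to obj.axis, then to 0."""
    axis = kwargs.pop("_axis", None)
    if axis is None:
        axis = getattr(obj, "axis", 0)
    return axis


class HasAxis:
    axis = 1


class NoAxis:
    pass


explicit = {"_axis": 0, "skipna": True}
from_kwargs = resolve_axis(HasAxis(), explicit)   # explicit keyword wins
from_attr = resolve_axis(HasAxis(), {})           # falls back to the attribute
default = resolve_axis(NoAxis(), {})              # falls back to 0
```

Note that ``pop`` mutates the dictionary, so the remaining keyword arguments can be passed on without the private key.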
@@ -560,26 +587,33 @@ def _agg(arg, func):
560587
ABCDataFrame):
561588
result = concat([result[k] for k in keys], keys=keys, axis=1)
562589
else:
563-
from pandas import DataFrame
564-
result = DataFrame(result)
590+
from pandas import DataFrame, Series
591+
try:
592+
result = DataFrame(result)
593+
except ValueError:
594+
# we have a dict of scalars
595+
result = Series(result, name=getattr(self, 'name', None))
565596

566597
return result, True
567-
elif hasattr(arg, '__iter__'):
568-
return self._aggregate_multiple_funcs(arg, _level=_level), None
598+
elif is_list_like(arg) and arg not in compat.string_types:
599+
# we require a list, but not an 'str'
600+
return self._aggregate_multiple_funcs(arg,
601+
_level=_level,
602+
_axis=_axis), None
569603
else:
570604
result = None
571605

572-
cy_func = self._is_cython_func(arg)
573-
if cy_func and not args and not kwargs:
574-
return getattr(self, cy_func)(), None
606+
f = self._is_cython_func(arg)
607+
if f and not args and not kwargs:
608+
return getattr(self, f)(), None
575609

576610
# caller can react
577611
return result, True
578612

579-
def _aggregate_multiple_funcs(self, arg, _level):
613+
def _aggregate_multiple_funcs(self, arg, _level, _axis):
580614
from pandas.tools.merge import concat
581615

582-
if self.axis != 0:
616+
if _axis != 0:
583617
raise NotImplementedError("axis other than 0 is not supported")
584618

585619
if self._selected_obj.ndim == 1:
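Reviewer note: the branch order in ``_aggregate`` (string, then dict, then list-like, then a single callable) is easier to follow in a simplified model. The sketch below works on a dict of lists rather than a DataFrame; ``aggregate``, ``NAMED``, ``_resolve``, and ``_label`` are hypothetical names. Where the diff writes ``arg not in compat.string_types``, the sketch uses ``isinstance`` checks, which is the usual way to test whether a value is a string.

```python
NAMED = {
    "sum": sum,
    "min": min,
    "max": max,
    "mean": lambda xs: sum(xs) / len(xs),
}


def _resolve(f):
    """Turn a string name into a callable; pass callables through."""
    if isinstance(f, str):
        if f not in NAMED:
            raise ValueError("{} is an unknown string function".format(f))
        return NAMED[f]
    return f


def _label(f):
    """Name used for the output row or column."""
    return f if isinstance(f, str) else f.__name__


def aggregate(table, arg):
    if isinstance(arg, str):
        func = _resolve(arg)
        return {col: func(vals) for col, vals in table.items()}
    if isinstance(arg, dict):
        out = {}
        for col, spec in arg.items():
            if isinstance(spec, (list, tuple)):
                out[col] = {_label(f): _resolve(f)(table[col]) for f in spec}
            else:
                out[col] = _resolve(spec)(table[col])
        return out
    if isinstance(arg, (list, tuple)):
        return {
            _label(f): {col: _resolve(f)(vals) for col, vals in table.items()}
            for f in arg
        }
    if callable(arg):
        return {col: arg(vals) for col, vals in table.items()}
    raise TypeError("unsupported aggregation argument: {!r}".format(arg))


table = {"A": [1, 2, 3], "B": [4, 5, 6]}
single = aggregate(table, "sum")
several = aggregate(table, ["sum", "mean"])
per_column = aggregate(table, {"A": "mean", "B": ["min", "max"]})
```

A single name yields one value per column, a list yields one row per function, and a dict restricts each column to its own functions.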
@@ -617,7 +651,17 @@ def _aggregate_multiple_funcs(self, arg, _level):
617651
except SpecificationError:
618652
raise
619653

620-
return concat(results, keys=keys, axis=1)
654+
try:
655+
return concat(results, keys=keys, axis=1)
656+
except (TypeError, ValueError):
657+
# shape change
658+
from pandas.types.cast import _is_nested_object
659+
from pandas import Series
660+
result = Series(results, index=keys, name=self.name)
661+
if _is_nested_object(result):
662+
raise ValueError("cannot combine transform and "
663+
"aggregation operations")
664+
return result
621665

622666
def _shallow_copy(self, obj=None, obj_type=None, **kwargs):
623667
""" return a new object with the replacement attributes """
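Reviewer note: the new fallback in ``_aggregate_multiple_funcs`` distinguishes three outcomes: every function returned a column-like result, every function returned a scalar, or the results were mixed. The diff reaches this by attempting ``concat`` first and inspecting the result on failure; the sketch below is a simplified model that checks the result types up front instead. ``combine`` is a hypothetical helper, and lists stand in for Series.

```python
def combine(results, keys):
    """Combine per-function results keyed by function name.

    - all list-like  -> one column per function
    - all scalars    -> one scalar per function
    - mixed          -> error, as in the diff's final branch
    """
    nested = [isinstance(r, (list, tuple)) for r in results]
    if all(nested):
        return {"kind": "columns", "data": dict(zip(keys, results))}
    if not any(nested):
        return {"kind": "scalars", "data": dict(zip(keys, results))}
    raise ValueError("cannot combine transform and aggregation operations")


values = [1, -2, 3]
all_scalars = combine([sum(values), max(values)], ["sum", "max"])
all_columns = combine([[abs(v) for v in values], [v + 1 for v in values]],
                      ["abs", "plus_one"])
```

Mixing a reducing function with a length-preserving one falls into the error branch, which matches the message raised in the diff.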
