Commit 9173deb

author
tp
committed
add GroupBy.pipe method
1 parent 3c964a4 commit 9173deb

File tree

6 files changed

+199
-13
lines changed

doc/source/groupby.rst

+50
@@ -1165,6 +1165,56 @@ See the :ref:`visualization documentation<visualization.box>` for more.
 to ``df.boxplot(by="g")``. See :ref:`here<visualization.box.return>` for
 an explanation.

+.. _groupby.pipe:
+
+Piping function calls
+~~~~~~~~~~~~~~~~~~~~~
+
+.. versionadded:: 0.21.0
+
+Similar to the functionality provided by ``DataFrame`` and ``Series``, functions
+that take ``GroupBy`` objects can be chained together using a ``pipe`` method to
+allow for a cleaner, more readable syntax. To read about ``.pipe`` in general terms,
+see :ref:`here <basics.pipe>`.
+
+For a concrete example of combining ``.groupby`` and ``.pipe``, imagine having a
+DataFrame with columns for stores, products, revenue and sold quantity. We'd like to
+do a groupwise calculation of *prices* (i.e. revenue/quantity per store and per product).
+We could do this in a multi-step operation, but expressing it in terms of piping can make the
+code more readable.
+
+First we set up the data:
+
+.. ipython:: python
+
+   from numpy.random import choice, random
+   n = 1000
+   df = pd.DataFrame({'Store': choice(['Store_1', 'Store_2'], n),
+                      'Product': choice(['Product_1', 'Product_2', 'Product_3'], n),
+                      'Revenue': (np.random.random(n)*50+10).round(2),
+                      'Quantity': np.random.randint(1, 10, size=n)})
+   df.head(2)
+
+Now, to find prices per store/product, we can simply do:
+
+.. ipython:: python
+
+   (df.groupby(['Store', 'Product'])
+      .pipe(lambda grp: grp.Revenue.sum()/grp.Quantity.sum())
+      .unstack().round(2))
+
+Piping can also be expressive when you want to pass a grouped object to some
+arbitrary function, for example:
+
+.. code-block:: python
+
+   (base_df.pipe(lambda x: x[x.A > 3])
+           .groupby(['Store', 'Product'])
+           .pipe(report_func))
+
+where ``report_func`` takes an arbitrary GroupBy object and creates a report
+from that.
+
 Examples
 --------
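
Reviewer note on the documentation hunk above: because the docs example draws random data, its output cannot be checked by eye. Below is a minimal sketch of the same price calculation on a tiny hand-written frame, assuming a pandas version that includes ``GroupBy.pipe``. The helper name ``price`` is hypothetical and not part of this commit.

```python
import pandas as pd

# Small, hand-written data so the result can be verified manually.
df = pd.DataFrame({
    "Store": ["Store_1", "Store_1", "Store_2", "Store_2"],
    "Product": ["Product_1", "Product_1", "Product_1", "Product_2"],
    "Revenue": [10.0, 30.0, 20.0, 45.0],
    "Quantity": [1, 3, 4, 9],
})


def price(grp):
    # ``grp`` is the whole GroupBy object, not an individual group.
    return grp["Revenue"].sum() / grp["Quantity"].sum()


# Piped version: the GroupBy object is handed to ``price`` in one chain.
piped = df.groupby(["Store", "Product"]).pipe(price)

# The same calculation written as separate steps, for comparison.
grouped = df.groupby(["Store", "Product"])
stepwise = grouped["Revenue"].sum() / grouped["Quantity"].sum()
```

Both ``piped`` and ``stepwise`` are Series indexed by ``(Store, Product)``. The piped form simply avoids naming the intermediate ``grouped`` object.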

doc/source/whatsnew/v0.21.0.txt

+3
@@ -234,6 +234,9 @@ Other Enhancements
 - :meth:`DataFrame.assign` will preserve the original order of ``**kwargs`` for Python 3.6+ users instead of sorting the column names. (:issue:`14207`)
 - Improved the import time of pandas by about 2.25x. (:issue:`16764`)
 - :func:`read_json` and :func:`to_json` now accept a ``compression`` argument which allows them to transparently handle compressed files. (:issue:`17798`)
+- ``GroupBy`` objects now have a ``pipe`` method, similar to the one on ``DataFrame`` and ``Series``,
+  that allows functions that take a ``GroupBy`` to be composed in a clean, readable syntax.
+  See the :ref:`documentation <groupby.pipe>` for more. (:issue:`17871`)

 .. _whatsnew_0210.api_breaking:

pandas/core/common.py

+33
@@ -664,3 +664,36 @@ def _get_distinct_objs(objs):
             ids.add(id(obj))
             res.append(obj)
     return res
+
+
+def _pipe(obj, func, *args, **kwargs):
+    """
+    Apply a function ``func`` to ``obj`` either by passing ``obj`` as the first
+    argument to the function or, in the case that ``func`` is a tuple,
+    interpret the first element of the tuple as a function and pass ``obj`` to
+    that function as a keyword argument whose key is the value of the second
+    element of the tuple.
+
+    func : callable or tuple of (callable, string)
+        Function to apply to this GroupBy or, alternatively, a
+        ``(callable, data_keyword)`` tuple where ``data_keyword`` is a
+        string indicating the keyword of ``callable`` that expects the
+        GroupBy object.
+    args : iterable, optional
+        positional arguments passed into ``func``.
+    kwargs : dict, optional
+        a dictionary of keyword arguments passed into ``func``.
+
+    Returns
+    -------
+    object : the return type of ``func``.
+    """
+    if isinstance(func, tuple):
+        func, target = func
+        if target in kwargs:
+            msg = '%s is both the pipe target and a keyword argument' % target
+            raise ValueError(msg)
+        kwargs[target] = obj
+        return func(*args, **kwargs)
+    else:
+        return func(obj, *args, **kwargs)
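
Reviewer note: the tuple branch of ``_pipe`` is easiest to see through the public ``DataFrame.pipe``, which this commit routes through the shared helper. The sketch below assumes only the public method. The function ``scale`` and its parameter names are hypothetical and chosen so that the data is deliberately not the first argument.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})


def scale(factor, data):
    # Hypothetical helper: the piped object arrives as the keyword ``data``.
    return data * factor


# Tuple form: pipe calls scale(10, data=df).
scaled = df.pipe((scale, "data"), 10)

# Supplying the target keyword a second time is rejected with ValueError.
error_message = None
try:
    df.pipe((scale, "data"), 10, data=df)
except ValueError as exc:
    error_message = str(exc)
```

The tuple form lets a function join a chain without a wrapper lambda when it does not take the data as its first positional argument.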

pandas/core/generic.py

+6-12
@@ -3482,8 +3482,10 @@ def sample(self, n=None, frac=None, replace=False, weights=None,
             Alternatively a ``(callable, data_keyword)`` tuple where
             ``data_keyword`` is a string indicating the keyword of
             ``callable`` that expects the %(klass)s.
-        args : positional arguments passed into ``func``.
-        kwargs : a dictionary of keyword arguments passed into ``func``.
+        args : iterable, optional
+            positional arguments passed into ``func``.
+        kwargs : mapping, optional
+            a dictionary of keyword arguments passed into ``func``.

         Returns
         -------
@@ -3493,7 +3495,7 @@ def sample(self, n=None, frac=None, replace=False, weights=None,
         -----

         Use ``.pipe`` when chaining together functions that expect
-        on Series or DataFrames. Instead of writing
+        Series, DataFrames or GroupBys. Instead of writing

         >>> f(g(h(df), arg1=a), arg2=b, arg3=c)

@@ -3522,15 +3524,7 @@ def sample(self, n=None, frac=None, replace=False, weights=None,

     @Appender(_shared_docs['pipe'] % _shared_doc_kwargs)
     def pipe(self, func, *args, **kwargs):
-        if isinstance(func, tuple):
-            func, target = func
-            if target in kwargs:
-                raise ValueError('%s is both the pipe target and a keyword '
-                                 'argument' % target)
-            kwargs[target] = self
-            return func(*args, **kwargs)
-        else:
-            return func(self, *args, **kwargs)
+        return com._pipe(self, func, *args, **kwargs)

     _shared_docs['aggregate'] = ("""
     Aggregate using callable, string, dict, or list of string/callables

pandas/core/groupby.py

+49-1
@@ -40,7 +40,7 @@

 from pandas.core.common import (_values_from_object, AbstractMethodError,
                                 _default_index, _not_none, _get_callable_name,
-                                _asarray_tuplesafe)
+                                _asarray_tuplesafe, _pipe)

 from pandas.core.base import (PandasObject, SelectionMixin, GroupByError,
                               DataError, SpecificationError)
@@ -1691,6 +1691,54 @@ def tail(self, n=5):
         mask = self._cumcount_array(ascending=False) < n
         return self._selected_obj[mask]

+    def pipe(self, func, *args, **kwargs):
+        """ Apply a function with arguments to this GroupBy object
+
+        .. versionadded:: 0.21.0
+
+        Parameters
+        ----------
+        func : callable or tuple of (callable, string)
+            Function to apply to this GroupBy or, alternatively, a
+            ``(callable, data_keyword)`` tuple where ``data_keyword`` is a
+            string indicating the keyword of ``callable`` that expects the
+            GroupBy object.
+        args : iterable, optional
+            positional arguments passed into ``func``.
+        kwargs : dict, optional
+            a dictionary of keyword arguments passed into ``func``.
+
+        Returns
+        -------
+        object : the return type of ``func``.
+
+        Notes
+        -----
+        Use ``.pipe`` when chaining together functions that expect
+        Series, DataFrames or GroupBys. Instead of writing
+
+        >>> h(g(f(df.groupby('group')), arg1=a), arg2=b, arg3=c)
+
+        You can write
+
+        >>> (df
+        ...    .groupby('group')
+        ...    .pipe(f)
+        ...    .pipe(g, arg1=a)
+        ...    .pipe(h, arg2=b, arg3=c))
+
+        See more `here
+        <http://pandas.pydata.org/pandas-docs/stable/groupby.html#pipe>`_
+
+        See Also
+        --------
+        pandas.Series.pipe
+        pandas.DataFrame.pipe
+        pandas.GroupBy.apply
+        """
+        return _pipe(self, func, *args, **kwargs)
+
+

 GroupBy._add_numeric_operations()
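
Reviewer note: the docstring's Notes section contrasts nested calls with a ``.pipe`` chain. The sketch below shows that the two spellings give the same result when the innermost call receives the GroupBy object. The functions ``f``, ``g``, ``h`` and their arguments are hypothetical stand-ins, not code from this commit.

```python
import pandas as pd

df = pd.DataFrame({"group": ["a", "a", "b"], "x": [1.0, 2.0, 4.0]})


def f(gb):
    # GroupBy -> DataFrame of per-group sums.
    return gb.sum()


def g(frame, arg1):
    # DataFrame -> DataFrame, scaled by arg1.
    return frame * arg1


def h(frame, arg2, arg3):
    # DataFrame -> DataFrame, shifted by arg2 - arg3.
    return frame + arg2 - arg3


gb = df.groupby("group")

# Nested form: reads inside-out.
nested = h(g(f(gb), arg1=2), arg2=10, arg3=1)

# Piped form: reads top-to-bottom in execution order.
piped = (gb
         .pipe(f)
         .pipe(g, arg1=2)
         .pipe(h, arg2=10, arg3=1))
```

Only the first ``.pipe`` is ``GroupBy.pipe``. Since ``f`` returns a DataFrame, the later calls are ``DataFrame.pipe``.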

pandas/tests/groupby/test_groupby.py

+58
@@ -3914,6 +3914,64 @@ def test_gb_key_len_equal_axis_len(self):
         assert df.loc[('foo', 'bar', 'B')] == 2
         assert df.loc[('foo', 'baz', 'C')] == 1

+    def test_pipe(self):
+        # Test the pipe method of DataFrameGroupBy.
+        # Issue #17871
+
+        random_state = np.random.RandomState(1234567890)
+
+        df = DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
+                              'foo', 'bar', 'foo', 'foo'],
+                        'B': random_state.randn(8),
+                        'C': random_state.randn(8)})
+
+        def f(dfgb):
+            return dfgb.B.max() - dfgb.C.min().min()
+
+        def square(srs):
+            return srs ** 2
+
+        # Note that the transformations are
+        # GroupBy -> Series
+        # Series -> Series
+        # This then chains the GroupBy.pipe and the
+        # NDFrame.pipe methods
+        result = df.groupby('A').pipe(f).pipe(square)
+
+        index = Index([u'bar', u'foo'], dtype='object', name=u'A')
+        expected = pd.Series([8.99110003361, 8.17516964785], name='B', index=index)
+
+        assert_series_equal(expected, result)
+
+    def test_pipe_args(self):
+        # Test passing args to the pipe method of DataFrameGroupBy.
+        # Issue #17871
+
+        df = pd.DataFrame({'group': ['A', 'A', 'B', 'B', 'C'],
+                           'x': [1.0, 2.0, 3.0, 2.0, 5.0],
+                           'y': [10.0, 100.0, 1000.0, -100.0, -1000.0]})
+
+        def f(dfgb, arg1):
+            return dfgb.filter(lambda grp: grp.y.mean() > arg1, dropna=False).groupby(dfgb.grouper)
+
+        def g(dfgb, arg2):
+            return dfgb.sum() / dfgb.sum().sum() + arg2
+
+        def h(df, arg3):
+            return df.x + df.y - arg3
+
+        result = (df
+                  .groupby('group')
+                  .pipe(f, 0)
+                  .pipe(g, 10)
+                  .pipe(h, 100))
+
+        # Assert the results here
+        index = pd.Index(['A', 'B', 'C'], name='group')
+        expected = pd.Series([-79.5160891089, -78.4839108911, None], index=index)
+
+        assert_series_equal(expected, result)
+

 def _check_groupby(df, result, keys, field, f=lambda x: x.sum()):
     tups = lmap(tuple, df[keys].values)
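
Reviewer note: the two tests above cover the plain-callable path but not the ``(callable, data_keyword)`` tuple form described in the docstring. A possible extra test might look like the sketch below. It is hypothetical and not part of this commit: the test name and the helper ``add_to_sum`` are illustrative, and it is written against the public ``pd.testing`` helpers rather than the test module's own imports.

```python
import pandas as pd


def test_pipe_tuple_target():
    # Hypothetical test: exercise the tuple form of GroupBy.pipe.
    df = pd.DataFrame({"group": ["A", "A", "B"],
                       "x": [1.0, 2.0, 3.0]})

    def add_to_sum(offset, dfgb):
        # The GroupBy object arrives as the keyword argument ``dfgb``.
        return dfgb["x"].sum() + offset

    result = df.groupby("group").pipe((add_to_sum, "dfgb"), 10)

    index = pd.Index(["A", "B"], name="group")
    expected = pd.Series([13.0, 13.0], index=index, name="x")
    pd.testing.assert_series_equal(result, expected)


test_pipe_tuple_target()
```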
