ENH: allow .rolling / .expanding as groupby methods #12743

Closed
wants to merge 1 commit into from
54 changes: 54 additions & 0 deletions doc/source/whatsnew/v0.18.1.txt
@@ -18,6 +18,8 @@ Highlights include:

.. _whatsnew_0181.new_features:

- ``.groupby(...)`` has been enhanced to provide convenient syntax when working with ``.rolling(..)``, ``.expanding(..)`` and ``.resample(..)`` per group, see :ref:`here <whatsnew_0181.deferred_ops>`

New features
~~~~~~~~~~~~

@@ -48,6 +50,55 @@ see :ref:`Custom Business Hour <timeseries.custombusinesshour>` (:issue:`11514`)
Enhancements
~~~~~~~~~~~~

.. _whatsnew_0181.deferred_ops:

``.groupby(..)`` syntax with window and resample operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``.groupby(...)`` has been enhanced to provide convenient syntax when working with ``.rolling(..)``, ``.expanding(..)`` and ``.resample(..)`` per group (:issue:`12486`, :issue:`12738`).

You can now use ``.rolling(..)`` and ``.expanding(..)`` as methods on groupbys. These return another deferred object (similar to what ``.rolling()`` and ``.expanding()`` do on ungrouped pandas objects). You can then operate on these ``RollingGroupby`` objects in a similar manner.

Previously you would have to do this to get a rolling window mean per-group:

.. ipython:: python

   df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                      'B': np.arange(40)})
   df

.. ipython:: python

   df.groupby('A').apply(lambda x: x.rolling(4).B.mean())

Now you can do:

.. ipython:: python

   df.groupby('A').rolling(4).B.mean()
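
The per-group behaviour described above can be checked with a small standalone sketch. It assumes a recent pandas release and selects column ``B`` before calling ``.rolling(4)``, which is a slightly different spelling from the release note; the names ``grouped``, ``manual`` and ``same`` are illustrative only. The manual loop stands in for the older ``apply``-based form.

```python
import numpy as np
import pandas as pd

# Same frame as in the release note: three groups of unequal length.
df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                   'B': np.arange(40)})

# Rolling mean computed directly on the groupby object.
grouped = df.groupby('A')['B'].rolling(4).mean()

# Reference result: roll over each group separately, then stitch the
# pieces back together under their group keys.
manual = pd.concat({key: grp['B'].rolling(4).mean()
                    for key, grp in df.groupby('A')})

# The two results should agree value by value, NaNs included.
same = np.allclose(grouped.to_numpy(), manual.to_numpy(), equal_nan=True)
print(same)
```

The window restarts at each group boundary, so the first three rows of every group are ``NaN`` rather than borrowing values from the previous group.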

For ``.resample(..)`` types of operations, previously you would have to do:
Member


`df.groupby('A').resample('4s', 'mean')` already worked previously, I think? And in 0.18.0, `df.groupby('A').resample('4s').mean()` also already worked.

Contributor Author


It worked partially, but only for downsampling. I'll give a different example that illustrates this.


.. ipython:: python

   df = pd.DataFrame({'date': pd.date_range(start='2016-01-01',
                                            periods=4,
                                            freq='W'),
                      'group': [1, 1, 2, 2],
                      'val': [5, 6, 7, 8]}).set_index('date')

   df

.. ipython:: python

   df.groupby('group').apply(lambda x: x.resample('1D').ffill())

Now you can do:

.. ipython:: python

   df.groupby('group').resample('1D').ffill()
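
As a rough illustration of the upsampling case discussed in the review comments, the sketch below forward-fills weekly data to daily frequency within each group. It assumes a recent pandas release and selects the ``val`` column before resampling, which differs slightly from the release-note spelling; ``upsampled``, ``manual`` and ``same`` are illustrative names.

```python
import numpy as np
import pandas as pd

# Weekly observations, two per group, indexed by date.
df = pd.DataFrame({'date': pd.date_range(start='2016-01-01',
                                         periods=4,
                                         freq='W'),
                   'group': [1, 1, 2, 2],
                   'val': [5, 6, 7, 8]}).set_index('date')

# Upsample to daily frequency inside each group, forward-filling gaps.
upsampled = df.groupby('group')['val'].resample('1D').ffill()

# Reference result: resample each group on its own and concatenate.
manual = pd.concat({key: grp['val'].resample('1D').ffill()
                    for key, grp in df.groupby('group')})

same = np.array_equal(upsampled.to_numpy(), manual.to_numpy())
print(same)
```

Each group spans exactly one week here, so each contributes eight daily rows, and the fill never carries a value from one group into the next.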

.. _whatsnew_0181.partial_string_indexing:

Partial string indexing on ``DateTimeIndex`` when part of a ``MultiIndex``
@@ -282,6 +333,9 @@ Bug Fixes
- Bug in ``.concat`` of datetime tz-aware and naive DataFrames (:issue:`12467`)
- Bug in correctly raising a ``ValueError`` in ``.resample(..).fillna(..)`` when passing a non-string (:issue:`12952`)

- Bug in consistency of ``.name`` on ``.groupby(..).apply(..)`` cases (:issue:`12363`)


- Bug in ``Timestamp.__repr__`` that caused ``pprint`` to fail in nested structures (:issue:`12622`)
- Bug in ``Timedelta.min`` and ``Timedelta.max``, the properties now report the true minimum/maximum ``timedeltas`` as recognized by Pandas. See :ref:`documentation <timedeltas.limitations>`. (:issue:`12727`)
- Bug in ``.quantile()`` with interpolation may coerce to ``float`` unexpectedly (:issue:`12772`)
60 changes: 60 additions & 0 deletions pandas/core/base.py
@@ -613,6 +613,19 @@ def _aggregate_multiple_funcs(self, arg, _level):

        return concat(results, keys=keys, axis=1)

    def _shallow_copy(self, obj=None, obj_type=None, **kwargs):
        """ return a new object with the replacement attributes """
        if obj is None:
            obj = self._selected_obj.copy()
        if obj_type is None:
            obj_type = self._constructor
        if isinstance(obj, obj_type):
            obj = obj.obj
        for attr in self._attributes:
            if attr not in kwargs:
                kwargs[attr] = getattr(self, attr)
        return obj_type(obj, **kwargs)

    def _is_cython_func(self, arg):
        """ if we define an internal function for this argument, return it """
        return self._cython_table.get(arg)
@@ -625,6 +638,53 @@ def _is_builtin_func(self, arg):
        return self._builtin_table.get(arg, arg)


class GroupByMixin(object):
    """ provide the groupby facilities to the mixed object """

    @staticmethod
    def _dispatch(name, *args, **kwargs):
        """ dispatch to apply """
        def outer(self, *args, **kwargs):
            def f(x):
                x = self._shallow_copy(x, groupby=self._groupby)
                return getattr(x, name)(*args, **kwargs)
            return self._groupby.apply(f)
        outer.__name__ = name
        return outer

    def _gotitem(self, key, ndim, subset=None):
        """
        sub-classes to define
        return a sliced object

        Parameters
        ----------
        key : string / list of selections
        ndim : 1,2
            requested ndim of result
        subset : object, default None
            subset to act on
        """

        # create a new object to prevent aliasing
        if subset is None:
            subset = self.obj

        # we need to make a shallow copy of ourselves
        # with the same groupby
        kwargs = dict([(attr, getattr(self, attr))
                       for attr in self._attributes])
        self = self.__class__(subset,
                              groupby=self._groupby[key],
                              parent=self,
                              **kwargs)
        self._reset_cache()
        if subset.ndim == 2:
            if lib.isscalar(key) and key in subset or com.is_list_like(key):
                self._selection = key
        return self


class FrozenList(PandasObject, list):

    """
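
The ``_dispatch`` helper above follows a method-factory pattern: given a method name, it builds a function that re-creates the windowed object for each group and forwards the call. The toy sketch below imitates that pattern with plain lists and no pandas. ``Window``, ``GroupedWindow`` and the module-level ``_dispatch`` here are hypothetical stand-ins written for illustration, not the classes from this PR.

```python
class Window(object):
    """Hypothetical stand-in for a rolling window over a list of numbers."""

    def __init__(self, values, size):
        self.values = list(values)
        self.size = size

    def mean(self):
        # None marks positions where the window is not yet full.
        out = []
        for i in range(len(self.values)):
            if i + 1 < self.size:
                out.append(None)
            else:
                chunk = self.values[i + 1 - self.size:i + 1]
                out.append(sum(chunk) / float(self.size))
        return out


def _dispatch(name):
    """Build a method that calls ``name`` on a fresh Window per group."""
    def outer(self, *args, **kwargs):
        def f(values):
            # Re-create the windowed object for this group, then forward.
            return getattr(Window(values, self.size), name)(*args, **kwargs)
        return {key: f(values) for key, values in self.groups.items()}
    outer.__name__ = name
    return outer


class GroupedWindow(object):
    """Hypothetical grouped wrapper: groups maps key -> list of values."""

    def __init__(self, groups, size):
        self.groups = groups
        self.size = size

    # One line per forwarded method.
    mean = _dispatch('mean')


gw = GroupedWindow({'a': [1, 2, 3], 'b': [10, 20]}, 2)
result = gw.mean()
print(result)
```

The appeal of this design is that each forwarded method costs one line in the grouped class, at the price of routing every call through a generic per-group apply.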
5 changes: 3 additions & 2 deletions pandas/core/generic.py
@@ -3705,7 +3705,7 @@ def clip_lower(self, threshold, axis=None):
         return self.where(subset, threshold, axis=axis)

     def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,
-                group_keys=True, squeeze=False):
+                group_keys=True, squeeze=False, **kwargs):
         """
         Group series using mapper (dict or key function, apply given function
         to group, return result as series) or by a series of columns.
@@ -3757,7 +3757,8 @@ def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,
             raise TypeError("You have to supply one of 'by' and 'level'")
         axis = self._get_axis_number(axis)
         return groupby(self, by=by, axis=axis, level=level, as_index=as_index,
-                       sort=sort, group_keys=group_keys, squeeze=squeeze)
+                       sort=sort, group_keys=group_keys, squeeze=squeeze,
+                       **kwargs)

     def asfreq(self, freq, method=None, how=None, normalize=False):
         """