
ENH: Add method argument to rolling constructor to allow table-wise rolling #38417

Merged
merged 51 commits into from
Dec 27, 2020
Changes from 39 commits
Commits
26b29dc
Add method keyword and validation
Dec 4, 2020
7739b57
Add sub numba table func
Dec 5, 2020
8d66fc4
Fix output result and result min_periods handling
Dec 10, 2020
db39381
Merge remote-tracking branch 'upstream/master' into feature/rolling_t…
Dec 10, 2020
b67e725
Fix cache key
Dec 10, 2020
ea89fb6
Just use transpose for axis = 1
Dec 11, 2020
87570ef
Add test comparing column and table methods
Dec 11, 2020
1477004
Test invalid series table rolling
Dec 11, 2020
c377186
Merge remote-tracking branch 'upstream/master' into feature/rolling_t…
Dec 11, 2020
9bd2d0d
Adjust some tests
Dec 11, 2020
9be922b
Add test for invalid method keyword
Dec 11, 2020
a6473a5
Notimplemented test with win_type
Dec 11, 2020
4fa7abd
wrap table method tests in a class
Dec 11, 2020
2d7a58e
Add method keyword for expanding
Dec 11, 2020
76ead37
Change caching key to handing callers from Rolling or Expanding
Dec 11, 2020
12e7c57
Test expanding table
Dec 11, 2020
93c546e
Change column to single
Dec 12, 2020
1eaad79
flake8
Dec 12, 2020
c550d8e
Merge remote-tracking branch 'upstream/master' into feature/rolling_t…
Dec 12, 2020
5d040d0
Add weighted mean example as a test
Dec 12, 2020
57fc525
black
Dec 12, 2020
84619a1
Add whatsnew and window.rst example
Dec 12, 2020
42af1c3
Merge remote-tracking branch 'upstream/master' into feature/rolling_t…
Dec 14, 2020
9f172bd
Add as of version to table
Dec 14, 2020
cfe909c
Fix typo and add wikipedia link in window.rst
Dec 14, 2020
884c7ff
Use correct type and document func output and input
Dec 14, 2020
08f23e0
Newline
Dec 14, 2020
27e9bbb
Plural -> singular
Dec 14, 2020
b4fa92c
Add noqa in rst
Dec 14, 2020
354c698
Skip if scipy isnt installed
Dec 14, 2020
6fdfef4
Add asv benchmark
Dec 14, 2020
72e50d3
Add min periods for min version build
Dec 14, 2020
bdd600c
Merge remote-tracking branch 'upstream/master' into feature/rolling_t…
Dec 14, 2020
d9aa45c
Merge remote-tracking branch 'upstream/master' into feature/rolling_t…
Dec 15, 2020
c769ea2
Merge remote-tracking branch 'upstream/master' into feature/rolling_t…
Dec 16, 2020
dde4d8e
Merge remote-tracking branch 'upstream/master' into feature/rolling_t…
Dec 16, 2020
b863dc2
Merge remote-tracking branch 'upstream/master' into feature/rolling_t…
Dec 16, 2020
c674b97
Merge remote-tracking branch 'upstream/master' into feature/rolling_t…
Dec 17, 2020
55f44af
Remove redundant 1.3 doc
Dec 17, 2020
24a80db
Remove closed and center parameterizations
Dec 17, 2020
0203d02
Merge remote-tracking branch 'upstream/master' into feature/rolling_t…
Dec 17, 2020
ba86276
Clarify doc
mroeschke Dec 22, 2020
1651616
Merge remote-tracking branch 'upstream/master' into feature/rolling_t…
mroeschke Dec 22, 2020
55156cd
Fix benchmark
mroeschke Dec 22, 2020
6ca999a
Merge remote-tracking branch 'upstream/master' into feature/rolling_t…
mroeschke Dec 22, 2020
388b837
Merge remote-tracking branch 'upstream/master' into feature/rolling_t…
mroeschke Dec 23, 2020
f4ba0c6
just use prange
mroeschke Dec 23, 2020
d37e23b
Merge remote-tracking branch 'upstream/master' into feature/rolling_t…
mroeschke Dec 23, 2020
9ad8064
Forgot decorator as part of merge conflict fixup
mroeschke Dec 23, 2020
d5eb31c
Merge remote-tracking branch 'upstream/master' into feature/rolling_t…
mroeschke Dec 23, 2020
754d71e
Merge remote-tracking branch 'upstream/master' into feature/rolling_t…
Dec 27, 2020
17 changes: 17 additions & 0 deletions asv_bench/benchmarks/rolling.py
@@ -252,4 +252,21 @@ def time_groupby_mean(self, engine):
        self.gb_ewm.mean(engine=engine)


def table_method_func(x):
    return np.sum(x, axis=0) + 1


class TableMethod:

    params = ["single", "table"]
    param_names = ["method"]

    def setup(self, method):
        df = pd.DataFrame(np.random.randn(10, 1000))
        self.roll = df.rolling(2, method=method)

    def time_apply(self, method):
Member: Is `method` not used in this one? So we're re-doing the same benchmark twice? (Not a deal-breaker, as I think this happens in a number of places.)

Member Author: Good point. I think I put the parameterization in the wrong place. Thanks

        self.roll.apply(table_method_func, raw=True, engine="numba")


from .pandas_vb_common import setup # noqa: F401 isort:skip
39 changes: 31 additions & 8 deletions doc/source/user_guide/window.rst
@@ -37,14 +37,14 @@ pandas supports 4 types of windowing operations:
#. Expanding window: Accumulating window over the values.
#. Exponentially Weighted window: Accumulating and exponentially weighted window over the values.

-============================= ================= =========================== =========================== ========================
-Concept                       Method            Returned Object             Supports time-based windows Supports chained groupby
-============================= ================= =========================== =========================== ========================
-Rolling window                ``rolling``       ``Rolling``                 Yes                         Yes
-Weighted window               ``rolling``       ``Window``                  No                          No
-Expanding window              ``expanding``     ``Expanding``               No                          Yes
-Exponentially Weighted window ``ewm``           ``ExponentialMovingWindow`` No                          Yes (as of version 1.2)
-============================= ================= =========================== =========================== ========================
+============================= ================= =========================== =========================== ======================== ===================================
+Concept                       Method            Returned Object             Supports time-based windows Supports chained groupby Supports table method
+============================= ================= =========================== =========================== ======================== ===================================
+Rolling window                ``rolling``       ``Rolling``                 Yes                         Yes                      Yes (as of version 1.3)
+Weighted window               ``rolling``       ``Window``                  No                          No                       No
+Expanding window              ``expanding``     ``Expanding``               No                          Yes                      Yes (as of version 1.3)
+Exponentially Weighted window ``ewm``           ``ExponentialMovingWindow`` No                          Yes (as of version 1.2)  No
+============================= ================= =========================== =========================== ======================== ===================================

As noted above, some operations support specifying a window based on a time offset:

@@ -76,6 +76,29 @@ which will first group the data by the specified keys and then perform a windowi
to compute the rolling sums to preserve accuracy as much as possible.


.. versionadded:: 1.3

Some windowing operations also support the ``method='table'`` option in the constructor, which
performs the windowing operation over an entire :class:`DataFrame` instead of a single column or row at a time.
This can provide a useful performance benefit for a :class:`DataFrame` with many columns or rows
(with the corresponding ``axis`` argument) or the ability to utilize other columns during the windowing
operation. The ``method='table'`` option can only be used if ``engine='numba'`` is specified
in the corresponding method call.

For example, a `weighted mean <https://en.wikipedia.org/wiki/Weighted_arithmetic_mean>`__ can
be calculated with :meth:`~Rolling.apply` by specifying a separate column of weights.

.. ipython:: python

   def weighted_mean(x):
       arr = np.ones((1, x.shape[1]))
       arr[:, :2] = (x[:, :2] * x[:, 2]).sum(axis=0) / x[:, 2].sum()
       return arr

   df = pd.DataFrame([[1, 2, 0.6], [2, 3, 0.4], [3, 4, 0.2], [4, 5, 0.7]])
   df.rolling(2, method="table", min_periods=0).apply(weighted_mean, raw=True, engine="numba")  # noqa:E501


All windowing operations support a ``min_periods`` argument that dictates the minimum amount of
non-``np.nan`` values a window must have; otherwise, the resulting value is ``np.nan``.
``min_periods`` defaults to 1 for time-based windows and ``window`` for fixed windows
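To make the table-wise idea in the window.rst hunk above concrete, here is a minimal NumPy-only sketch of a rolling weighted mean over whole row windows. It does not call pandas or numba. `rolling_table_apply` is a hypothetical helper that emulates a fixed window with `min_periods=0`, and this `weighted_mean` slices the weights as a column (`x[:, 2:3]`) so that each row is scaled by its own weight under NumPy broadcasting. That slicing is a choice made for this sketch, not taken from the diff.

```python
import numpy as np


def weighted_mean(x):
    # x is an (M, 3) window: two value columns followed by a weights column.
    weights = x[:, 2:3]  # kept 2-D so each row is scaled by its own weight
    out = np.ones((1, x.shape[1]))
    out[:, :2] = (x[:, :2] * weights).sum(axis=0) / weights.sum()
    return out


def rolling_table_apply(values, window, func):
    # Hypothetical helper (not a pandas API): emulate a fixed-size,
    # table-wise rolling apply with min_periods=0 by slicing row windows.
    result = np.empty(values.shape)
    for i in range(len(values)):
        start = max(0, i - window + 1)
        # func returns a 1 x N array; take its single row.
        result[i, :] = func(values[start : i + 1])[0]
    return result


data = np.array([[1, 2, 0.6], [2, 3, 0.4], [3, 4, 0.2], [4, 5, 0.7]])
out = rolling_table_apply(data, 2, weighted_mean)
print(out)
```

Because the function sees every column of the window at once, the third column can act as weights for the first two, which a column-by-column apply cannot do.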
5 changes: 5 additions & 0 deletions doc/source/whatsnew/v1.3.0.rst
@@ -33,6 +33,11 @@ For example:
storage_options=headers
)

.. _whatsnew_130.window_method_table:

:class:`Rolling` and :class:`Expanding` now support a ``method`` argument with a
``'table'`` option that performs the windowing operation over an entire :class:`DataFrame`.
See :ref:`window.overview` for performance and functional benefits. (:issue:`15095`)

.. _whatsnew_130.enhancements.other:

13 changes: 11 additions & 2 deletions pandas/core/generic.py
@@ -11214,6 +11214,7 @@ def rolling(
on: Optional[str] = None,
axis: Axis = 0,
closed: Optional[str] = None,
method: str = "single",
):
axis = self._get_axis_number(axis)

@@ -11227,6 +11228,7 @@ def rolling(
on=on,
axis=axis,
closed=closed,
method=method,
)

return Rolling(
@@ -11238,12 +11240,17 @@ def rolling(
on=on,
axis=axis,
closed=closed,
method=method,
)

@final
@doc(Expanding)
def expanding(
-        self, min_periods: int = 1, center: Optional[bool_t] = None, axis: Axis = 0
+        self,
+        min_periods: int = 1,
+        center: Optional[bool_t] = None,
+        axis: Axis = 0,
+        method: str = "single",
) -> Expanding:
axis = self._get_axis_number(axis)
if center is not None:
@@ -11255,7 +11262,9 @@ def expanding(
else:
center = False

-        return Expanding(self, min_periods=min_periods, center=center, axis=axis)
+        return Expanding(
+            self, min_periods=min_periods, center=center, axis=axis, method=method
+        )

@final
@doc(ExponentialMovingWindow)
1 change: 1 addition & 0 deletions pandas/core/window/ewm.py
@@ -241,6 +241,7 @@ def __init__(
self.on = None
self.center = False
self.closed = None
self.method = "single"
if times is not None:
if isinstance(times, str):
times = self._selected_obj[times]
18 changes: 15 additions & 3 deletions pandas/core/window/expanding.py
@@ -24,6 +24,14 @@ class Expanding(RollingAndExpandingMixin):
center : bool, default False
Set the labels at the center of the window.
axis : int or str, default 0
method : str, default 'single'
Member: {'single', 'table'}, default 'single'?

Execute the expanding operation per single column or row (`'single'`)
or over the entire object (`'table'`).

This argument is only implemented when specifying ``engine='numba'``
in the method call.

.. versionadded:: 1.3.0

Returns
-------
@@ -59,10 +67,14 @@ class Expanding(RollingAndExpandingMixin):
4 7.0
"""

-    _attributes = ["min_periods", "center", "axis"]
+    _attributes = ["min_periods", "center", "axis", "method"]

-    def __init__(self, obj, min_periods=1, center=None, axis=0, **kwargs):
-        super().__init__(obj=obj, min_periods=min_periods, center=center, axis=axis)
+    def __init__(
+        self, obj, min_periods=1, center=None, axis=0, method="single", **kwargs
+    ):
+        super().__init__(
+            obj=obj, min_periods=min_periods, center=center, axis=axis, method=method
+        )

@property
def _constructor(self):
73 changes: 72 additions & 1 deletion pandas/core/window/numba_.py
@@ -17,6 +17,7 @@ def generate_numba_apply_func(
kwargs: Dict[str, Any],
func: Callable[..., Scalar],
engine_kwargs: Optional[Dict[str, bool]],
name: str,
):
"""
Generate a numba jitted apply function specified by values from engine_kwargs.
@@ -37,14 +38,16 @@ def generate_numba_apply_func(
function to be applied to each window and will be JITed
engine_kwargs : dict
dictionary of arguments to be passed into numba.jit
name: str
name of the caller (Rolling/Expanding)

Returns
-------
Numba function
"""
nopython, nogil, parallel = get_jit_arguments(engine_kwargs, kwargs)

-    cache_key = (func, "rolling_apply")
+    cache_key = (func, f"{name}_apply_single")
if cache_key in NUMBA_FUNC_CACHE:
return NUMBA_FUNC_CACHE[cache_key]
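
The diff does not spell out why the cache key changed. One plausible reading is that the same user function can now arrive from different callers and in different modes ("single" versus "table"), so the key needs to tell them apart. The sketch below illustrates that keying scheme in plain Python. `FUNC_CACHE`, `get_or_build` and the string labels are hypothetical stand-ins, and no JIT compilation happens here.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for the module-level cache referenced in the diff.
FUNC_CACHE: Dict[Tuple[Callable, str], str] = {}


def get_or_build(func: Callable, name: str, kind: str) -> str:
    """Return the cached object for (func, caller, mode), building it on a miss."""
    cache_key = (func, f"{name}_{kind}")
    if cache_key not in FUNC_CACHE:
        # A real implementation would JIT-compile here; a label stands in.
        FUNC_CACHE[cache_key] = f"compiled {func.__name__} for {cache_key[1]}"
    return FUNC_CACHE[cache_key]


def total(x):
    return sum(x)


a = get_or_build(total, "Rolling", "apply_single")
b = get_or_build(total, "Expanding", "apply_single")
c = get_or_build(total, "Rolling", "table")
print(len(FUNC_CACHE))  # three distinct entries for one user function
```

With a key built only from the function and a fixed string, all three calls would share a single cache entry.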

@@ -161,3 +164,71 @@ def groupby_ewma(
return result

return groupby_ewma


def generate_numba_table_func(
    args: Tuple,
    kwargs: Dict[str, Any],
    func: Callable[..., np.ndarray],
    engine_kwargs: Optional[Dict[str, bool]],
    name: str,
):
    """
    Generate a numba jitted function to apply window calculations table-wise.

    Func will be passed a M window size x N number of columns array, and
    must return a 1 x N number of columns array. Func is intended to operate
    row-wise, but the result will be transposed for axis=1.

    1. jit the user's function
    2. Return a rolling apply function with the jitted function inline

    Parameters
    ----------
    args : tuple
        *args to be passed into the function
    kwargs : dict
        **kwargs to be passed into the function
    func : function
        function to be applied to each window and will be JITed
    engine_kwargs : dict
        dictionary of arguments to be passed into numba.jit
    name : str
        caller (Rolling/Expanding) and original method name for numba cache key

    Returns
    -------
    Numba function
    """
    nopython, nogil, parallel = get_jit_arguments(engine_kwargs, kwargs)

    cache_key = (func, f"{name}_table")
    if cache_key in NUMBA_FUNC_CACHE:
        return NUMBA_FUNC_CACHE[cache_key]

    numba_func = jit_user_function(func, nopython, nogil, parallel)
    numba = import_optional_dependency("numba")
    if parallel:
        loop_range = numba.prange
    else:
        loop_range = range

    @numba.jit(nopython=nopython, nogil=nogil, parallel=parallel)
    def roll_table(
        values: np.ndarray, begin: np.ndarray, end: np.ndarray, minimum_periods: int
    ):
        result = np.empty(values.shape)
        min_periods_mask = np.empty(values.shape)
        for i in loop_range(len(result)):
            start = begin[i]
            stop = end[i]
            window = values[start:stop]
            count_nan = np.sum(np.isnan(window), axis=0)
            sub_result = numba_func(window, *args)
            nan_mask = len(window) - count_nan >= minimum_periods
            min_periods_mask[i, :] = nan_mask
            result[i, :] = sub_result
        result = np.where(min_periods_mask, result, np.nan)
        return result

    return roll_table
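
The loop in `roll_table` can be followed without numba. The sketch below mirrors its steps in plain NumPy under a few assumptions: `roll_table_sketch` is a hypothetical name, the user function is passed in directly instead of being jitted, and the `begin`/`end` window bounds are written by hand rather than produced by a pandas indexer.

```python
import numpy as np


def roll_table_sketch(values, begin, end, minimum_periods, func):
    # Hypothetical, un-jitted mirror of the roll_table loop in the diff.
    result = np.empty(values.shape)
    min_periods_mask = np.empty(values.shape, dtype=bool)
    for i in range(len(result)):
        window = values[begin[i] : end[i]]  # M x N slice of whole rows
        count_nan = np.sum(np.isnan(window), axis=0)
        # Per column: does the window hold enough non-NaN observations?
        min_periods_mask[i, :] = len(window) - count_nan >= minimum_periods
        result[i, :] = func(window)
    # Blank out cells whose window fell short of minimum_periods.
    return np.where(min_periods_mask, result, np.nan)


values = np.arange(6, dtype=float).reshape(3, 2)  # rows: [0, 1], [2, 3], [4, 5]
begin = np.array([0, 0, 1])  # hand-written bounds for a window of 2 rows
end = np.array([1, 2, 3])
out = roll_table_sketch(values, begin, end, 2, lambda w: w.sum(axis=0))
print(out)
```

The user function is still called on every window, including short ones. The `min_periods` check is applied afterwards as a mask, which is why it can differ per column when only some columns contain NaN.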