ENH: Allow users to define their own window bound calculations in rolling #29878

Merged
merged 38 commits into from
Dec 5, 2019
Changes from 28 commits
1c8c24a
Add BaseIndexer class
Nov 25, 2019
1188edd
Merge remote-tracking branch 'upstream/master' into feature/baseindexer
Nov 26, 2019
6b5e894
Reformat Indexers
Nov 26, 2019
3a310f6
Remove init
Nov 26, 2019
d1d0775
Add BaseIndexer to api and allow rolling to accept BaseIndexer subcla…
Nov 26, 2019
218395e
Merge remote-tracking branch 'upstream/master' into feature/baseindexer
Nov 27, 2019
46d4a52
Merge remote-tracking branch 'upstream/master' into feature/baseindexer
Nov 27, 2019
c10854d
Lint cython files
Nov 27, 2019
c237090
Move indexers to pandas/core/window/indexers
Nov 27, 2019
1ddc828
Share get_window_bounds docstring
Nov 27, 2019
a861982
isort
Nov 27, 2019
8f482f7
Validate signature of get_window_bounds
Nov 27, 2019
38691c7
Validate signature of get_window_bounds
Nov 27, 2019
9d740d3
Merge remote-tracking branch 'upstream/master' into feature/baseindexer
Nov 28, 2019
d18e954
Lint
Nov 28, 2019
f06e8e6
Comment on unused variables in calculate_variable_window_bounds
Nov 28, 2019
7ccbcd0
Type annotate _get_window_indexer & black
Nov 28, 2019
4e2fd30
self.index -> self.index_array
Nov 28, 2019
c3153d8
Add test for ExpandingIndexer
Nov 28, 2019
89100c4
Merge remote-tracking branch 'upstream/master' into feature/baseindexer
Nov 29, 2019
2704c59
Add doc example in computation.rst with test + handle start, end boun…
Dec 1, 2019
87768ea
Add back win_type (for now)
Dec 1, 2019
6a6d896
Add 1.0.0 whatsnew note
Dec 1, 2019
2864e95
Remove BaseIndexers accepting win_type (weighted rolling)
Dec 1, 2019
b16e711
Lint
Dec 1, 2019
ed08ca3
Merge remote-tracking branch 'upstream/master' into feature/baseindexer
Dec 1, 2019
9eb3022
Merge remote-tracking branch 'upstream/master' into feature/baseindexer
Dec 2, 2019
f358466
Try changing import
Dec 2, 2019
0d8cc1f
Merge remote-tracking branch 'upstream/master' into feature/baseindexer
Dec 3, 2019
25a05fe
Make doc example a code block, add docstring
Dec 3, 2019
5d8819f
Change self.__dict__(kwargs) to more explicit setattr
Dec 3, 2019
9194557
Merge remote-tracking branch 'upstream/master' into feature/baseindexer
Dec 3, 2019
e7e1061
Fix docstring
Dec 3, 2019
09afec4
Merge remote-tracking branch 'upstream/master' into feature/baseindexer
Dec 3, 2019
10c4994
Merge remote-tracking branch 'upstream/master' into feature/baseindexer
Dec 4, 2019
9089f7b
Add BaseIndexer in doc/source/reference/window/rst
Dec 4, 2019
87e391f
Add typing for _validate_get_window_bounds_signature
Dec 4, 2019
7ce1967
Merge remote-tracking branch 'upstream/master' into feature/baseindexer
Dec 4, 2019
49 changes: 49 additions & 0 deletions doc/source/user_guide/computation.rst
@@ -466,6 +466,55 @@ default of the index) in a DataFrame.
   dft
   dft.rolling('2s', on='foo').sum()

.. _stats.custom_rolling_window:

Custom window rolling
~~~~~~~~~~~~~~~~~~~~~

.. versionadded:: 1.0

In addition to accepting an integer or offset as a ``window`` argument, ``rolling`` also accepts
a ``BaseIndexer`` subclass that allows a user to define a custom method for calculating window bounds.
The ``BaseIndexer`` subclass must define a ``get_window_bounds`` method that returns
a tuple of two arrays, the first being the starting indices of the windows and the second being the
ending indices of the windows. Additionally, ``num_values``, ``min_periods``, ``center`` and ``closed``
will automatically be passed to ``get_window_bounds``, so the defined method must
always accept these arguments.

For example, if we have the following ``DataFrame``:

.. ipython:: python

   use_expanding = [True, False, True, False, True]
   use_expanding
   df = pd.DataFrame({'values': range(5)})
   df

and we want to use an expanding window where ``use_expanding`` is ``True`` and a window of size
1 otherwise, we can create the following ``BaseIndexer`` subclass:

.. ipython:: python

   from pandas.api.indexers import BaseIndexer

   class CustomIndexer(BaseIndexer):

       def get_window_bounds(self, num_values, min_periods, center, closed):
           start = np.empty(num_values, dtype=np.int64)
           end = np.empty(num_values, dtype=np.int64)
           for i in range(num_values):
               if self.use_expanding[i]:
                   start[i] = 0
                   end[i] = i + 1
               else:
                   start[i] = i
                   end[i] = i + self.window_size
           return start, end

   indexer = CustomIndexer(window_size=1, use_expanding=use_expanding)
   df.rolling(indexer).sum()
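
To make the bounds in this example concrete, here is a minimal pure-Python sketch of the same logic without pandas or NumPy. The function name `custom_window_bounds` is hypothetical and not part of the PR; it only mirrors the loop in `get_window_bounds` and then sums each half-open window `[start, end)` by hand.

```python
def custom_window_bounds(use_expanding, window_size=1):
    """Return (start, end) index lists; each end index is exclusive."""
    start, end = [], []
    for i, expanding in enumerate(use_expanding):
        if expanding:
            # Expanding window: everything from the first row up to row i.
            start.append(0)
            end.append(i + 1)
        else:
            # Fixed window of `window_size` rows beginning at row i.
            start.append(i)
            end.append(i + window_size)
    return start, end


use_expanding = [True, False, True, False, True]
values = list(range(5))

start, end = custom_window_bounds(use_expanding)
# Sum each window independently, one slice per output row.
window_sums = [sum(values[s:e]) for s, e in zip(start, end)]

print(start)        # [0, 1, 0, 3, 0]
print(end)          # [1, 2, 3, 4, 5]
print(window_sums)  # [0, 1, 3, 3, 10]
```

Note that the start indices here go back and forth (0, 1, 0, 3, 0), so these bounds are not monotonic.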


.. _stats.rolling_window.endpoints:

Rolling window endpoints
10 changes: 10 additions & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -169,6 +169,16 @@ You can use the alias ``"boolean"`` as well.
   s = pd.Series([True, False, None], dtype="boolean")
   s

.. _whatsnew_1000.custom_window:

Defining custom windows for rolling operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We've added a :class:`pandas.api.indexers.BaseIndexer` class that allows users to define how
window bounds are created during ``rolling`` operations. Users can define their own ``get_window_bounds``
method on a :class:`pandas.api.indexers.BaseIndexer` subclass that will generate the start and end
indices used for each window during the rolling aggregation. For more details and example usage, see
the :ref:`custom window rolling documentation <stats.custom_rolling_window>`.

.. _whatsnew_1000.enhancements.other:

59 changes: 43 additions & 16 deletions pandas/_libs/window/aggregations.pyx
@@ -183,7 +183,8 @@ cdef inline void remove_sum(float64_t val, int64_t *nobs, float64_t *sum_x) nogi


 def roll_sum_variable(ndarray[float64_t] values, ndarray[int64_t] start,
-                      ndarray[int64_t] end, int64_t minp):
+                      ndarray[int64_t] end, int64_t minp,
+                      bint is_monotonic_bounds=True):
     cdef:
         float64_t sum_x = 0
         int64_t s, e
@@ -198,11 +199,10 @@ def roll_sum_variable(ndarray[float64_t] values, ndarray[int64_t] start,
             s = start[i]
             e = end[i]

-            if i == 0:
+            if i == 0 or not is_monotonic_bounds:
Review comment (Contributor):

Could we do a simple calculation on start/end to determine whether is_monotonic is true here? It is likely quite performant.
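
As an illustration of the kind of check the comment asks about, here is a minimal sketch. The helper name `bounds_are_monotonic` is hypothetical and does not appear in this diff, which passes `is_monotonic_bounds` in as a flag. It assumes "monotonic" means that neither the start nor the end indices ever decrease.

```python
def bounds_are_monotonic(start, end):
    """Return True when neither start nor end indices ever decrease."""
    def non_decreasing(seq):
        # Compare each element with its successor in a single pass.
        return all(a <= b for a, b in zip(seq, seq[1:]))

    return non_decreasing(start) and non_decreasing(end)


# Bounds of an ordinary fixed window of size 2 over five rows.
print(bounds_are_monotonic([0, 0, 1, 2, 3], [1, 2, 3, 4, 5]))  # True
# Bounds that alternate between expanding and size-1 windows.
print(bounds_are_monotonic([0, 1, 0, 3, 0], [1, 2, 3, 4, 5]))  # False
```

The check is a single linear pass over both arrays, which is cheap compared with the aggregation itself.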


                 # setup
-                sum_x = 0.0
-                nobs = 0
+
                 for j in range(s, e):
                     add_sum(values[j], &nobs, &sum_x)

@@ -218,6 +218,10 @@ def roll_sum_variable(ndarray[float64_t] values, ndarray[int64_t] start,

             output[i] = calc_sum(minp, nobs, sum_x)

+            if not is_monotonic_bounds:
+                for j in range(s, e):
+                    remove_sum(values[j], &nobs, &sum_x)
+
     return output
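
The role of `is_monotonic_bounds` in the hunks above can be sketched in pure Python. This is an assumption-laden illustration, not the Cython implementation: the function name `roll_sum` is hypothetical, NaN handling is omitted, and the non-monotonic branch simply resets the running state, whereas the diff removes the window's values again after writing the output.

```python
import math


def roll_sum(values, start, end, minp=1, is_monotonic_bounds=True):
    """Rolling sum over half-open windows [start[i], end[i])."""
    out = []
    total, nobs = 0.0, 0
    prev_s = prev_e = 0
    for i, (s, e) in enumerate(zip(start, end)):
        if i == 0 or not is_monotonic_bounds:
            # Recompute the window from scratch.
            total, nobs = 0.0, 0
            for j in range(s, e):
                total += values[j]
                nobs += 1
        else:
            # Incremental update: only valid when bounds never move backwards.
            for j in range(prev_s, s):  # values that left the window
                total -= values[j]
                nobs -= 1
            for j in range(prev_e, e):  # values that entered the window
                total += values[j]
                nobs += 1
        out.append(total if nobs >= minp else math.nan)
        prev_s, prev_e = s, e
    return out


values = [0, 1, 2, 3, 4]

# Monotonic bounds (fixed window of 2): the incremental path is correct.
fixed = roll_sum(values, [0, 0, 1, 2, 3], [1, 2, 3, 4, 5])
print(fixed)  # [0.0, 1.0, 3.0, 5.0, 7.0]

# Non-monotonic bounds need a full recomputation for every window.
custom = roll_sum(values, [0, 1, 0, 3, 0], [1, 2, 3, 4, 5],
                  is_monotonic_bounds=False)
print(custom)  # [0.0, 1.0, 3.0, 3.0, 10.0]
```

Running the second call with `is_monotonic_bounds=True` gives different, incorrect results, because the incremental branch assumes each window starts and ends at or after the previous one.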


@@ -327,7 +331,8 @@ def roll_mean_fixed(ndarray[float64_t] values, ndarray[int64_t] start,


 def roll_mean_variable(ndarray[float64_t] values, ndarray[int64_t] start,
-                       ndarray[int64_t] end, int64_t minp):
+                       ndarray[int64_t] end, int64_t minp,
+                       bint is_monotonic_bounds=True):
     cdef:
         float64_t val, sum_x = 0
         int64_t s, e
@@ -342,11 +347,9 @@ def roll_mean_variable(ndarray[float64_t] values, ndarray[int64_t] start,
             s = start[i]
             e = end[i]

-            if i == 0:
+            if i == 0 or not is_monotonic_bounds:

                 # setup
-                sum_x = 0.0
-                nobs = 0
                 for j in range(s, e):
                     val = values[j]
                     add_mean(val, &nobs, &sum_x, &neg_ct)
@@ -365,6 +368,10 @@ def roll_mean_variable(ndarray[float64_t] values, ndarray[int64_t] start,

             output[i] = calc_mean(minp, nobs, neg_ct, sum_x)

+            if not is_monotonic_bounds:
+                for j in range(s, e):
+                    val = values[j]
+                    remove_mean(val, &nobs, &sum_x, &neg_ct)
     return output

# ----------------------------------------------------------------------
@@ -486,7 +493,8 @@ def roll_var_fixed(ndarray[float64_t] values, ndarray[int64_t] start,


 def roll_var_variable(ndarray[float64_t] values, ndarray[int64_t] start,
-                      ndarray[int64_t] end, int64_t minp, int ddof=1):
+                      ndarray[int64_t] end, int64_t minp, int ddof=1,
+                      bint is_monotonic_bounds=True):
     """
     Numerically stable implementation using Welford's method.
     """
@@ -508,7 +516,7 @@ def roll_var_variable(ndarray[float64_t] values, ndarray[int64_t] start,

             # Over the first window, observations can only be added
             # never removed
-            if i == 0:
+            if i == 0 or not is_monotonic_bounds:

                 for j in range(s, e):
                     add_var(values[j], &nobs, &mean_x, &ssqdm_x)
@@ -528,6 +536,10 @@ def roll_var_variable(ndarray[float64_t] values, ndarray[int64_t] start,

             output[i] = calc_var(minp, ddof, nobs, ssqdm_x)

+            if not is_monotonic_bounds:
+                for j in range(s, e):
+                    remove_var(values[j], &nobs, &mean_x, &ssqdm_x)
+
     return output
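
For readers unfamiliar with the add/remove pattern used here, the following is a minimal pure-Python sketch of Welford-style updates. The names `add_var`, `remove_var` and `calc_var` are borrowed from the diff, but the bodies below are an assumption written from the textbook formulas, not a copy of the Cython helpers (which also handle NaN values and `minp`).

```python
import math


def add_var(val, nobs, mean, ssqdm):
    """Add one observation to the running count, mean and sum of squared deviations."""
    nobs += 1
    delta = val - mean
    mean += delta / nobs
    ssqdm += (nobs - 1) * delta * delta / nobs
    return nobs, mean, ssqdm


def remove_var(val, nobs, mean, ssqdm):
    """Remove one previously added observation (inverse of add_var)."""
    nobs -= 1
    if nobs == 0:
        return 0, 0.0, 0.0
    delta = val - mean
    mean -= delta / nobs
    ssqdm -= (nobs + 1) * delta * delta / nobs
    return nobs, mean, ssqdm


def calc_var(nobs, ssqdm, ddof=1):
    """Variance from the running state; NaN when there are too few observations."""
    if nobs <= ddof:
        return math.nan
    return max(ssqdm, 0.0) / (nobs - ddof)


state = (0, 0.0, 0.0)
for v in [1.0, 2.0, 3.0, 4.0]:
    state = add_var(v, *state)
var_all = calc_var(state[0], state[2])       # sample variance of [1, 2, 3, 4]

state = remove_var(1.0, *state)
var_tail = calc_var(state[0], state[2])      # sample variance of [2, 3, 4]
```

Removing every value of a window after its output is written returns the state to empty, which is how the added `if not is_monotonic_bounds` block prepares for the next window.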

# ----------------------------------------------------------------------
@@ -629,7 +641,8 @@ def roll_skew_fixed(ndarray[float64_t] values, ndarray[int64_t] start,


 def roll_skew_variable(ndarray[float64_t] values, ndarray[int64_t] start,
-                       ndarray[int64_t] end, int64_t minp):
+                       ndarray[int64_t] end, int64_t minp,
+                       bint is_monotonic_bounds=True):
     cdef:
         float64_t val, prev
         float64_t x = 0, xx = 0, xxx = 0
@@ -648,7 +661,7 @@ def roll_skew_variable(ndarray[float64_t] values, ndarray[int64_t] start,

             # Over the first window, observations can only be added
             # never removed
-            if i == 0:
+            if i == 0 or not is_monotonic_bounds:

                 for j in range(s, e):
                     val = values[j]
@@ -671,6 +684,11 @@ def roll_skew_variable(ndarray[float64_t] values, ndarray[int64_t] start,

             output[i] = calc_skew(minp, nobs, x, xx, xxx)

+            if not is_monotonic_bounds:
+                for j in range(s, e):
+                    val = values[j]
+                    remove_skew(val, &nobs, &x, &xx, &xxx)
+
     return output

# ----------------------------------------------------------------------
@@ -776,7 +794,8 @@ def roll_kurt_fixed(ndarray[float64_t] values, ndarray[int64_t] start,


 def roll_kurt_variable(ndarray[float64_t] values, ndarray[int64_t] start,
-                       ndarray[int64_t] end, int64_t minp):
+                       ndarray[int64_t] end, int64_t minp,
+                       bint is_monotonic_bounds=True):
     cdef:
         float64_t val, prev
         float64_t x = 0, xx = 0, xxx = 0, xxxx = 0
@@ -794,7 +813,7 @@ def roll_kurt_variable(ndarray[float64_t] values, ndarray[int64_t] start,

             # Over the first window, observations can only be added
             # never removed
-            if i == 0:
+            if i == 0 or not is_monotonic_bounds:

                 for j in range(s, e):
                     add_kurt(values[j], &nobs, &x, &xx, &xxx, &xxxx)
@@ -814,6 +833,10 @@ def roll_kurt_variable(ndarray[float64_t] values, ndarray[int64_t] start,

             output[i] = calc_kurt(minp, nobs, x, xx, xxx, xxxx)

+            if not is_monotonic_bounds:
+                for j in range(s, e):
+                    remove_kurt(values[j], &nobs, &x, &xx, &xxx, &xxxx)
+
     return output


@@ -1007,7 +1030,8 @@ def roll_min_fixed(ndarray[float64_t] values, ndarray[int64_t] start,


 def roll_min_variable(ndarray[float64_t] values, ndarray[int64_t] start,
-                      ndarray[int64_t] end, int64_t minp):
+                      ndarray[int64_t] end, int64_t minp,
+                      bint is_monotonic_bounds=True):
     """
     Moving max of 1d array of any numeric type along axis=0 ignoring NaNs.
@@ -1400,7 +1424,10 @@ def roll_generic_variable(object obj,
                           ndarray[int64_t] start, ndarray[int64_t] end,
                           int64_t minp,
                           int offset, object func, bint raw,
-                          object args, object kwargs):
+                          object args, object kwargs,
+                          bint is_monotonic_bounds=True):
+    # is_monotonic_bounds unused since variable algorithm doesn't calculate
+    # adds/subtracts across windows, but matches other *_variable functions
     cdef:
         ndarray[float64_t] output, counts, bufarr
         ndarray[float64_t, cast=True] arr
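
The comment in this hunk says the generic path does not carry state between windows. A rough pure-Python sketch of that idea follows; the name `roll_generic` is hypothetical, and the sketch assumes `minp` is compared against the raw window length, whereas the real function tracks counts of valid observations.

```python
import math


def roll_generic(values, start, end, minp, func):
    """Apply func to each window slice independently of the others."""
    out = []
    for s, e in zip(start, end):
        window = values[s:e]
        # No add/remove bookkeeping: every window is evaluated on its own,
        # so the order of the bounds does not matter.
        out.append(func(window) if len(window) >= minp else math.nan)
    return out


values = [0, 1, 2, 3, 4]
maxima = roll_generic(values, [0, 1, 0, 3, 0], [1, 2, 3, 4, 5], 1, max)
print(maxima)  # [0, 1, 2, 3, 4]
```

Because nothing is reused from the previous window, non-monotonic bounds need no special handling here, which is why the flag is accepted only to keep the signature consistent with the other `*_variable` functions.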