Rolling window endpoints inclusion #15795

42 changes: 42 additions & 0 deletions doc/source/computation.rst
@@ -459,6 +459,48 @@ default of the index) in a DataFrame.
    dft
    dft.rolling('2s', on='foo').sum()

+.. _stats.rolling_window.endpoints:
+
+Rolling Window Endpoints
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. versionadded:: 0.20.0
+
+The inclusion of the interval endpoints in rolling window calculations can be specified with the ``closed``
+parameter:
+
+.. csv-table::
+    :header: "``closed``", "Description", "Default for"
+    :widths: 20, 30, 30
+
+    ``right``, close right endpoint, time-based windows
+    ``left``, close left endpoint,
+    ``both``, close both endpoints, fixed windows
+    ``neither``, open endpoints,
+
+For example, leaving the right endpoint open is useful in problems that must avoid contamination
+of past information by present information. The rolling window then computes statistics
+"up to that point in time", but not including that point in time.
+
+.. ipython:: python
+
+   df = pd.DataFrame({'x': [1] * 5},
+                     index=[pd.Timestamp('20130101 09:00:01'),
+                            pd.Timestamp('20130101 09:00:02'),
+                            pd.Timestamp('20130101 09:00:03'),
+                            pd.Timestamp('20130101 09:00:04'),
+                            pd.Timestamp('20130101 09:00:06')])
+
+   df["right"] = df.rolling('2s', closed='right').x.sum()  # default
+   df["both"] = df.rolling('2s', closed='both').x.sum()
+   df["left"] = df.rolling('2s', closed='left').x.sum()
+   df["neither"] = df.rolling('2s', closed='neither').x.sum()
+
+   df
+
+Currently, this feature is only implemented for time-based windows.

Contributor

expand this a touch to say that only closed='both' is accepted for fixed windows.

Contributor Author

I am only accepting closed=None for fixed windows, as previously requested by you

Document the fixed-window case as well (you can say it's closed='both', but ATM only allow None).

Contributor

That's what I mean; just make sure the docs are clear on that, e.g. None -> 'both' for fixed (and assert that as well).

+For fixed windows, the ``closed`` parameter cannot be set and the rolling window always has both endpoints closed.
+
 .. _stats.moments.ts-versus-resampling:

 Time-aware Rolling vs. Resampling
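To make the four ``closed`` options concrete, here is a minimal pure-Python sketch (no pandas) of the interval semantics in the table above, applied to the timestamps from the ``ipython`` example written as seconds after 09:00:00 with a 2-second window. The helpers ``in_window`` and ``rolling_sum`` are hypothetical names used only for illustration, not pandas API; an empty window yields NaN here, as a ``min_periods`` of 1 would.

```python
import math


def in_window(t, s, width, closed="right"):
    """True if an observation at time s lies in the window ending at time t.

    Hypothetical helper modelling the documented endpoint rules,
    not pandas' internal implementation.
    """
    lo, hi = t - width, t
    left_ok = s >= lo if closed in ("left", "both") else s > lo
    right_ok = s <= hi if closed in ("right", "both") else s < hi
    return left_ok and right_ok


def rolling_sum(times, values, width, closed="right"):
    """Sum the values whose times fall in each window; NaN for an empty window."""
    out = []
    for t in times:
        members = [v for s, v in zip(times, values)
                   if in_window(t, s, width, closed)]
        out.append(sum(members) if members else math.nan)
    return out


# Seconds after 09:00:00 for the five timestamps in the example.
times = [1, 2, 3, 4, 6]
values = [1] * 5
for closed in ("right", "both", "left", "neither"):
    print(closed, rolling_sum(times, values, 2, closed))
# right [1, 2, 2, 2, 1]
# both [1, 2, 3, 3, 2]
# left [nan, 1, 2, 2, 1]
# neither [nan, 1, 1, 1, nan]
```

These sums should agree with the ``right``, ``both``, ``left`` and ``neither`` columns the ``ipython`` block adds to ``df``; note how ``left`` and ``neither`` drop the current observation, which is the "no contamination" property described above.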
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.20.0.txt
@@ -319,6 +319,7 @@ To convert a ``SparseDataFrame`` back to sparse SciPy matrix in COO format, you
 Other Enhancements
 ^^^^^^^^^^^^^^^^^^

+- ``DataFrame.rolling()`` now accepts the parameter ``closed='right'|'left'|'both'|'neither'`` to choose which endpoints of the rolling window are closed. See the :ref:`documentation <stats.rolling_window.endpoints>` (:issue:`13965`)
 - Integration with the ``feather-format``, including a new top-level ``pd.read_feather()`` and ``DataFrame.to_feather()`` method, see :ref:`here <io.feather>`.
 - ``Series.str.replace()`` now accepts a callable, as replacement, which is passed to ``re.sub`` (:issue:`15055`)
 - ``Series.str.replace()`` now accepts a compiled regular expression as a pattern (:issue:`15446`)

4 changes: 2 additions & 2 deletions pandas/core/generic.py
@@ -5962,12 +5962,12 @@ def _add_series_or_dataframe_operations(cls):

         @Appender(rwindow.rolling.__doc__)
         def rolling(self, window, min_periods=None, freq=None, center=False,
-                    win_type=None, on=None, axis=0):
+                    win_type=None, on=None, axis=0, closed=None):
             axis = self._get_axis_number(axis)
             return rwindow.rolling(self, window=window,
                                    min_periods=min_periods, freq=freq,
                                    center=center, win_type=win_type,
-                                   on=on, axis=axis)
+                                   on=on, axis=axis, closed=closed)

         cls.rolling = rolling

46 changes: 32 additions & 14 deletions pandas/core/window.py
@@ -56,11 +56,12 @@

 class _Window(PandasObject, SelectionMixin):
     _attributes = ['window', 'min_periods', 'freq', 'center', 'win_type',
-                   'axis', 'on']
+                   'axis', 'on', 'closed']
     exclusions = set()

     def __init__(self, obj, window=None, min_periods=None, freq=None,
-                 center=False, win_type=None, axis=0, on=None, **kwargs):
+                 center=False, win_type=None, axis=0, on=None, closed=None,
+                 **kwargs):

         if freq is not None:
             warnings.warn("The freq kw is deprecated and will be removed in a "
@@ -71,6 +72,7 @@ def __init__(self, obj, window=None, min_periods=None, freq=None,
         self.blocks = []
         self.obj = obj
         self.on = on
+        self.closed = closed
         self.window = window
         self.min_periods = min_periods
         self.freq = freq
@@ -101,6 +103,10 @@ def validate(self):
         if self.min_periods is not None and not \
            is_integer(self.min_periods):
             raise ValueError("min_periods must be an integer")
+        if self.closed is not None and self.closed not in \
+           ['right', 'both', 'left', 'neither']:
+            raise ValueError("closed must be 'right', 'left', 'both' or "

Contributor

is there validation for a fixed window when closed is NOT both?

Contributor Author

Again, there is a validation for when closed is not None in Rolling.validate()

+                             "'neither'")

     def _convert_freq(self, how=None):
         """ resample according to the how, return a new object """
@@ -374,8 +380,14 @@ class Window(_Window):
     on : string, optional
         For a DataFrame, column on which to calculate
         the rolling window, rather than the index
+    closed : string, default None
+        Make the interval closed on the 'right', 'left', 'both' or
+        'neither' endpoints.
+        For offset-based windows, it defaults to 'right'.
+        For fixed windows, defaults to 'both'. Remaining cases not implemented
+        for fixed windows.

-        .. versionadded:: 0.19.0
+        .. versionadded:: 0.20.0

     axis : int or string, default 0

@@ -717,12 +729,12 @@ def _apply(self, func, name=None, window=None, center=None,
                     raise ValueError("we do not support this function "
                                      "in _window.{0}".format(func))

-                def func(arg, window, min_periods=None):
+                def func(arg, window, min_periods=None, closed=None):
                     minp = check_minp(min_periods, window)
                     # ensure we are only rolling on floats
                     arg = _ensure_float64(arg)
                     return cfunc(arg,
-                                 window, minp, indexi, **kwargs)
+                                 window, minp, indexi, closed, **kwargs)

             # calculation function
             if center:
@@ -731,11 +743,13 @@ def func(arg, window, min_periods=None):

                 def calc(x):
                     return func(np.concatenate((x, additional_nans)),
-                                window, min_periods=self.min_periods)
+                                window, min_periods=self.min_periods,
+                                closed=self.closed)
             else:

                 def calc(x):
-                    return func(x, window, min_periods=self.min_periods)
+                    return func(x, window, min_periods=self.min_periods,
+                                closed=self.closed)

             with np.errstate(all='ignore'):
                 if values.ndim > 1:
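With ``closed`` now threaded through to the compiled routines, each variable-size window can be described by a pair of slice bounds per row. The sketch below shows one way to compute such bounds with two pointers over a sorted index. ``window_bounds`` is a hypothetical helper written for illustration, not the Cython implementation behind ``_window``, and it assumes a strictly increasing numeric index and a positive window width.

```python
import math


def window_bounds(index, width, closed="right"):
    """Return per-row (starts, ends) so that row i aggregates values[start:end].

    Hypothetical sketch: assumes ``index`` is strictly increasing and
    ``width`` > 0.
    """
    left_closed = closed in ("left", "both")
    right_closed = closed in ("right", "both")
    starts, ends = [], []
    start = 0
    for i, t in enumerate(index):
        lo = t - width
        # The left pointer only moves forward: a point too old for row i
        # is also too old for every later row.
        while start < i and (index[start] < lo or
                             (not left_closed and index[start] == lo)):
            start += 1
        starts.append(start)
        # With a strictly increasing index only row i sits on the right
        # endpoint, so an open right end simply excludes row i itself.
        ends.append(i + 1 if right_closed else i)
    return starts, ends


def rolling_sum(values, starts, ends):
    """Sum each window; NaN for an empty window."""
    return [sum(values[s:e]) if e > s else math.nan
            for s, e in zip(starts, ends)]


index = [1, 2, 3, 4, 6]  # seconds, as in the documentation example
starts, ends = window_bounds(index, 2, closed="left")
print(starts, ends)                        # [0, 0, 0, 1, 3] [0, 1, 2, 3, 4]
print(rolling_sum([1] * 5, starts, ends))  # [nan, 1, 2, 2, 1]
```

Because the index is sorted, the start pointer never moves backwards, so the whole pass is linear in the number of rows; the ``closed`` value only decides whether a point exactly on an endpoint is kept.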
@@ -768,7 +782,8 @@ def count(self):
         for b in blocks:
             result = b.notnull().astype(int)
             result = self._constructor(result, window=window, min_periods=0,
-                                       center=self.center).sum()
+                                       center=self.center,
+                                       closed=self.closed).sum()
             results.append(result)

         return self._wrap_results(results, blocks, obj)
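``count()`` reuses the same endpoint rule: the hunk above rolls a ``sum()`` over a 0/1 not-null indicator and passes the same ``closed`` value along. A minimal plain-Python sketch of that idea follows; ``rolling_count`` is a hypothetical helper, missing values are assumed to be ``float('nan')``, and this sketch reports 0 for an empty window (how pandas itself reports one is not shown in this diff).

```python
import math


def rolling_count(times, values, width, closed="right"):
    """Count non-missing values in the window ending at each time.

    Hypothetical sketch: sums a 0/1 "not null" indicator over each window;
    an empty window gives 0 in this sketch.
    """
    indicator = [0 if math.isnan(v) else 1 for v in values]
    left_closed = closed in ("left", "both")
    right_closed = closed in ("right", "both")
    counts = []
    for t in times:
        lo = t - width
        total = 0
        for s, flag in zip(times, indicator):
            after_lo = s >= lo if left_closed else s > lo
            before_t = s <= t if right_closed else s < t
            if after_lo and before_t:
                total += flag
        counts.append(total)
    return counts


times = [1, 2, 3, 4, 6]
values = [1.0, math.nan, 1.0, 1.0, 1.0]
print(rolling_count(times, values, 2, closed="right"))  # [1, 1, 1, 2, 1]
print(rolling_count(times, values, 2, closed="left"))   # [0, 1, 1, 1, 1]
```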
@@ -789,11 +804,10 @@ def apply(self, func, args=(), kwargs={}):
         offset = _offset(window, self.center)
         index, indexi = self._get_index()

-        def f(arg, window, min_periods):
+        def f(arg, window, min_periods, closed):
             minp = _use_window(min_periods, window)
-            return _window.roll_generic(arg, window, minp, indexi,
-                                        offset, func, args,
-                                        kwargs)
+            return _window.roll_generic(arg, window, minp, indexi, closed,
+                                        offset, func, args, kwargs)

         return self._apply(f, func, args=args, kwargs=kwargs,
                            center=False)
@@ -864,7 +878,7 @@ def std(self, ddof=1, *args, **kwargs):
         def f(arg, *args, **kwargs):
             minp = _require_min_periods(1)(self.min_periods, window)
             return _zsqrt(_window.roll_var(arg, window, minp, indexi,
-                                           ddof))
+                                           self.closed, ddof))

         return self._apply(f, 'std', check_minp=_require_min_periods(1),
                            ddof=ddof, **kwargs)
@@ -911,7 +925,7 @@ def quantile(self, quantile, **kwargs):
         def f(arg, *args, **kwargs):
             minp = _use_window(self.min_periods, window)
             return _window.roll_quantile(arg, window, minp, indexi,
-                                         quantile)
+                                         self.closed, quantile)

         return self._apply(f, 'quantile', quantile=quantile,
                            **kwargs)
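``std()`` and ``quantile()`` now also pass ``self.closed`` into ``roll_var`` and ``roll_quantile``, so the endpoint rule changes which observations each statistic sees. As a hypothetical plain-Python illustration (not the Cython ``roll_var``), here is a windowed sample variance with ``ddof``; returning NaN when a window holds no more than ``ddof`` observations is an assumption of this sketch.

```python
import math


def rolling_var(times, values, width, closed="right", ddof=1):
    """Variance of the window ending at each time (hypothetical sketch)."""
    left_closed = closed in ("left", "both")
    right_closed = closed in ("right", "both")
    out = []
    for t in times:
        lo = t - width
        window = [v for s, v in zip(times, values)
                  if (s >= lo if left_closed else s > lo)
                  and (s <= t if right_closed else s < t)]
        n = len(window)
        if n - ddof <= 0:
            out.append(math.nan)  # too few observations for this ddof
            continue
        mean = sum(window) / n
        out.append(sum((x - mean) ** 2 for x in window) / (n - ddof))
    return out


times = [1, 2, 3, 4, 6]
values = [1.0, 3.0, 5.0, 7.0, 9.0]
print(rolling_var(times, values, 2, closed="right"))  # [nan, 2.0, 2.0, 2.0, nan]
print(rolling_var(times, values, 2, closed="both"))   # [nan, 2.0, 4.0, 4.0, 2.0]
```

Closing the left endpoint adds the observation exactly two seconds back, which is why the middle windows grow from two to three points and their variance changes.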
@@ -1044,6 +1058,10 @@ def validate(self):
         elif self.window < 0:
             raise ValueError("window must be non-negative")

+        if not self.is_datetimelike and self.closed is not None:

Contributor Author

@jreback here we enforce that closed must be None for windows that are not datetimelike

+            raise ValueError("closed only implemented for datetimelike "
+                             "and offset based windows")
+
     def _validate_monotonic(self):
         """ validate on is monotonic """
         if not self._on.is_monotonic:
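Taken together, the two ``validate()`` hunks and the docstring define which ``closed`` values are accepted and what the effective default is. The sketch below condenses those rules into one hypothetical function, ``resolve_closed``. It assumes an offset-based window is given as a string such as ``'2s'`` and a fixed window as an integer, whereas pandas decides this via ``is_datetimelike``.

```python
def resolve_closed(window, closed=None):
    """Validate ``closed`` and return the effective endpoint rule.

    Hypothetical sketch of this diff's rules. Assumption: a string window
    (e.g. '2s') is offset-based and an integer window is fixed.
    """
    # _Window.validate: any explicit value must be one of the four names.
    if closed is not None and closed not in ("right", "both", "left", "neither"):
        raise ValueError("closed must be 'right', 'left', 'both' or 'neither'")
    is_offset = isinstance(window, str)
    # Rolling.validate: fixed windows must leave closed as None.
    if not is_offset and closed is not None:
        raise ValueError("closed only implemented for datetimelike "
                         "and offset based windows")
    # Documented defaults: 'right' for offset windows, 'both' for fixed ones.
    if closed is None:
        return "right" if is_offset else "both"
    return closed


print(resolve_closed("2s"))          # right
print(resolve_closed(3))             # both
print(resolve_closed("2s", "left"))  # left
```

As the review thread notes, an explicit ``closed='both'`` is still rejected for a fixed window here, even though ``'both'`` is the behaviour such a window gets by default.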