ENH: provide standard BaseIndexers in pandas.api.indexers #33236

Merged
merged 31 commits into from
Apr 8, 2020
Changes from 14 commits
Commits
31fd213
ENH: initial indexer implementation
AlexKirko Apr 2, 2020
d2b7743
FIX: fix brackets
AlexKirko Apr 2, 2020
5afb066
ENH: expose FixedForwardWIndowIndexer through API
AlexKirko Apr 2, 2020
212448f
DOC: add whatsnew entry
AlexKirko Apr 2, 2020
0aa7980
CLN: run black
AlexKirko Apr 2, 2020
fff110c
DOC: add example to Rolling docstring
AlexKirko Apr 2, 2020
c94cbd5
TST: use new class in test_rolling_forward_window
AlexKirko Apr 4, 2020
4e5aa02
TST: add benchmark
AlexKirko Apr 4, 2020
271cc36
fix benchmark
AlexKirko Apr 4, 2020
ac8fd5f
Merge branch 'master' into forward-indexer
AlexKirko Apr 4, 2020
dc7fec8
DOC: document the indexer in the user guide
AlexKirko Apr 4, 2020
e1a6f73
DOC: edit the text in the user_guide
AlexKirko Apr 4, 2020
dadcb3d
fix the benchmark
AlexKirko Apr 4, 2020
46f45c2
DOC: add versionadded to user_guide
AlexKirko Apr 5, 2020
7464c92
raise if min_periods, center, or closed is not None
AlexKirko Apr 6, 2020
6cdd20d
DOC: expand the class docstring
AlexKirko Apr 6, 2020
fb43381
DOC: add examples to the class docstring
AlexKirko Apr 6, 2020
9702839
DOC: add ref from whatsnew to the user guide
AlexKirko Apr 6, 2020
728ada2
rollback raising behavior
AlexKirko Apr 6, 2020
ffd50c5
add errors for center and closed
AlexKirko Apr 6, 2020
3ed0437
CLN: run black
AlexKirko Apr 6, 2020
1e3efdd
CLN: remove unnecessary quotes
AlexKirko Apr 6, 2020
916bd6a
Merge branch 'master' into forward-indexer
AlexKirko Apr 6, 2020
9bc5fc3
DOC: replace code-block with ipython block
AlexKirko Apr 7, 2020
bd265f4
DOC: expose docstring, make links clickable
AlexKirko Apr 7, 2020
d08081b
TST: add error raising tests
AlexKirko Apr 7, 2020
6e7c629
Merge branch 'master' into forward-indexer
AlexKirko Apr 7, 2020
81c4216
CLN: clean up formatting in docs
AlexKirko Apr 7, 2020
1451340
CLN: remove unnecessary BaseIndexer import
AlexKirko Apr 7, 2020
768197a
CLN: remove unnecessary new line in computation.rst
AlexKirko Apr 7, 2020
bc59bb7
DOC: move new indexer in window reference, delete extra
AlexKirko Apr 8, 2020
22 changes: 22 additions & 0 deletions asv_bench/benchmarks/rolling.py
@@ -165,4 +165,26 @@ def peakmem_fixed(self):
        self.roll.max()


class ForwardWindowMethods:
    params = (
        ["DataFrame", "Series"],
        [10, 1000],
        ["int", "float"],
        ["median", "mean", "max", "min", "kurt", "sum"],
    )
    param_names = ["constructor", "window_size", "dtype", "method"]

    def setup(self, constructor, window_size, dtype, method):
        N = 10 ** 5
        arr = np.random.random(N).astype(dtype)
        indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=window_size)
        self.roll = getattr(pd, constructor)(arr).rolling(window=indexer)

    def time_rolling(self, constructor, window_size, dtype, method):
        getattr(self.roll, method)()

    def peakmem_rolling(self, constructor, window_size, dtype, method):
        getattr(self.roll, method)()


from .pandas_vb_common import setup  # noqa: F401 isort:skip
23 changes: 23 additions & 0 deletions doc/source/user_guide/computation.rst
@@ -571,6 +571,29 @@ and we want to use an expanding window where ``use_expanding`` is ``True`` other
    3     3.0
    4    10.0

.. versionadded:: 1.1

For some problems, knowledge of the future is available for analysis. For example, this occurs when
each data point is a full time series read from an experiment, and the task is to extract underlying
conditions. In these cases, it can be useful to perform forward-looking rolling window computations.
The ``FixedForwardWindowIndexer`` class is available for this purpose. This ``BaseIndexer`` subclass

Contributor

you can use the ref here, e.g. the one you are using in the whatsnew

Member Author

@jreback I've included a ref in whatsnew, pointing to this subsection of the user guide. If you meant something different, please tell me.

Contributor

No, I mean a ref on FixedForwardWindowIndexer here that points to the API (making it a clickable link), and one for BaseIndexer.

implements a closed fixed-width forward-looking rolling window, and we can use it as follows:

.. code-block:: ipython

Contributor

use an ipython block here

Member Author

@jreback
Sorry, could you clarify? To me, my ipython block looks right, both in raw and in the docs I compiled locally. Do I need to do something differently?

Contributor

We don't use code-blocks for executable code. Please use an ipython block so that the doc build executes this; that way it will never get out of date.


    In [5]: from pandas.api.indexers import FixedForwardWindowIndexer

    In [6]: indexer = FixedForwardWindowIndexer(window_size=2)

    In [7]: df.rolling(indexer, min_periods=1).sum()
    Out[7]:
       values
    0     1.0
    1     3.0
    2     5.0
    3     7.0
    4     4.0


.. _stats.rolling_window.endpoints:

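For readers who want to run the user-guide example outside the doc build, here is a minimal, self-contained sketch. The frame `df` is not defined in this hunk; the definition below is an assumption reconstructed from the displayed output, and the snippet assumes pandas 1.1 or later, where `FixedForwardWindowIndexer` is exposed.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Assumed input: the hunk does not show how df is built; the values 0..4
# are consistent with the sums displayed in the user-guide output.
df = pd.DataFrame({"values": range(5)})

# Each window starts at the current row and extends window_size rows forward.
indexer = FixedForwardWindowIndexer(window_size=2)

# min_periods=1 lets the last, truncated window still produce a value.
result = df.rolling(indexer, min_periods=1).sum()
print(result)
```

Without `min_periods=1`, the final row would have only one observation in its window and would be expected to come out as NaN.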
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -104,6 +104,7 @@ Other API changes
- ``loc`` lookups with an object-dtype :class:`Index` and an integer key will now raise ``KeyError`` instead of ``TypeError`` when key is missing (:issue:`31905`)
- Using a :func:`pandas.api.indexers.BaseIndexer` with ``std``, ``var``, ``count``, ``skew``, ``cov``, ``corr`` will now raise a ``NotImplementedError`` (:issue:`32865`)
- Using a :func:`pandas.api.indexers.BaseIndexer` with ``min``, ``max`` will now return correct results for any monotonic :func:`pandas.api.indexers.BaseIndexer` descendant (:issue:`32865`)
- Added a :func:`pandas.api.indexers.FixedForwardWindowIndexer` class to support forward-looking windows during ``rolling`` operations.
-

Backwards incompatible API changes
4 changes: 2 additions & 2 deletions pandas/api/indexers/__init__.py
@@ -3,6 +3,6 @@
 """

 from pandas.core.indexers import check_array_indexer
-from pandas.core.window.indexers import BaseIndexer
+from pandas.core.window.indexers import BaseIndexer, FixedForwardWindowIndexer

-__all__ = ["check_array_indexer", "BaseIndexer"]
+__all__ = ["check_array_indexer", "BaseIndexer", "FixedForwardWindowIndexer"]
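With this change the new class becomes importable from the public `pandas.api.indexers` namespace. A quick check, assuming an installed pandas version that includes this PR (1.1 or later), might look like this:

```python
import pandas.api.indexers as indexers
from pandas.api.indexers import BaseIndexer, FixedForwardWindowIndexer

# The class is listed among the module's public names ...
is_exported = "FixedForwardWindowIndexer" in indexers.__all__

# ... and it is a BaseIndexer subclass, so rolling() accepts it as a window.
is_base_indexer = issubclass(FixedForwardWindowIndexer, BaseIndexer)

print(is_exported, is_base_indexer)
```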
20 changes: 20 additions & 0 deletions pandas/core/window/indexers.py
@@ -120,3 +120,23 @@ def get_window_bounds(
            np.zeros(num_values, dtype=np.int64),
            np.arange(1, num_values + 1, dtype=np.int64),
        )


class FixedForwardWindowIndexer(BaseIndexer):
    """Calculate fixed forward-looking rolling window bounds."""

Contributor

Can you add an Examples section here?

Member Author

Done.


    @Appender(get_window_bounds_doc)
    def get_window_bounds(
        self,
        num_values: int = 0,
        min_periods: Optional[int] = None,

Contributor

this should raise if min_periods, center, closed are not None

Member Author

Implemented it a bit differently, pinged you in the comment on the implementation.

        center: Optional[bool] = None,
        closed: Optional[str] = None,
    ) -> Tuple[np.ndarray, np.ndarray]:

        start = np.arange(num_values, dtype="int64")
        end_s = start[: -self.window_size] + self.window_size
        end_e = np.full(self.window_size, num_values, dtype="int64")
        end = np.concatenate([end_s, end_e])

        return start, end
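To make the bounds computation concrete, the sketch below reproduces the array construction from `get_window_bounds` as a standalone NumPy function. The helper names are hypothetical and exist only for this illustration. The second function is an alternative that is not taken from the diff: it clips the window ends at `num_values`, which keeps `start` and `end` the same length even when the window is longer than the data.

```python
import numpy as np


def forward_bounds_as_in_diff(num_values, window_size):
    """Mirror the array construction used in the diff (hypothetical helper)."""
    start = np.arange(num_values, dtype="int64")
    end_s = start[:-window_size] + window_size
    end_e = np.full(window_size, num_values, dtype="int64")
    return start, np.concatenate([end_s, end_e])


def forward_bounds_clipped(num_values, window_size):
    """Alternative sketch (not from the diff): clip window ends at num_values."""
    start = np.arange(num_values, dtype="int64")
    end = np.clip(start + window_size, 0, num_values)
    return start, end


start, end = forward_bounds_as_in_diff(num_values=5, window_size=2)
print(start)  # [0 1 2 3 4]
print(end)    # [2 3 4 5 5]

# When the window is longer than the data, the concatenation yields
# window_size entries rather than num_values entries.
_, end_long = forward_bounds_as_in_diff(num_values=3, window_size=5)
_, end_long_clipped = forward_bounds_clipped(num_values=3, window_size=5)
print(len(end_long), len(end_long_clipped))  # 5 3
```

Row `i` then aggregates `values[start[i]:end[i]]`, so the last `window_size - 1` rows see truncated windows.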
11 changes: 11 additions & 0 deletions pandas/core/window/rolling.py
@@ -900,6 +900,17 @@ class Window(_Window):
    3  2.0
    4  4.0

    Same as above, but with forward-looking windows

    >>> indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
    >>> df.rolling(window=indexer, min_periods=1).sum()
         B
    0  1.0
    1  3.0
    2  2.0
    3  4.0
    4  4.0
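
As a cross-check of the forward-looking output, the same numbers can be obtained on this data by reversing the frame, applying an ordinary backward-looking window, and reversing back. This is only an illustrative sketch; it assumes pandas 1.1 or later and rebuilds `df` with the column values used in the docstring.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"B": [0, 1, 2, np.nan, 4]})

# Forward-looking window of two rows via the new indexer.
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
forward = df.rolling(window=indexer, min_periods=1).sum()

# Equivalent trick for a fixed window: reverse, roll backward, reverse again.
reversed_roll = df[::-1].rolling(window=2, min_periods=1).sum()[::-1]

print(forward["B"].tolist())
print(reversed_roll["B"].tolist())
```

The indexer avoids the double reversal and keeps the intent explicit in the call.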

    A ragged (meaning not-a-regular frequency), time-indexed DataFrame

    >>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
13 changes: 2 additions & 11 deletions pandas/tests/window/test_base_indexer.py
@@ -3,7 +3,7 @@

 from pandas import DataFrame, Series
 import pandas._testing as tm
-from pandas.api.indexers import BaseIndexer
+from pandas.api.indexers import BaseIndexer, FixedForwardWindowIndexer
 from pandas.core.window.indexers import ExpandingIndexer


@@ -105,19 +105,10 @@ def get_window_bounds(self, num_values, min_periods, center, closed):
 )
 def test_rolling_forward_window(constructor, func, alt_func, expected):
     # GH 32865
-    class ForwardIndexer(BaseIndexer):
-        def get_window_bounds(self, num_values, min_periods, center, closed):
-            start = np.arange(num_values, dtype="int64")
-            end_s = start[: -self.window_size] + self.window_size
-            end_e = np.full(self.window_size, num_values, dtype="int64")
-            end = np.concatenate([end_s, end_e])
-
-            return start, end
-
     values = np.arange(10)
     values[5] = 100.0

-    indexer = ForwardIndexer(window_size=3)
+    indexer = FixedForwardWindowIndexer(window_size=3)
     rolling = constructor(values).rolling(window=indexer, min_periods=2)
     result = getattr(rolling, func)()
     expected = constructor(expected)
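
The test relies on parametrized `expected` values that are not visible in this hunk. As a hedged illustration of what is being checked, the sketch below computes a forward-looking maximum with a plain Python loop and compares it against the indexer. The helper `naive_forward_max` is hypothetical and not part of the test suite, and pandas 1.1 or later is assumed.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer


def naive_forward_max(values, window_size, min_periods):
    """Reference implementation: max over values[i : i + window_size]."""
    out = []
    for i in range(len(values)):
        window = values[i : i + window_size]
        # Windows truncated below min_periods observations yield NaN.
        out.append(float(window.max()) if len(window) >= min_periods else np.nan)
    return pd.Series(out)


# Same input shape as the test: 0..9 with an outlier at position 5.
values = np.arange(10)
values[5] = 100

indexer = FixedForwardWindowIndexer(window_size=3)
result = pd.Series(values).rolling(window=indexer, min_periods=2).max()
expected = naive_forward_max(values, window_size=3, min_periods=2)

# Raises AssertionError if the two computations disagree.
pd.testing.assert_series_equal(result, expected)
```

With `min_periods=2`, only the final row, whose window holds a single value, is expected to be NaN.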