Skip to content

ENH: Rolling window with step size (GH-15354) #45765

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Feb 28, 2022

Conversation

rtpsw
Copy link
Contributor

@rtpsw rtpsw commented Feb 1, 2022

@rtpsw
Copy link
Contributor Author

rtpsw commented Feb 1, 2022

@mroeschke, @jreback: This PR replaces #45735. The current changeset I prepared passes the original unchanged window tests, indicating that the new behavior does not break existing behaviors. After this changeset is worked out in review, the next changeset will add new tests.

@jreback jreback added the Window rolling, ewma, expanding label Feb 3, 2022
@jreback jreback added this to the 1.5 milestone Feb 11, 2022
@jreback
Copy link
Contributor

jreback commented Feb 11, 2022

@rtpsw can you merge master and i think a few open comments.

@rtpsw
Copy link
Contributor Author

rtpsw commented Feb 11, 2022

I reverted to the original signature of get_window_bounds, so now group-by, emw and expand do not support a step argument with non-default behavior. I minimally modified the existing tests to allow them to pass - basically just adjusting the signature of get_window_bounds in custom indexers derived from BaseIndexer. If the code looks reasonable to you, I'll proceed to add back to this PR (using code from the cancelled PR) the test cases that cover the new step parameter functionality.

@rtpsw
Copy link
Contributor Author

rtpsw commented Feb 12, 2022

I added the tests. Please review.

@rtpsw
Copy link
Contributor Author

rtpsw commented Feb 17, 2022

AFAICS, the current test failures are not related to my changes and possibly got introduced by a merge from master. How to handle them?

) -> tuple[np.ndarray, np.ndarray]:

if step is not None:
raise NotImplementedError("step not implemented for variable offset window")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests hit this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and there are multiple instances of this error in this file that get tested. This is done using @step_not_implemented (defined in pandas/util/_test_decorators.py) on test cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't these check be performed in BaseWindow._validate? Ideally e.g. df.rolling("2s" , step=5)/df.ewm(step=5) would raise at the constructor and not necessarily when an aggregation call is made

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i agree should move this check to be more consistent

# this cannot be done in BaseWindow._validate because there _get_window_indexer
# would erroneously create a fixed window given a window argument like "1s" due
# to _win_freq_i8 not being set
indexer = self._get_window_indexer()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just

if self.step is not None:
    raise NotImplementedError(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this would be too restrictive and would fail a numeric step that is actually valid with fixed-length window indexers.

{"value": np.arange(n)},
index=date_range("2015-12-24", periods=n, freq="D"),
)
f = lambda: df.rolling(window=window, step=step)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this is better!

Could you test the actual constructors raising though for expanding/groupby rolling/ewm the indexers are not exposed to the user?

i.e. df.expanding(step=3), df.ewm(step=3), df.groupby(key).rolling(step3) should raise.

Also are there tests for df.rolling.cov/corr raising?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've uploaded tests that cover these cases.

@rtpsw
Copy link
Contributor Author

rtpsw commented Feb 24, 2022

@mroeschke: Does the latest commit address your points?

index=date_range("2015-12-24", periods=10, freq="D"),
)
f = lambda: getattr(df.rolling(window=2, step=step), agg)(df)
if step is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just test if step if not None since otherwise we have pretty good testing for cov/corr

index=date_range("2015-12-24", periods=10, freq="D"),
)
f = lambda: func(df)(window=window, step=step)
if step is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just test if step if not None since otherwise we have pretty good testing for cov/corr

timedelta(days=3),
Timedelta(days=3),
"3D",
ExpandingIndexer(window_size=3),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExpandingIndexer, ExponentialMovingWindowIndexer, and GroupbyIndexer can be removed since they are tested below

@mroeschke
Copy link
Member

mroeschke commented Feb 25, 2022

@mroeschke: Does the latest commit address your points?

I don't think this has been addressed yet. https://github.com/pandas-dev/pandas/pull/45765/files#r809553567

The pattern in these classes is for constructor argument to be validated either in __init__ or def _validate. To be consistent it would be good for these NotImplementedErrors to exist there instead of BaseIndexer._get_window_bounds

@rtpsw
Copy link
Contributor Author

rtpsw commented Feb 25, 2022

Note that the current test failures appear to be benign (due to pytest not found).

@rtpsw
Copy link
Contributor Author

rtpsw commented Feb 26, 2022

@mroeschke: Does the latest commit address your points?

I don't think this has been addressed yet. https://github.com/pandas-dev/pandas/pull/45765/files#r809553567

The pattern in these classes is for constructor argument to be validated either in __init__ or def _validate. To be consistent it would be good for these NotImplementedErrors to exist there instead of BaseIndexer._get_window_bounds

I see. For unsupported step-argument cases, the current version raises NotImplementedError("step not implemented ...") already in rolling.py, as tested by test_constructor_step_not_implemented. Hopefully, raising at these points gives a clear context to the user.

I looked into also raising in the places you noted. AFAICS, the step argument is not available in pandas/core/indexers/objects.py before get_window_bounds is called. So, raising at constructor time cannot be done there but can be done in pandas/core/window/rolling.py, as in the current version. Let me know what you think.

@mroeschke
Copy link
Member

Sorry, yes by constructor I was referring to BaseWindow.__init__/._validate in pandas/core/window/rolling.py

@rtpsw
Copy link
Contributor Author

rtpsw commented Feb 27, 2022

Sorry, yes by constructor I was referring to BaseWindow.__init__/._validate in pandas/core/window/rolling.py

The current version raises in rolling.py at:

  • BaseWindowGroupby.__init__ to handle group-by cases.
  • RollingAndExpandingMixing.cov and RollingAndExpandingMixing.corr to handle cov/corr cases.
  • Rolling._validate to handle those specific cases of indexers that do not support non-default step arguments.

I'm assuming you want to move the last item to BaseWindow._validate. I tried this before and it doesn't work as expected - as the comment marked GH 15354 in Rolling._validate notes, the wrong indexer is being created in BaseWindow._validate. Let me know what you think.

@mroeschke
Copy link
Member

For exapanding and ewm you can to step validation in those __init__s

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would really like to simply do the validation of step in the consructor and not in the aggregation itself. it feels that you have this NotImplementedError all over the place right now.

) -> tuple[np.ndarray, np.ndarray]:

if step is not None:
raise NotImplementedError("step not implemented for variable offset window")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i agree should move this check to be more consistent

np.zeros(num_values, dtype=np.int64),
np.arange(1, num_values + 1, dtype=np.int64),
)
if step is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

) -> tuple[np.ndarray, np.ndarray]:
if step is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -358,6 +384,8 @@ def get_window_bounds(
)
start_arrays.append(window_indices.take(ensure_platform_int(start)))
end_arrays.append(window_indices.take(ensure_platform_int(end)))
if len(start_arrays) == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line here

) -> tuple[np.ndarray, np.ndarray]:

return np.array([0], dtype=np.int64), np.array([num_values], dtype=np.int64)
if step is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -667,6 +698,9 @@ def __init__(
# groupby.<agg_func>, but unexpected to users in
# groupby.rolling.<agg_func>
obj = obj.drop(columns=self._grouper.names, errors="ignore")
# GH 15354
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment

@@ -1580,6 +1636,11 @@ def cov(
ddof: int = 1,
**kwargs,
):
if self.step is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@jreback jreback merged commit 6caefb1 into pandas-dev:main Feb 28, 2022
@jreback
Copy link
Contributor

jreback commented Feb 28, 2022

thanks @rtpsw very nice! let's followup as needed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rolling window with step size
3 participants