-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: support min/max functions for rolling windows with custom BaseIndexer #33180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This reverts commit c8d79f0.
The core idea is that we ancor the output index we write at to the index in the input value array it corresponds to. This generalizes the function and also removes the need to fiddle with Aside from that, I reuse the original algorithm with some minor adjustments that allow the functions to work with any monotonic windows (window starts and ends stay in place or move to the right). |
On another note, with these changes, it is no longer necessary to have separate code for initializing the first value and for writing the final value. Everything can be done in the main loop. |
pandas/tests/window/test_rolling.py
Outdated
|
||
values = np.arange(10) | ||
|
||
indexer = ForwardIndexer(window_size=3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm are you anticpating exposing this to users? should we be using this internally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean allowing the creation of a forward-looking rolling window without creating a custom BaseIndexer
subclass?
My goal was to make sure that our implementations are robust against custom indexers with arbitrary offsets relative to the position in the input array they are calculated for. Additionally, I made this implementation robust against weird stuff, like smoothly expanding or contracting windows, for example.
The ForwardIndexer
class is used in testing for convenience, because it is, by far, the most common customization people expect us to support, in my limited experience.
We could make forward-looking a Boolean argument in the rolling window constructor, but I don't know how useful of an enhancement it would be. In my opinion, it's enough to make sure that we support reasonable customizations based on BaseIndexer
and let the users take care of the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be interesting to provide a standard set of BaseIndexer
s in the library. ForwardIndexer
may be one that we provide once we standardize the definition and API.
@AlexKirko feel free to open a new issue about including ForwardIndexer
and BackwardIndexer
in the pandas.api.indexers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, created a new issue. Will be happy to submit a PR in a day or two.
pandas/tests/window/test_rolling.py
Outdated
("max", [2.0, 3.0, 4.0, 100.0, 100.0, 100.0, 8.0, 9.0, 9.0, np.nan]), | ||
], | ||
) | ||
def test_rolling_forward_window(constructor, func, expected): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you put this test in pandas/tests/window/test_base_indexer.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pandas/tests/window/test_rolling.py
Outdated
|
||
indexer = ForwardIndexer(window_size=3) | ||
rolling = constructor(values).rolling(window=indexer, min_periods=2) | ||
result = getattr(rolling, func)() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also test this against rolling.apply(lambda x: max(x))
(or the equivalent)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke
Should we drop testing against an explicitly provided array then? Doing both seems redundant to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay to keep the explicit array as the source of truth on the off chance that rolling(Indexer).max()
and rolling.apply(lambda x: max(x)
both incorrectly change in the same direction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, let's keep it then. I was afraid of something like this, which is why I started by testing against an explicit array and not against apply.
@jreback |
will looks again soon |
Changed the |
Some pipelines are canceling on timeout. Should have nothing to do with this PR, but I restarted the tests just in case. |
Hm. py37_np_dev seems to be hanging for everyone. If it doesn't recover after an hour, I'll post an issue (unless someone beats me to it). |
thanks @AlexKirko |
Thanks! |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
The problem
We currently don't support several rolling window functions when building a rolling window object using a custom class descended from
pandas.api.indexers.Baseindexer
. The implementations were written with backward-looking windows in mind, and this led to these functions breaking.Currently, using these functions returns a
NotImplemented
error thanks to #33057, but ideally we want to update the implementations, so that they will work without a performance hit. This is what I aim to do over a series of PRs.Scope of PR
This PR proposes an alternative implementation for the
min
andmax
rolling window functions implemented in the_roll_min_max_variable
function in theaggregations.pyx
file.Perf note
This implementation is slightly faster than the one on master branch (tested with a
Series
with 10e6 randomly generated values andtimeit
). If necessary, I can add a benchmark to the benchmark suite, although it must be noted that we need to benchmark with large arrays of data to minimize the effect of overhead.