-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: support count function for custom BaseIndexer rolling windows #33605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
BaseIndexer support validation is covered by the validation in _get_window_indexer
if self.is_freq_type: | ||
# GH 32865. Also use custom rolling windows calculation | ||
# when using a BaseIndexer subclass | ||
if self.is_freq_type or isinstance(self.window, BaseIndexer): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the crux of the issue. Because we didn't check for BaseIndexer
subclass here, the algorithm went down the default path, and the BaseIndexer
implementation got completely ignored as blocks were created using _Window.create_blocks
.
@@ -1171,8 +1171,6 @@ class _Rolling_and_Expanding(_Rolling): | |||
) | |||
|
|||
def count(self): | |||
if isinstance(self.window, BaseIndexer): | |||
validate_baseindexer_support("count") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Custom BaseIndexer
implementations should never go into this function. Instead, _get_window_indexer
gets called, and there is a validate_baseindexer_support
there (checked that it works by temporarily blacklisting count
again)
Removing this check as unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u add an assertion here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@mroeschke @jreback All green, details above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment
@jreback |
thanks @AlexKirko |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Scope of PR
This PR makes sure that when we call
count
with aBaseIndexer
subclass, customstart
andend
arrays get calculated, and the algorithm in theaggregations.pyx
gets called instead of the_Window.create_blocks
.Turns out that we were sending the calculation into
aggregations.pyx
only for frequency-based windows, and customBaseIndexer
subclasses got ignored. Fixed it and cleaned the code up a bit.Background on the wider issue
We currently don't support several rolling window functions when building a rolling window object using a custom class descended from
pandas.api.indexers.Baseindexer
. The implementations were written with backward-looking windows in mind, and this led to these functions breaking.Currently, using these functions returns a
NotImplemented
error thanks to #33057, but ideally we want to update the implementations, so that they will work without a performance hit. This is what I aim to do over a series of PRs.Perf notes
No changes to the algorithms necessary.