-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Allow users to definite their own window bound calculations in rolling #29878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice idea. Can you add test(s)?
pandas/_libs/window/indexers.pyx
Outdated
**kwargs : | ||
keyword argument that will be available when get_window_bounds is called | ||
""" | ||
self.__dict__.update(kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah can we avoid this pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sorry, it's a bit cryptic.
The motivation is to allow a way for get_window_bounds
to access an external variable that may not be passed in rolling
or the aggregation function by making it accessible as a class attribute.
For example:
class MyIndexer(BaseIndexer):
def get_window_bounds(self, ...):
start = self.external_column + min_periods
....
indexer = MyIndexer(external_column=df['external_column'])
Could make this pattern in the __init__
more explicit and add validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if u can test with a bad BaseIndexer
eg where get_windows_bounds raises or is missing
we would likely give an odd error message
unit tests otherwise are not needed otherwise except as an explicit smoke test; we are already using this in all rolling tests
pandas/_libs/window/indexers.pyx
Outdated
Computes the fixed bounds of a window. | ||
|
||
Parameters | ||
---------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think could share these docstrings
pandas/core/window/rolling.py
Outdated
""" | ||
Return an indexer class that will compute the window start and end bounds | ||
""" | ||
if isinstance(self.window, window_indexers.BaseIndexer): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically we don’t actually care if this is a BaseIndexer
rather we care that it has a get_windows_bound with the correct signature
so far the only thing in indexers.pyx for which being cython really matters is VariableWindowIndexer.build, which is a static method, so could just as easily be a module-level function. Do these classes need to be in the cython code? |
Good point @jbrockmendel. I can move these indexers to |
Hello @mroeschke! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-12-04 21:56:57 UTC |
pandas/_libs/window/indexers.pyx
Outdated
def calculate_variable_window_bounds( | ||
int64_t num_values, | ||
int64_t window_size, | ||
object min_periods, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like min_periods is not used. is this a dummy to make the signature match something else? if so, pls comment/docstring
pandas/core/window/rolling.py
Outdated
@@ -400,13 +421,15 @@ def _get_cython_func_type(self, func): | |||
self._get_roll_func("{}_fixed".format(func)), win=self._get_window() | |||
) | |||
|
|||
def _get_window_indexer(self): | |||
def _get_window_indexer(self, index_as_array): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can index_as_array or the return type be annotated?
win_type: Optional[str] = None, | ||
) -> Tuple[np.ndarray, np.ndarray]: | ||
|
||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AbstractMethodError?
pandas/core/window/indexers.py
Outdated
**kwargs : | ||
keyword argument that will be available when get_window_bounds is called | ||
""" | ||
self.index = index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the name index
makes me expect a pd.Index. Is there a less ambiguous name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good. not using ExpandingIndexer right now, correct?
object center, # unused but here to match get_window_bounds signature | ||
object closed, | ||
const int64_t[:] index | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string
min_periods = calculate_min_periods( | ||
window, self.min_periods, len(x), require_min_periods, floor | ||
if not isinstance(window, BaseIndexer): | ||
min_periods = calculate_min_periods( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do these outside of the function? (as they don't depend on anything inside) and makes logic simpler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These both depend on x
after the nan
s are appended from the center
calculation.
x = np.concatenate((x, additional_nans))
start, end = window_indexer( | ||
x, window, self.closed, index_as_array | ||
).get_window_bounds() | ||
if np.any(np.diff(start) < 0) or np.any(np.diff(end) < 0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just set is_monotonic_bounds=np.any(.......)
and pass that in line 482
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will need to add is_monotonic_bounds
as an argument to all the "fixed" functions in Cython (didn't include them originally since the "fixed" function doesn't need this argument), but I suppose that makes this cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I cannot easily modify this yet without another refactor. func
here isn't always a cython function that could accept is_monotonic_bounds
; sometimes it's a python wrapped function like apply_func
or _zqrt
. I can look into this in another refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k, followup is ok
@jreback. Correct, just exists for testing purposes. |
@@ -198,11 +199,10 @@ def roll_sum_variable(ndarray[float64_t] values, ndarray[int64_t] start, | |||
s = start[i] | |||
e = end[i] | |||
|
|||
if i == 0: | |||
if i == 0 or not is_monotonic_bounds: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we do a simple calculcate in start/end to determine if is_monotonic is true here? it likely is quite performant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. ping on green.
@TomAugspurger @jorisvandenbossche if any comments
pandas/core/window/rolling.py
Outdated
self._validate_get_window_bounds_signature(self.window) | ||
|
||
@staticmethod | ||
def _validate_get_window_bounds_signature(window): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you can type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a comment on the docs. Happy to defer to others on the implementation.
All green @jreback Collecting followups: |
thanks @mroeschke very nice |
Apologies for the very late PR comment, but shouldn't |
Could you show an example? |
|
|
I'm not suggesting changing the default. My suggestion is that if the "X is a non-fixed frequency" exception is being thrown, it has detected that the passed offset is non-fixed, and thus could intelligently fall back to the variable indexer automatically. I'd think a typical user who is passing in a non-fixed offset is doing so deliberately and isn't wanting an exception. |
I think a pull request to improve exception message to suggest using |
The code located here just raises: pandas/pandas/core/window/rolling.py Lines 1831 to 1838 in 0a8b45f
But this could just be another clause right here that sets pandas/pandas/core/window/rolling.py Lines 1859 to 1861 in 0a8b45f
|
Currently
rolling
allows integer and offset aswindow
arguments. This PR allows users to define a custom method for calculating window boundaries by passing a subclass of a newBaseIndexer
Could help address more custom rolling window operations as described in #26959, #13969, #25510, #24295
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff