TST/CLN: roll_sum/mean/var/skew/kurt: simplification for non-monotonic indices #36933
Conversation
Yeah, I don't remember why this is like this for the non-monotonic case.
Can you see what tests are actually hitting this (it could be not very many / nothing)?
There are exactly two tests that cover at least the sum part (mean/var/skew/kurtosis do not seem to be covered!). I will mark this PR as a draft and look into tests for these functions on the weekend. I assume that the easiest way to trigger the non-monotonic branch for variable windows is by having a dataframe with a datetime index that is not sorted, right? edit: I think the only way to trigger the variable non-monotonic part is through using a custom `BaseIndexer`.
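For context, here is a minimal sketch of how a custom `pandas.api.indexers.BaseIndexer` can produce non-monotonic window bounds and thereby exercise the variable-window code path discussed here. The `ReverseWindowIndexer` class is purely illustrative and is not part of this PR:

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer


class ReverseWindowIndexer(BaseIndexer):
    """Illustrative indexer whose start/end arrays are decreasing."""

    def get_window_bounds(self, num_values=0, min_periods=None,
                          center=None, closed=None, step=None, **kwargs):
        # Windows are laid out back-to-front, so the bound arrays are
        # decreasing (non-monotonic) instead of increasing.
        end = np.arange(num_values, 0, -1, dtype=np.int64)
        start = np.clip(end - self.window_size, 0, None)
        return start, end


df = pd.DataFrame({"values": np.arange(10, dtype=float)})
# Hits the variable-window aggregation path with non-monotonic bounds.
print(df.rolling(ReverseWindowIndexer(window_size=3)).sum())
```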
I printed the values I set to zero on master: they are zero. I assume the test case has some randomness/platform-specific behavior?
cc @mroeschke
While we're addressing this
Will do
If you trace back the usage of those bounds, you'll see I didn't implement the non-monotonic bound rolling case, but I'm happy to change it if there is a more efficient way to do it.
Isn't that already handled by the current implementation? I would have expected that:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'values': np.random.rand(1000)})
df_reverse = pd.DataFrame({'values': df['values'][::-1]}, index=df.index[::-1])

%timeit df.rolling(window=10).mean()
# 645 µs ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df_reverse.rolling(window=10).mean()
# 653 µs ± 21.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
kurt/skew are sometimes different between increasing/decreasing indices (on the order of 1e-8 for kurt and 1e-10 for skew). I would like to believe that the decreasing indices are more accurate, as we set the accumulated value to zero instead of removing the values that leave the window. @mroeschke is it okay to 1) use deterministic values for testing (instead of `np.random.rand`)?
Definitely, generally in favor of using deterministic values for testing vs np.random.rand |
should be good now :) @mroeschke @jreback |
Looks fine to me. Please add a whatsnew note (1.2, bug fixes in rolling).
Can you also hand-calculate at least mean/sum in your test (I know you hard-code, which is good) to make sure that we like those results? Just not 100% trusting 1.1.3.
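As an illustration of the kind of hand-verifiable test being requested, here is a hedged sketch with deterministic input and a hand-computed expectation. The values and the test are made up for this example and are not the actual test added by the PR; note also that a fixed integer window never consults the index, so a custom indexer like the one sketched above is still needed to reach the non-monotonic variable-window branch:

```python
import numpy as np
import pandas as pd
import pandas._testing as tm

# Deterministic input so the expected values can be computed by hand.
values = [3.0, 4.0, 0.0, 2.0, 1.0]
ser = pd.Series(values, index=[4, 3, 2, 1, 0])  # decreasing index

result = ser.rolling(window=2).sum()
# Hand-computed: NaN, 3+4, 4+0, 0+2, 2+1
expected = pd.Series([np.nan, 7.0, 4.0, 2.0, 3.0], index=[4, 3, 2, 1, 0])
tm.assert_series_equal(result, expected)
```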
Okay, I will manually compute the expected results. About the whatsnew: this PR doesn't add any new features and doesn't fix any bugs; it just adds tests and avoids for-loops for non-increasing indices.
So results in 1.1.3 are ok; we just regressed somehow on master, but since we haven't released 1.2 yet, this is ok. Great. Ping when ready. #37166 is going to rebase after this is in.
Rebased and added a note that the expected statistics for sum/mean have been verified.
@jreback green'ish (unrelated CI failure) |
Mind merging master and fixing up the code checks error? |
@mroeschke I think the code check errors are not caused by this PR. I rebased it now, let's see whether that fixes it. |
@mroeschke green except two unrelated windows CI failures |
Thanks @twoertwein. Awesome find and patch! |
- `black pandas`
- `git diff upstream/master -u -- "*.py" | flake8 --diff`
The removed for-loop doesn't seem to be necessary (I hope this code is tested by an existing test).
I feel like I'm missing an obvious reason why these for-loops are needed: looking at the code, I don't think we need them, and the tests also pass.
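For readers without the Cython source open, here is a rough pure-Python sketch of the sliding add/remove pattern that these rolling aggregations follow. It only approximates `pandas/_libs/window/aggregations.pyx` (the real code tracks observation counts and handles NaNs, among other details), but it shows why non-monotonic bounds can simply reset the accumulator and recompute the window instead of looping to remove old values:

```python
import numpy as np


def roll_sum_sketch(values, start, end):
    """Approximate pure-Python analogue of the rolling-sum update loop."""
    is_monotonic_bounds = bool(
        np.all(np.diff(start) >= 0) and np.all(np.diff(end) >= 0)
    )
    out = np.empty(len(start))
    acc = 0.0
    for i in range(len(start)):
        s, e = start[i], end[i]
        if i == 0 or not is_monotonic_bounds:
            # Non-monotonic bounds: recompute the window from scratch;
            # no extra for-loop is needed to "remove" stale values.
            acc = values[s:e].sum()
        else:
            # Monotonic bounds: drop values that left the window and
            # add values that entered it.
            acc -= values[start[i - 1]:s].sum()
            acc += values[end[i - 1]:e].sum()
        out[i] = acc
    return out


# Example: fixed-size windows of length 3 over ten values.
vals = np.arange(10, dtype=float)
end = np.arange(1, 11)
start = np.clip(end - 3, 0, None)
print(roll_sum_sketch(vals, start, end))  # [0. 1. 3. 6. 9. 12. 15. 18. 21. 24.]
```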