TST/CLN: roll_sum/mean/var/skew/kurt: simplification for non-monotonic indices #36933
Conversation
Yeah, I don't remember why this is like this for the non-monotonic case.
Can you see what tests are actually hitting this (it could be not very many / nothing)?
There are exactly two tests that cover at least the sum part (mean/var/skew/kurtosis do not seem to be covered!). I will mark this PR as a draft and look into tests for these functions on the weekend. I assume that the easiest way to trigger the non-monotonic branch for variable windows is by having a dataframe with a datetime index that is not sorted, right? edit: I think the only way to trigger the variable non-monotonic part is through using a custom `BaseIndexer`.
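For context, here is a minimal sketch of how a custom `pandas.api.indexers.BaseIndexer` can produce non-monotonic window bounds and thereby exercise the variable-window code path discussed here. The `ReverseWindowIndexer` class is purely illustrative and is not part of this PR:

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer


class ReverseWindowIndexer(BaseIndexer):
    """Illustrative indexer whose start/end arrays are decreasing."""

    def get_window_bounds(self, num_values=0, min_periods=None,
                          center=None, closed=None, step=None, **kwargs):
        # Windows are laid out back-to-front, so the bound arrays are
        # decreasing (non-monotonic) instead of increasing.
        end = np.arange(num_values, 0, -1, dtype=np.int64)
        start = np.clip(end - self.window_size, 0, None)
        return start, end


df = pd.DataFrame({"values": np.arange(10, dtype=float)})
# Hits the variable-window aggregation path with non-monotonic bounds.
print(df.rolling(ReverseWindowIndexer(window_size=3)).sum())
```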
I printed the values I set to zero on master: they are zero. I assume the test case has some randomness/platform-specific behavior?
cc @mroeschke
While we're addressing this
Will do
If you trace back the usage of those bounds, you'll see I didn't implement the non-monotonic bound rolling case, but I'm happy to change it if there is a more efficient way to do it.
Isn't that already handled by the current implementation? I would have expected that:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'values': np.random.rand(1000)})
df_reverse = pd.DataFrame({'values': df['values'][::-1]}, index=df.index[::-1])

%timeit df.rolling(window=10).mean()
# 645 µs ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df_reverse.rolling(window=10).mean()
# 653 µs ± 21.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
kurt/skew are sometimes different between increasing/decreasing indices (on the order of 1e-8 for kurt and 1e-10 for skew). I would like to believe that the decreasing indices are more accurate, as we set the accumulated value to zero instead of removing the values that leave the window. @mroeschke is it okay to 1) use deterministic values for testing (instead of `np.random.rand`)?
Definitely, generally in favor of using deterministic values for testing vs np.random.rand |
should be good now :) @mroeschke @jreback |
Looks fine to me. Please add a whatsnew note (1.2, bug fixes in rolling).
Can you also hand-calculate at least mean/sum in your test (I know you hard-code, which is good) to make sure that we like those results? Just not 100% trusting 1.1.3.
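As an illustration of the kind of hand-verifiable test being requested, here is a hedged sketch with deterministic input and a hand-computed expectation. The values and the test are made up for this example and are not the actual test added by the PR; note also that a fixed integer window never consults the index, so a custom indexer like the one sketched above is still needed to reach the non-monotonic variable-window branch:

```python
import numpy as np
import pandas as pd
import pandas._testing as tm

# Deterministic input so the expected values can be computed by hand.
values = [3.0, 4.0, 0.0, 2.0, 1.0]
ser = pd.Series(values, index=[4, 3, 2, 1, 0])  # decreasing index

result = ser.rolling(window=2).sum()
# Hand-computed: NaN, 3+4, 4+0, 0+2, 2+1
expected = pd.Series([np.nan, 7.0, 4.0, 2.0, 3.0], index=[4, 3, 2, 1, 0])
tm.assert_series_equal(result, expected)
```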
Okay, I will manually compute the expected results. About the whatsnew: this PR doesn't add any new features and doesn't fix any bugs; it just adds tests and avoids for-loops for non-increasing indices.
So results in 1.1.3 are ok; we just regressed somehow on master, but since we haven't released 1.2 yet, this is ok. Great. Ping when ready. #37166 is going to rebase after this is in.
Rebased and added a note that the expected statistics for sum/mean have been verified.
@jreback green'ish (unrelated CI failure) |
Mind merging master and fixing up the code checks error? |
@mroeschke I think the code check errors are not caused by this PR. I rebased it now, let's see whether that fixes it. |
@mroeschke green except two unrelated windows CI failures |
Thanks @twoertwein. Awesome find and patch! |
- `black pandas`
- `git diff upstream/master -u -- "*.py" | flake8 --diff`
The removed for-loop doesn't seem to be necessary (I hope this code is tested by an existing test).
I feel like I'm missing an obvious reason why these for-loops are needed: looking at the code, I don't think we need them, and the tests also pass.
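For readers without the Cython source open, here is a rough pure-Python sketch of the sliding add/remove pattern that these rolling aggregations follow. It only approximates `pandas/_libs/window/aggregations.pyx` (the real code tracks observation counts and handles NaNs, among other details), but it shows why non-monotonic bounds can simply reset the accumulator and recompute the window instead of looping to remove old values:

```python
import numpy as np


def roll_sum_sketch(values, start, end):
    """Approximate pure-Python analogue of the rolling-sum update loop."""
    is_monotonic_bounds = bool(
        np.all(np.diff(start) >= 0) and np.all(np.diff(end) >= 0)
    )
    out = np.empty(len(start))
    acc = 0.0
    for i in range(len(start)):
        s, e = start[i], end[i]
        if i == 0 or not is_monotonic_bounds:
            # Non-monotonic bounds: recompute the window from scratch;
            # no extra for-loop is needed to "remove" stale values.
            acc = values[s:e].sum()
        else:
            # Monotonic bounds: drop values that left the window and
            # add values that entered it.
            acc -= values[start[i - 1]:s].sum()
            acc += values[end[i - 1]:e].sum()
        out[i] = acc
    return out


# Example: fixed-size windows of length 3 over ten values.
vals = np.arange(10, dtype=float)
end = np.arange(1, 11)
start = np.clip(end - 3, 0, None)
print(roll_sum_sketch(vals, start, end))  # [0. 1. 3. 6. 9. 12. 15. 18. 21. 24.]
```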