Skip to content

BUG: support skew function for custom BaseIndexer rolling windows #33745

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 25, 2020

Conversation

AlexKirko
Copy link
Member

Scope of PR

This PR does a couple things to fix the performance of skew, and this also fixes the behavior of all the functions that have require_min_periods for small values of min_periods:

  • clarify docs. Name all sample methods uniformly, add a note to the caveat that users should in general be careful about using sample methods with windows
  • fix bug in _apply that made us never go into the BaseIndexer control flow branch
  • fix bug in _apply: pass window_indexer.window_size into calc_min_periods instead of min_periods or 1. We want the custom indexer window size there, so it's not clear to me why we were passing min_periods or 1 .

Details

The algorithm itself is robust, but it defaults to sample skewness, which is why there was a difference between its output and numpy. To prevent misunderstandings, I clarified the docs a bit.
We were also passing a wrong value to calc_min_periods, and we weren't going into the proper if branch, because we were checking the type of window instead of self.window.

Background on the wider issue

We currently don't support several rolling window functions when building a rolling window object using a custom class descended from pandas.api.indexers.Baseindexer. The implementations were written with backward-looking windows in mind, and this led to these functions breaking.
Currently, using these functions returns a NotImplemented error thanks to #33057, but ideally we want to update the implementations, so that they will work without a performance hit. This is what I aim to do over a series of PRs.

Perf notes

No changes to algorithms.

Clarify that any method that has Sample before its name needs to
be used carefully with windows as this is almost never what the
user wants.
Also fix the value passed to calculate_min_periods
for custom BaseIndexer implementations.
@AlexKirko
Copy link
Member Author

@jreback @mroeschke
Here is the fix for skew with custom rolling windows. Mainly, this returns sample skewness, similarly to std and var, but I also stumbled onto a couple adjacent bugs while calling this with small values of min_periods.

Fixed those and edited the docs to be more clear.

Details in the description above.

@AlexKirko
Copy link
Member Author

AlexKirko commented Apr 23, 2020

Did something happen? Why are we running CI only on macOS? Can't find the issue that led to this . . .

@jreback jreback added the Window rolling, ewma, expanding label Apr 23, 2020
@jreback jreback added this to the 1.1 milestone Apr 23, 2020
@jreback
Copy link
Contributor

jreback commented Apr 23, 2020

#33747

@jreback
Copy link
Contributor

jreback commented Apr 23, 2020

/azp run

@azure-pipelines
Copy link
Contributor

Azure Pipelines successfully started running 1 pipeline(s).

@jreback jreback closed this Apr 24, 2020
@jreback jreback reopened this Apr 24, 2020
@AlexKirko
Copy link
Member Author

AlexKirko commented Apr 24, 2020

@jreback
There still seem to be problems with CI. I'm reasonably confident that this fix isn't platform-dependent, but please say if you'd like to wait until we can launch all the pipelines and have them report back.

Plenty horrible stuff gets into software because 'it should have worked'.

@jreback
Copy link
Contributor

jreback commented Apr 25, 2020

k can you merge master again

CI should run

@AlexKirko
Copy link
Member Author

@jreback
Done, all green.

@jreback jreback merged commit cb71376 into pandas-dev:master Apr 25, 2020
@jreback
Copy link
Contributor

jreback commented Apr 25, 2020

thanks @AlexKirko

@AlexKirko AlexKirko deleted the rolling-skewness branch April 26, 2020 06:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants