-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
QST/Feature/Bug: On the performance of a rolling window operation when the window is a column of arbitrary integers #46716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This new answer appears to add an obscenely faster method compared to others I've tried, though the correctness of its results is uncertain right now. Though the simplicity of its usage after the custom indexer is set up appears to imply it would be a good candidate for a default feature. |
The author of the new performant method, appears to believe |
There appear to be still incorrectness issues (albeit limited), as mentioned here , so I guess this instead of not just a question and a feature request it's also now a bug report. |
@epigramx pls edit the issue to give a specific issue - if it's a bug then show reproducible code where that is |
@jreback I added code directly replicating the issue (and expected outcome being incorrect) and edited the title slightly. I thought of starting a new issue since this started as a question and a possible feature request, but I guess that could have made it messier. |
Thanks for the report @epigramx. While somewhat related, it would be best to separate out a performance question/issue from a bug report. It's unclear to me what the performance question you're asking here is, but I think I have a guess: if the code you posted in the OP here with Assuming that is the case, I recommend making this issue be a bug report. I've verified that the code you posted does give the incorrect result on main; the result you expected is the correct one. |
@rhshadrach Yes: main focus is now on fixing a bug related to that method since originally the post was about asking for such a method or a default feature of I'm not sure about what I should do to make this into a bug report. Should I close this and make a new one since I can not change the labels of this one? |
It would have been to edit the OP to make it clear that this is a bug report. Labels can be modified by those with triage permissions. With #46726 now opened, these modifications are no longer necessary. I'm still not sure if you're also asking about performance. If #46726 is resolved (i.e. the bug is fixed), does that also give you a sufficiently performat method to do your computation? |
@rhshadrach performance appears to not be an issue with the method posted at the bug report (assuming the bug is fixed). It appears comparable in performance to a regular rolling operation while a slow apply() operation was practically unusable for large dataframes. In fact it appears so snappy that I wouldn't be surprised if it was a default |
Seems like the core issue is being discussed in #46726 so closing in favor of further discussion there |
Research
I have searched the [pandas] tag on StackOverflow for similar questions.
I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
https://stackoverflow.com/a/71803558/277716
Question about pandas
I linked a specific complete answer at stackoverflow which tackles the problem of deriving the equivalent of
pandas.core.window.rolling.Rolling.max
but the window is an arbitrary column of integers in the same dataframe; however: even if that solution strives to be vectorized: it's extremely slow to the point of becoming unusable for large dataframes compared to the basic case of a constant window size; I suspect it may be impossible to be fast because SIMD hardware may prefer a constant nature of window size.However: I wonder if the devs of the pandas software itself may have ideas of how to do that since they are the ones who have coded the extremely fast (vectorized)
pandas.core.window.rolling.Rolling.max
.It would normally be a feature request for
pandas.DataFrame.rolling
to accept arbitrary integers from a column in the dataframe as awindow
but I don't know if it's even performant to do that.Bug related to later comments below
Expected outcome: index 18 max1 should be -1.487828 instead of -1.932612
source of code and further discussion on the bug at stackoverflow
The text was updated successfully, but these errors were encountered: