ENH: optimized Groupby.diff() #33658
Comments
Seems related to some work @mroeschke has done.
A PR would be welcome.
Does #45575 address this? It was merged after this issue was opened. It doesn't use numba, but it did get a 1000x speedup for a handful of cases.
Any idea how thorough that "handful of cases" was? Or whether there is non-trivial room for further improvement by implementing something in groupby.pyx?
#45575 shows the ASV benchmarks, which cover a lot of different cases. Not all are 1000x, but most cases see a significant improvement.
OK, I'm happy to consider this resolved. Good job!
Is your feature request related to a problem?
Doing groupby().diff() with a big dataset and many groups is quite slow. A benchmark (originally attached as an image) showed that, in certain cases, optimizing it with numba can yield a roughly 1000x speedup.
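For concreteness, a minimal reproduction of the slow case; the row and group counts below are illustrative assumptions, not figures from the original report:

```python
import numpy as np
import pandas as pd

# Many small groups: ~1M rows spread over ~500k groups,
# so each group holds only a couple of rows on average.
n_rows, n_groups = 1_000_000, 500_000
df = pd.DataFrame({
    "key": np.random.randint(0, n_groups, n_rows),
    "value": np.random.randn(n_rows),
})

# The slow operation: per-group first difference.
result = df.groupby("key")["value"].diff()
```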
Describe the solution you'd like
Now, my question is: can this be optimized in pandas?
I realise the case is somewhat special, but I've had to work with many small groups and I'm running into speed issues.
API breaking implications
[this should provide a description of how this feature will affect the API]
Describe alternatives you've considered
[this should provide a description of any alternative solutions or features you've considered]
Additional context
Here's the Python code in text format:
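The attached code itself did not survive in text form here; what follows is a minimal sketch of the kind of numba optimization described, assuming the data is sorted by group key so a single JIT-compiled pass can compute per-group differences. The function name group_diff and all sizes are hypothetical, not the original code:

```python
import numba
import numpy as np
import pandas as pd

@numba.njit
def group_diff(values, group_ids):
    # Assumes rows are sorted by group id; computes the first
    # difference within each group, with NaN at group boundaries.
    out = np.empty_like(values)
    out[0] = np.nan
    for i in range(1, len(values)):
        if group_ids[i] == group_ids[i - 1]:
            out[i] = values[i] - values[i - 1]
        else:
            out[i] = np.nan
    return out

df = pd.DataFrame({
    "key": np.random.randint(0, 500_000, 1_000_000),
    "value": np.random.randn(1_000_000),
}).sort_values("key")

# On sorted data this matches df.groupby("key")["value"].diff().
df["value_diff"] = group_diff(
    df["value"].to_numpy(), df["key"].to_numpy()
)
```

On sorted data the JIT loop produces the same values as groupby().diff() while avoiding the per-group dispatch overhead that dominates when groups are small.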