-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Performance regression in DataFrame reduction ops #38592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I think it's in this commit: 7043f8f. |
That commit has been reverted, but that didn't seem to have solved the performance regression. @jbrockmendel can you look into this? |
does this mean we can un-revert? its a nice little cleanup
cant make any promises at the moment, but ill keep a tab open for this so it doesnt fall off the radar screen entirely. |
No, it was reverted because of a behaviour regression -> #38607
If you can't look at it on the short term, I would propose to revert the change for now, until you have time (otherwise later it might become much more difficult to do that, if it turns out to be needed). |
Ah, I think the reason is that it is now doing a non-copy ravel, while before the copying ravel ensured we would have good layout for a performant reduction (while this is not the case when the DataFrame is created from a 2D numpy array). |
Huh, and to think I made a non-copying ravel specifically with perf in mind. you learn something every day |
@jbrockmendel @jorisvandenbossche Any idea what the status is here? |
my mistake, i thought this was the one discussed on the call on wednesday. no idea off the top of my head what the status is here. |
yeah, no idea either. will leave the 1.3 milestone for now since this was a regression on master early on when we started 1.3. need further investigation |
This is still causing a regression in the reductions (for integer dtypes) compared to 1.2.x. |
Now, based on my comment above (#38592 (comment)), this might depend on the memory layout of the DataFrame, and thus on how the DataFrame was created. But so we maybe should remove some of the extra code added in #38507 to ensure the memory layout is preserved in the |
The
stat_ops.FrameOps
are generally showing a slowdown since a few days, see eg https://pandas.pydata.org/speed/pandas/#stat_ops.FrameOps.time_op?python=3.8&Cython=0.29.21&p-op='mean'&p-dtype='int'&p-axis=0&commits=8dbb593d-7043f8faThe indicated range is 8dbb593...7043f8f, but don't directly see which commit might be the cause cc @jbrockmendel
The text was updated successfully, but these errors were encountered: