Performance regression in stat_ops.FrameMultiIndexOps.time_op #35050
Added code to reproduce. @rhshadrach, are you able to look into why this is slower now?

```
# 1.0.4
%timeit df.std(level=1)
3.68 ms ± 51.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# master
%timeit df.std(level=1)
5.47 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
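`%timeit` is an IPython magic, so the numbers above come from an interactive session. A plain-Python sketch of a similar measurement is below. It assumes the same 100,000 x 4 frame as the profiling script later in this thread and uses `df.groupby(level=1).std()`, the groupby form of `df.std(level=1)` (the `level=` keyword was removed in pandas 2.0). The helper names `make_frame`, `std_by_level`, and `timeit_stats` are hypothetical. Unlike `%timeit`, `number` is fixed here rather than auto-calibrated, and absolute timings will depend on the machine.

```python
import statistics
import timeit

import numpy as np
import pandas as pd


def make_frame(seed=0):
    """Build a 100,000 x 4 frame on a three-level (10 x 100 x 100) MultiIndex."""
    levels = [np.arange(10), np.arange(100), np.arange(100)]
    codes = [
        np.arange(10).repeat(10000),
        np.tile(np.arange(100).repeat(100), 10),
        np.tile(np.tile(np.arange(100), 100), 10),
    ]
    index = pd.MultiIndex(levels=levels, codes=codes)
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.standard_normal((len(index), 4)), index=index)


def std_by_level(df, level=1):
    """Per-group std over one index level (groupby form of df.std(level=...))."""
    return df.groupby(level=level).std()


def timeit_stats(func, number=20, repeat=7):
    """Return (mean, population std) of the per-loop time in ms, like %timeit's summary."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    per_loop_ms = [t / number * 1e3 for t in totals]
    return statistics.mean(per_loop_ms), statistics.pstdev(per_loop_ms)


if __name__ == "__main__":
    df = make_frame()
    mean_ms, std_ms = timeit_stats(lambda: std_by_level(df))
    print(f"pandas {pd.__version__}: {mean_ms:.2f} ms ± {std_ms:.2f} ms per loop")
```

Running the script once in each environment (for example 1.0.4 and master) gives a rough side-by-side comparison without IPython.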
Thanks, I'll be able to look into this on Sunday.
Confirmed the timings on 1.0.4, on master, and on PR #34372 directly. This agrees with the slowdown reported in #34372 (comment). I also confirmed that single-column performance on master beats 1.0.4 by a ratio similar to the one reported in the comment linked above:
This makes this issue a duplicate of #34771, although that issue previously mentioned only `var`; I've updated it to include `std`.
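The sample standard deviation is the square root of the sample variance with the same `ddof`, so if grouped `std` is computed from grouped `var` internally (an assumption here, not something confirmed in this thread), a slowdown in the `var` path would show up in `std` as well. A quick numerical check of the identity on a small hypothetical frame, using a hypothetical helper `grouped_std_via_var`:

```python
import numpy as np
import pandas as pd


def grouped_std_via_var(df, level=1, ddof=1):
    """Compute the per-group std as the square root of the per-group var."""
    return np.sqrt(df.groupby(level=level).var(ddof=ddof))


if __name__ == "__main__":
    index = pd.MultiIndex.from_product([range(3), range(50)], names=["outer", "inner"])
    df = pd.DataFrame(np.random.default_rng(0).standard_normal((len(index), 4)), index=index)
    direct = df.groupby(level=1).std()
    via_var = grouped_std_via_var(df, level=1)
    # The two results should agree up to floating-point rounding.
    print(np.allclose(direct, via_var))
```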
I profiled the call with both versions. Here is my script in case someone wants to reproduce:

```python
import cProfile
import io
import pstats
from pstats import SortKey

import numpy as np
import pandas as pd

# Three-level MultiIndex with 10 x 100 x 100 = 100,000 rows; 4 float columns.
levels = [np.arange(10), np.arange(100), np.arange(100)]
codes = [
    np.arange(10).repeat(10000),
    np.tile(np.arange(100).repeat(100), 10),
    np.tile(np.tile(np.arange(100), 100), 10),
]
index = pd.MultiIndex(levels=levels, codes=codes)
df = pd.DataFrame(np.random.randn(len(index), 4), index=index)

# Profile a single call of the regressed operation.
pr = cProfile.Profile(timeunit=0.000000001)
pr.enable()
df.std(level=1)  # level= was removed in pandas 2.0; use df.groupby(level=1).std() there
pr.disable()

# Print all entries, sorted by cumulative time.
s = io.StringIO()
sortby = SortKey.CUMULATIVE
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())
```

Here are the results (I changed the paths in some lines).

v1.0.4 (only the first 50 lines):
And master:

Seems like we spend a lot more time in
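Since 1.0.4 and master have to be profiled in separate environments, it can help to save the raw stats per version and to trim the printout to the top entries, as was done for the 1.0.4 output. A sketch of such a helper, assuming only the standard-library `cProfile`/`pstats` API; `profile_call` and its parameters are hypothetical, and the small frame in the `__main__` block is illustrative only:

```python
import cProfile
import io
import pstats
from pstats import SortKey

import numpy as np
import pandas as pd


def profile_call(func, top=50, dump_path=None):
    """Profile one call of func and return the top `top` entries by cumulative time."""
    pr = cProfile.Profile()
    pr.enable()
    func()
    pr.disable()
    if dump_path is not None:
        # Raw stats can be reloaded later with pstats.Stats(dump_path).
        pr.dump_stats(dump_path)
    s = io.StringIO()
    # An integer argument to print_stats limits the listing to that many lines.
    pstats.Stats(pr, stream=s).sort_stats(SortKey.CUMULATIVE).print_stats(top)
    return s.getvalue()


if __name__ == "__main__":
    index = pd.MultiIndex.from_product([range(10), range(100)])
    df = pd.DataFrame(np.random.default_rng(0).standard_normal((len(index), 4)), index=index)
    print(profile_call(lambda: df.groupby(level=1).std(), top=50))
```

Passing something like `dump_path=f"std_level1_{pd.__version__}.prof"` in each environment leaves one file per version, which `pstats.Stats(path)` can reload for comparison.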
Moved off the 1.1.1 milestone (scheduled for this week) as there are no PRs to fix this in the pipeline.

Moved off the 1.1.2 milestone (scheduled for this week) as there are no PRs to fix this in the pipeline.

Moved off the 1.1.3 milestone (overdue) as there are no PRs to fix this in the pipeline.

Moved off the 1.1.4 milestone (scheduled for release tomorrow) as there are no PRs to fix this in the pipeline.
That's as far back as I can install easily.
Thanks @jbrockmendel, closing.
https://pandas.pydata.org/speed/pandas/index.html#stat_ops.FrameMultiIndexOps.time_op?p-level=1&p-op=%27std%27&commits=c9144ca54dcc924995acae3d9dcb890a5802d7c0
Points to #34372 (cc @rhshadrach), but there was an earlier slowdown.
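The asv speed-page URL above encodes the benchmark parameters (`p-level=1`, `p-op='std'`). A sketch of an asv-style benchmark exercising the same operation is below, written against the groupby API so it also runs on pandas 2.x. The class name, the parameter grid, and the setup are assumptions modeled on the frame used in this thread, not a copy of the benchmark in pandas' `asv_bench` directory; only the asv conventions (`params`, `param_names`, `setup`, `time_`-prefixed methods) are taken from asv itself.

```python
import numpy as np
import pandas as pd


class FrameMultiIndexGroupbyOps:
    """Hypothetical asv-style benchmark modeled on stat_ops.FrameMultiIndexOps."""

    # asv runs every combination of these parameters.
    params = ([0, 1, [0, 1]], ["mean", "std", "var"])
    param_names = ["level", "op"]

    def setup(self, level, op):
        # 10 x 100 x 100 = 100,000 rows on a three-level MultiIndex, 4 float columns.
        levels = [np.arange(10), np.arange(100), np.arange(100)]
        codes = [
            np.arange(10).repeat(10000),
            np.tile(np.arange(100).repeat(100), 10),
            np.tile(np.tile(np.arange(100), 100), 10),
        ]
        index = pd.MultiIndex(levels=levels, codes=codes)
        self.df = pd.DataFrame(np.random.randn(len(index), 4), index=index)

    def time_op(self, level, op):
        # Groupby form of df.<op>(level=level); the level= keyword was removed in pandas 2.0.
        return getattr(self.df.groupby(level=level), op)()
```

Building the groupby object inside `time_op` rather than in `setup` keeps the grouping work inside each timed call.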