PERF: nanops #43311
jbrockmendel
commented
Aug 30, 2021
LGTM. Question: how come we only apply this to certain nanops? Why not
These are the only ones that showed a big difference in the ASV benchmarks.
This seems to give a slowdown in the
values = np.random.randn(1000000, 4)
In [9]: %timeit pd.core.nanops.nansum(values, axis=1, skipna=True)
47.9 ms ± 554 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) # <-- pandas 1.3
18.4 s ± 808 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # <-- master
This is a gigantic difference, which I noticed in a case for the ArrayManager. But for the BlockManager with a relatively wide DataFrame it also gives a slowdown:
values = np.random.randn(1000, 4)
df = pd.DataFrame(values).copy()
In [12]: %timeit df.sum(axis=1)
138 µs ± 21.9 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) # <-- pandas 1.3
1.83 ms ± 80.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) # <-- master
Since the performance improvement in the examples in the top post was only small (compared to the slowdowns shown above), I would maybe either 1) revert the optimization or 2) add some threshold for the shape (eg only take this custom path if
Seems reasonable.
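To make the "threshold for the shape" idea concrete, here is a minimal sketch of what such a dispatch could look like. It is not the pandas implementation: the function names, the `min_cols_per_row` parameter, and the condition itself are hypothetical, chosen only to illustrate taking a per-row path when rows are few and long, and falling back to the plain whole-array reduction otherwise.

```python
import numpy as np


def per_row_nansum(values: np.ndarray) -> np.ndarray:
    # Hypothetical "custom path": reduce each row separately in a Python loop.
    # The loop overhead grows with the number of rows.
    return np.array([np.nansum(row) for row in values])


def nansum_axis1(values: np.ndarray, min_cols_per_row: float = 1000) -> np.ndarray:
    # Hypothetical threshold (an assumption, not the value used in pandas):
    # only take the per-row path when each row is much longer than the
    # number of rows, so the Python-level loop stays cheap.
    n_rows, n_cols = values.shape
    if values.flags["C_CONTIGUOUS"] and n_cols >= min_cols_per_row * n_rows:
        return per_row_nansum(values)
    # Default path: a single vectorized reduction over the whole array.
    return np.nansum(values, axis=1)
```

Both branches return the same result, so the threshold only decides which path does the work; any real cutoff would need to be tuned against benchmarks like the ones in this thread.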
Some very rough comparisons:
values = np.random.randn(10000, 100)
In [5]: %timeit pd.core.nanops.nansum(values, axis=1, skipna=True)
2.05 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) # <-- pandas 1.3
174 ms ± 15.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # <-- master
values = np.random.randn(1000, 1000)
In [7]: %timeit pd.core.nanops.nansum(values, axis=1, skipna=True)
2.02 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) # <-- pandas 1.3
17 ms ± 2.5 ms per loop (mean ± std. dev. of 7 runs, 100 loops each) # <-- master
values = np.random.randn(100, 10000)
In [9]: %timeit pd.core.nanops.nansum(values, axis=1, skipna=True)
1.87 ms ± 248 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) # <-- pandas 1.3
2.4 ms ± 38.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) # <-- master
values = np.random.randn(10, 100000)
In [11]: %timeit pd.core.nanops.nansum(values, axis=1, skipna=True)
2.15 ms ± 197 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) # <-- pandas 1.3
1.23 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) # <-- master
Based on this, something
Added that in #44566.
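For anyone who wants to rerun a comparison of this kind outside IPython, the following is a rough sketch using only NumPy and `timeit`. The per-row loop is a stand-in for a row-by-row code path and is an assumption for illustration, not the code that was on master; the helper names are hypothetical. The shapes are the ones from the comparison above, each holding one million elements.

```python
import timeit

import numpy as np


def whole_array_sum(values: np.ndarray) -> np.ndarray:
    # Single vectorized reduction along axis 1.
    return np.nansum(values, axis=1)


def per_row_sum(values: np.ndarray) -> np.ndarray:
    # Stand-in for a row-by-row path: one reduction call per row.
    return np.array([np.nansum(row) for row in values])


def time_row_sum(shape, func, number=3, repeat=3) -> float:
    # Best average time in seconds for one call of func on random data.
    values = np.random.randn(*shape)
    timings = timeit.repeat(lambda: func(values), number=number, repeat=repeat)
    return min(timings) / number


if __name__ == "__main__":
    shapes = [(10000, 100), (1000, 1000), (100, 10000), (10, 100000)]
    for shape in shapes:
        t_whole = time_row_sum(shape, whole_array_sum)
        t_rows = time_row_sum(shape, per_row_sum)
        print(f"{shape}: whole-array {t_whole * 1e3:.2f} ms, per-row {t_rows * 1e3:.2f} ms")
```

Absolute numbers will differ by machine and NumPy version, so only the trend across shapes is meaningful.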