Performance Regression in frame_methods.Equals.time_frame_float_equal #35249
Comments
Looks like optimizing array_equivalent for the floating case gets a 3x speedup. We'll need to operate blockwise to get the rest back.
I guess we didn't merge #35256.
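For readers following along, here is a minimal sketch of what a float-specialized equivalence check can look like. The helper name `float_arrays_equivalent` is hypothetical, and this is not claimed to be the pandas implementation of `array_equivalent`. It only illustrates the idea of treating NaNs in matching positions as equal with a single vectorized pass.

```python
import numpy as np


def float_arrays_equivalent(left: np.ndarray, right: np.ndarray) -> bool:
    """Hypothetical helper: NaN-aware equality for float arrays.

    Returns True when both arrays have the same shape and every position
    either holds equal values or holds NaN on both sides.
    """
    if left.shape != right.shape:
        return False
    # An element matches if the values are equal, or if both are NaN.
    matches = (left == right) | (np.isnan(left) & np.isnan(right))
    return bool(matches.all())


a = np.array([1.0, np.nan, 3.0])
b = a.copy()
print(float_arrays_equivalent(a, b))  # NaNs in the same position compare equal
```

Skipping generic missing-value detection and staying in NumPy for a known float dtype is the kind of shortcut the comment above seems to describe, though the actual change may differ.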
Yeah. I started on some small changes that get a chunk of performance back by optimizing
But these just get us partway back. We're still orders of magnitude slower for the somewhat wide table case.
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: N = 10**3
In [4]: float_df = pd.DataFrame(np.random.randn(N, N))
# 1.0.5
1.52 ms ± 94 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# master
83.6 ms ± 1.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# branch
31.2 ms ± 2.06 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
One thought though: this new implementation is better in some cases since it short-circuits. As soon as there's a column that differs, we return False. So that's something.
The "correct" fix is to operate blockwise. I'm happy to implement that, and it can re-use a bunch of code from the blockwise arithmetic. But I'm not inclined to spend a lot of time on blockwise stuff if we're going all-1D.
Mostly agreed. I think for now I'll push up my small optimizations, and we can accept some slowdown for wide dataframes.
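The trade-off described above can be sketched roughly as follows. Both functions are hypothetical illustrations that assume all-float frames; they are not the pandas internals. The column-wise version pays per-column overhead on wide frames but can return as soon as one column differs, while the whole-array version does one vectorized comparison over the 2D values.

```python
import numpy as np
import pandas as pd


def nan_equal(left: np.ndarray, right: np.ndarray) -> bool:
    # Equal values, or NaN on both sides, count as a match.
    return bool(((left == right) | (np.isnan(left) & np.isnan(right))).all())


def equals_columnwise(df1: pd.DataFrame, df2: pd.DataFrame) -> bool:
    """Hypothetical: compare one column at a time, short-circuiting."""
    if df1.shape != df2.shape:
        return False
    for i in range(df1.shape[1]):
        left = df1.iloc[:, i].to_numpy()
        right = df2.iloc[:, i].to_numpy()
        if not nan_equal(left, right):
            return False  # stop at the first differing column
    return True


def equals_whole_array(df1: pd.DataFrame, df2: pd.DataFrame) -> bool:
    """Hypothetical: compare the full 2D float values in one pass."""
    if df1.shape != df2.shape:
        return False
    return nan_equal(df1.to_numpy(), df2.to_numpy())


N = 200
float_df = pd.DataFrame(np.random.randn(N, N))
other = float_df.copy()
other.iloc[0, 0] += 1.0  # differ in the very first column

print(equals_columnwise(float_df, float_df), equals_whole_array(float_df, float_df))
print(equals_columnwise(float_df, other), equals_whole_array(float_df, other))
```

Timing the two with `%timeit` on equal and on early-differing frames should show the per-column overhead and the short-circuit benefit respectively; the numbers will depend on the machine and pandas version.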
https://pandas.pydata.org/speed/pandas/index.html#frame_methods.Equals.time_frame_float_equal?commits=77f4a4d673e2d2b3f7396a94818771d332bd913c
Some discussion starting at #34962 (comment)
@jbrockmendel what needs to be done for 1.1: just optimize array_equivalent, or something larger? Is this a blocker (if so, can you add the blocker tag)?