PERF: 6x perf hit from using numexpr #5481
Comments
Installing bottleneck doesn't help (not that it would really make sense if it did, since bottleneck accelerates reductions rather than arithmetic).
I can reproduce: python 2.7.4, pandas 0.12.0-1083-g9e2c3d6, numpy 1.9.0.dev-8a2728c, numexpr 2.2.2.
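(As a side note, `pd.show_versions()` is a convenient way to collect the same environment details, python, pandas, numpy, numexpr, bottleneck, when reporting:)

```python
import pandas as pd

# Prints the full environment: python, pandas, numpy, numexpr, bottleneck, ...
pd.show_versions()
```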
So prun suggests that infer_dtype is the cost. I'm looking into it, but I'm guessing it has to do with the different return type from numexpr.
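(A minimal profiling sketch along those lines, using the mixed-dtype frame from the benchmark below; the cProfile invocation here is illustrative rather than the original %prun session:)

```python
import cProfile
import numpy as np
import pandas as pd

# Mixed-dtype frame (int64 + float64 columns), as in the benchmark below
df = pd.DataFrame({"A": np.arange(1000000),
                   "B": np.arange(1000000, 0, -1),
                   "C": np.random.randn(1000000)})

# Sort by cumulative time; per the comment above, infer_dtype should
# show up near the top when numexpr is enabled
cProfile.runctx("df * df", globals(), locals(), sort="cumtime")
```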
Still happens in 0.12 as well.
The issue is with mixed dtypes; float-only and int-only frames are both faster, as expected. Maybe the columns are getting pushed together (see the sketch below).
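(A small sketch of what "pushed together" could mean internally; `_mgr` is pandas-internal API and varies across versions, so treat this as illustrative only:)

```python
import numpy as np
import pandas as pd

mixed = pd.DataFrame({"A": np.arange(10),
                      "B": np.arange(10, 0, -1),
                      "C": np.random.randn(10)})
print(mixed.dtypes)  # A, B -> int64; C -> float64
# Internally the int columns are consolidated into one 2-D int64 block
# and the float column into a separate float64 block (internal API):
print(mixed._mgr.blocks)
```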
This is the result of multiplying each dataframe by itself 10 times (times were not divided by 10). The frames:

    pd.DataFrame({"A": np.arange(1000000),
                  "B": np.arange(1000000, 0, -1),
                  "C": np.random.randn(1000000)})  # mixed
    pd.DataFrame({"A": np.arange(1000000),
                  "B": np.arange(1000000, 0, -1)})  # int
    pd.DataFrame({"C": np.random.randn(1000000)})   # float
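(A sketch of that benchmark, assuming a pandas version that exposes the `compute.use_numexpr` option; in the 0.12 era the switch lived in `pandas.computation.expressions.set_use_numexpr`. Numbers will vary by machine:)

```python
import timeit
import numpy as np
import pandas as pd

frames = {
    "mixed": pd.DataFrame({"A": np.arange(1000000),
                           "B": np.arange(1000000, 0, -1),
                           "C": np.random.randn(1000000)}),
    "int":   pd.DataFrame({"A": np.arange(1000000),
                           "B": np.arange(1000000, 0, -1)}),
    "float": pd.DataFrame({"C": np.random.randn(1000000)}),
}

for use_ne in (True, False):
    pd.set_option("compute.use_numexpr", use_ne)
    for name, df in frames.items():
        # 10 repetitions, reported without dividing by 10 (as above)
        t = timeit.timeit(lambda: df * df, number=10)
        print(f"numexpr={use_ne} {name}: {t:.3f}s")
```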
1 million x 3 dataframe: with numexpr enabled, multiplication takes 142ms; with numexpr disabled, it takes 21.6ms. I'm going to investigate, but it would be helpful to know if you can reproduce this anywhere else.