
PERF: perf regression with mixed-type ops using numexpr (GH5481) #5482


Merged
merged 1 commit into from
Nov 10, 2013


jreback
Contributor

@jreback jreback commented Nov 10, 2013

closes #5481

BUG: non-unique ops not aligning correctly

these are basically a trivial op in numpy, so numexpr is slightly slower (but the dtype inference issue is fixed). Essentially, recreating an int64 ndarray had to check whether it's datetime-like. In this case, just passing in the dtype on the reconstructed series fixes it.

Also handles non-unique columns now (no tests before, and it would fail).
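
As a small illustration of the non-unique case mentioned above, here is a sketch of an elementwise op on a frame with a repeated column label. The frame contents are made up for the example, and it is written against a recent pandas rather than the 2013 code this PR touches, so treat it as a sanity check of the expected behaviour, not a reproduction of the original failure.

```python
import numpy as np
import pandas as pd

# Two columns share the label "A"; the values are arbitrary illustration data.
df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["A", "A", "B"])

# Both operands carry the same (duplicated) labels, so the columns are
# expected to pair up by position and the labels to be preserved.
result = df * df

print(result)
```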

In [1]: df = pd.DataFrame({"A": np.arange(1000000), "B": np.arange(1000000, 0, -1), "C": np.random.randn(1000000)})

In [2]: pd.computation.expressions.set_use_numexpr(False)

In [3]: %timeit df*df
100 loops, best of 3: 11 ms per loop

In [4]: pd.computation.expressions.set_use_numexpr(True)

In [5]: %timeit df*df
100 loops, best of 3: 15.7 ms per loop

In [6]: df = df.astype(float)

In [7]: pd.computation.expressions.set_use_numexpr(False)

In [8]: %timeit df*df
100 loops, best of 3: 5.16 ms per loop

In [9]: pd.computation.expressions.set_use_numexpr(True)

In [10]: %timeit df*df
100 loops, best of 3: 5.37 ms per loop
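
For anyone re-running the comparison outside IPython, the session above can be sketched as a plain script. This assumes a recent pandas, where the toggle is the `compute.use_numexpr` option rather than `pd.computation.expressions.set_use_numexpr`; the helper names `make_mixed_frame` and `time_multiply` are made up for this example, and the frame is smaller than in the original so it finishes quickly. If numexpr is not installed, both timings simply go through numpy.

```python
import timeit

import numpy as np
import pandas as pd


def make_mixed_frame(n=100_000):
    """Build a frame with two integer columns and one float column (hypothetical helper)."""
    return pd.DataFrame({
        "A": np.arange(n),
        "B": np.arange(n, 0, -1),
        "C": np.random.randn(n),
    })


def time_multiply(df, use_numexpr, repeat=3, number=5):
    """Return the best per-call time in seconds for df * df (hypothetical helper)."""
    # option_context restores the previous setting when the block exits.
    with pd.option_context("compute.use_numexpr", use_numexpr):
        best = min(timeit.repeat(lambda: df * df, repeat=repeat, number=number))
    return best / number


mixed = make_mixed_frame()
as_float = mixed.astype(float)

for label, frame in [("mixed int/float", mixed), ("all float", as_float)]:
    for flag in (False, True):
        seconds = time_multiply(frame, flag)
        print(f"{label:16s} use_numexpr={flag!s:5s} {seconds * 1e3:.2f} ms per loop")
```

Absolute numbers will differ from the ones quoted in this thread; only the relative gap between the two settings is comparable.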

@jreback
Contributor Author

jreback commented Nov 10, 2013

@jtratner , cc @dsm054

pls give a try with this...should fix the perf issue

@dsm054
Contributor

dsm054 commented Nov 10, 2013

Works for me; in fact numexpr is somewhat faster:

In [4]: pd.computation.expressions.set_use_numexpr(False)

In [5]: %timeit df*df
10 loops, best of 3: 28.3 ms per loop

In [6]: pd.computation.expressions.set_use_numexpr(True)

In [7]: %timeit df*df
10 loops, best of 3: 26.9 ms per loop

@jtratner
Contributor

Why is this a non-unique index issue? Do you mean that we're special casing unique ops?

@jreback
Contributor Author

jreback commented Nov 10, 2013

no, the non-unique issue is separate, but it was convenient to fix

jreback added a commit that referenced this pull request Nov 10, 2013
PERF: perf regression with mixed-type ops using numexpr (GH5481)
@jreback jreback merged commit 1804bc3 into pandas-dev:master Nov 10, 2013
@jtratner
Contributor

So the actual fix was just passing the dtype explicitly?

@jreback
Contributor Author

jreback commented Nov 10, 2013

well had to create the series with an explicit dtype (as opposed to passing a dict of ndarrays)
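
To make the distinction concrete, here is a rough sketch of the two ways of reassembling a result frame from per-column arrays. It is not the pandas internals changed by this PR, just an illustration of the idea using public constructors; the variable names are made up, and on a recent pandas the two paths should give the same frame with no meaningful speed difference.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": np.arange(5),
    "B": np.arange(5, 0, -1),
    "C": np.random.randn(5),
})

# Raw per-column results, roughly what an elementwise op hands back.
raw = {name: df[name].values * df[name].values for name in df.columns}

# Option 1: pass a dict of bare ndarrays and let the constructor work out
# each column's dtype on its own.
from_arrays = pd.DataFrame(raw, index=df.index, columns=df.columns)

# Option 2: wrap each result in a Series with the dtype stated up front,
# so nothing has to be inferred from the values.
from_series = pd.DataFrame(
    {
        name: pd.Series(arr, index=df.index, dtype=arr.dtype)
        for name, arr in raw.items()
    },
    columns=df.columns,
)

# Both routes should produce the same values and dtypes.
assert from_arrays.equals(from_series)
```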

@jtratner
Contributor

Got it - makes sense.

Successfully merging this pull request may close these issues.

PERF: 6x perf hit from using numexpr