PERF: perf regression with mixed-type ops using numexpr (GH5481) #5482

jreback · 2013-11-10T17:44:26Z

BUG: non-unique ops not aligning correctly

These are basically trivial ops in numpy, so numexpr is slightly slower (but the dtype inference issue is fixed). Essentially, the re-creation of an int64 ndarray had to check whether it is datetime-like. In this case, just passing the dtype on the reconstructed Series fixes it.

Also handles non-unique columns now (no tests before, and it would fail).
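For illustration, a minimal sketch of an op on a frame with non-unique column names. The column labels and data are made up for this example, and it assumes a recent pandas; it is not one of the tests added in this PR.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a duplicated column label ("A" appears twice).
data = np.arange(6, dtype="int64").reshape(2, 3)
df = pd.DataFrame(data, columns=["A", "A", "B"])

# Both operands share identical labels, so the op should line the
# columns up by position and keep the duplicated labels in the result.
result = df * df
print(result)
```

The check of interest is that the result keeps all three columns in order and that each value is the elementwise square of the input.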

In [1]: df = pd.DataFrame({"A": np.arange(1000000), "B": np.arange(1000000, 0, -1), "C": np.random.randn(1000000)})

In [2]: pd.computation.expressions.set_use_numexpr(False)

In [3]: %timeit df*df
100 loops, best of 3: 11 ms per loop

In [4]: pd.computation.expressions.set_use_numexpr(True)

In [5]: %timeit df*df
100 loops, best of 3: 15.7 ms per loop

In [6]: df = df.astype(float)

In [7]: pd.computation.expressions.set_use_numexpr(False)

In [8]: %timeit df*df
100 loops, best of 3: 5.16 ms per loop

In [9]: pd.computation.expressions.set_use_numexpr(True)

In [10]: %timeit df*df
100 loops, best of 3: 5.37 ms per loop
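The session above can be approximated outside IPython with a small script. This is a sketch under assumptions: it uses the `compute.use_numexpr` option available in recent pandas rather than `pd.computation.expressions.set_use_numexpr`, the helper name `time_mul` is made up, and the frame is smaller than the one above. Timings depend on the machine and on whether numexpr is installed, so no numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd


def time_mul(df, use_numexpr, repeat=3, number=5):
    """Best per-call time in ms for df * df with the numexpr option toggled."""
    previous = pd.get_option("compute.use_numexpr")
    pd.set_option("compute.use_numexpr", use_numexpr)
    try:
        best = min(timeit.repeat(lambda: df * df, repeat=repeat, number=number))
    finally:
        # Restore whatever the option was before the measurement.
        pd.set_option("compute.use_numexpr", previous)
    return best / number * 1e3


n = 100_000
mixed = pd.DataFrame({
    "A": np.arange(n, dtype="int64"),
    "B": np.arange(n, 0, -1, dtype="int64"),
    "C": np.random.randn(n),
})
all_float = mixed.astype(float)

for label, frame in [("mixed", mixed), ("float", all_float)]:
    for flag in (False, True):
        print(f"{label:5s} use_numexpr={flag}: {time_mul(frame, flag):.3f} ms")
```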

jreback · 2013-11-10T18:01:30Z

@jtratner , cc @dsm054

pls give a try with this...should fix the perf issue

dsm054 · 2013-11-10T18:20:43Z

Works for me; in fact numexpr is somewhat faster--

In [4]: pd.computation.expressions.set_use_numexpr(False)

In [5]: %timeit df*df
10 loops, best of 3: 28.3 ms per loop

In [6]: pd.computation.expressions.set_use_numexpr(True)

In [7]: %timeit df*df
10 loops, best of 3: 26.9 ms per loop

jtratner · 2013-11-10T18:24:40Z

Why is this a non-unique index issue? Do you mean that we're special casing unique ops?

jreback · 2013-11-10T18:38:17Z

No, the non-unique issue is separate, but it was convenient to fix here.

jtratner · 2013-11-10T18:52:08Z

So the actual fix was just passing the dtype explicitly?

jreback · 2013-11-10T19:56:14Z

well had to create the series with an explicit dtype (as opposed to passing a dict of ndarrays)
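As a rough illustration of the idea (not the internal code changed in this PR), the sketch below builds the same Series twice: once leaving the dtype to inference and once stating it explicitly. The variable names are made up, and in recent pandas the inference step for a plain int64 ndarray is cheap, so this shows only the shape of the fix, not the 2013 performance difference.

```python
import numpy as np
import pandas as pd

values = np.arange(5, dtype="int64")
index = pd.RangeIndex(5)

# dtype left for the constructor to work out from the data
inferred = pd.Series(values, index=index)

# dtype stated up front, so nothing needs to be inferred
explicit = pd.Series(values, index=index, dtype=values.dtype)

print(inferred.dtype, explicit.dtype)
```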

jtratner · 2013-11-10T20:18:40Z

Got it - makes sense.

PERF: perf regression with mixed-type ops using numexpr (GH5481)

c4f8c54

BUG: non-unique ops not aligning correctly

jreback mentioned this pull request Nov 10, 2013

PERF: combine ops can be block based #5484

Closed

jreback added a commit that referenced this pull request Nov 10, 2013

Merge pull request #5482 from jreback/infer_fix

1804bc3

PERF: perf regression with mixed-type ops using numexpr (GH5481)

jreback merged commit 1804bc3 into pandas-dev:master Nov 10, 2013
