
Re-evaluate the minimum number of elements to use numexpr for elementwise ops #40500


Closed
jorisvandenbossche opened this issue Mar 18, 2021 · 10 comments · Fixed by #40609
Labels
Numeric Operations (Arithmetic, Comparison, and Logical operations), Performance (Memory or execution speed performance)
Milestone: 1.3

Comments

@jorisvandenbossche
Member

Currently we have a MIN_ELEMENTS set at 10,000:

# the minimum prod shape that we will use numexpr
_MIN_ELEMENTS = 10000

However, while running lots of performance comparisons recently, I have been noticing that numexpr still shows considerable overhead compared to numpy at that array size.

I did a few specific timings for a few ops comparing numpy and numexpr for a set of different array sizes:

[Figure: log-log plot of timing vs array size for numpy and numexpr, one panel per op (+, *, ==, <=)]

Code used to create the plot
import operator

import numpy as np
import pandas as pd
import numexpr as ne

import seaborn as sns

results = []

for s in [10**3, 10**4, 10**5, 10**6, 10**7, 10**8]:
    arr1 = np.random.randn(s)
    arr2 = np.random.randn(s)
    
    for op_str, op in [("+", operator.add), ("*", operator.mul), ("==", operator.eq), ("<=", operator.le)]:

        # IPython's %timeit -o returns a TimeitResult with .average and .stdev
        res_ne = %timeit -o ne.evaluate(f"a {op_str} b", local_dict={"a": arr1, "b": arr2}, casting="safe")
        res_np = %timeit -o op(arr1, arr2)

        results.append({"size": s, "op": op_str, "engine": "numexpr", "timing": res_ne.average, "timing_stdev": res_ne.stdev})
        results.append({"size": s, "op": op_str, "engine": "numpy", "timing": res_np.average, "timing_stdev": res_np.stdev})
        

df = pd.DataFrame(results)

fig = sns.relplot(data=df, x="size", y="timing", hue="engine", col="op", kind="line", col_wrap=2)
fig.set(xscale='log', yscale='log')
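
For reference, roughly the same comparison can be reproduced outside IPython with the standard-library timeit module. This is a minimal sketch for a single size and op, not the exact code used for the plot above:

import timeit

import numpy as np
import numexpr as ne

size = 10**5
a = np.random.randn(size)
b = np.random.randn(size)

# best-of-5 average time per call, in seconds
t_ne = min(timeit.repeat(lambda: ne.evaluate("a + b", local_dict={"a": a, "b": b}, casting="safe"),
                         repeat=5, number=100)) / 100
t_np = min(timeit.repeat(lambda: a + b, repeat=5, number=100)) / 100

print(f"numexpr: {t_ne:.2e} s  numpy: {t_np:.2e} s  ratio: {t_ne / t_np:.2f}")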

So in general, numexpr is not that much faster for the large arrays. More importantly, it still has significant overhead compared to numpy for sizes up to around 1e5-1e6 elements, while the current minimum number of elements is 1e4.

Further, this might depend on your specific hardware and library versions (this was run on my Linux laptop with 8 cores, using the latest versions of numpy and numexpr), so it is always hard to pick a default that suits everyone.

But based on the analysis above, I would propose raising the minimum from 1e4 to 1e5 (or maybe even 1e6).
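
For anyone who wants to experiment with different cut-offs locally before we settle on a value, here is a minimal sketch. It pokes at the private _MIN_ELEMENTS attribute of pandas.core.computation.expressions (so treat it as a debugging aid only; the attribute may move between versions) and also uses the public compute.use_numexpr option:

import numpy as np
import pandas as pd
import pandas.core.computation.expressions as expr

df = pd.DataFrame(np.random.randn(20000, 100))  # 2e6 elements in a single float64 block

print(expr._MIN_ELEMENTS)  # 10000 at the time of writing

# raise the (private) threshold above the block size so plain numpy is used
expr._MIN_ELEMENTS = 10**7
%timeit df <= 3.0

# or disable numexpr entirely through the public option
pd.set_option("compute.use_numexpr", False)
%timeit df <= 3.0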

@jorisvandenbossche jorisvandenbossche added the Performance and Numeric Operations labels Mar 18, 2021
@jorisvandenbossche
Member Author

An example from the arithmetic.IntFrameWithScalar.time_frame_op_with_scalar benchmark, which basically used the following code snippet:

import numpy as np
import pandas as pd

pd.options.mode.data_manager = "array"  # use the ArrayManager

dtype = np.float64
arr = np.random.randn(20000, 100)
df = pd.DataFrame(arr.astype(dtype))
scalar = 3.0

Using the current MIN_ELEMENTS of 1e4:

In [3]: %timeit df <= scalar
9.58 ms ± 945 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Changing the MIN_ELEMENTS to 1e5 (which means that in this case, numpy will be used):

In [3]: %timeit df <= scalar
2.92 ms ± 174 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So here the overhead is very clear. This is especially true for the ArrayManager, which does the ops column-by-column and thus pays the numexpr overhead for each column again.

Using BlockManager, the above benchmark doesn't change, because it still uses numexpr (the whole block size is still above 1e5 elements).
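
To make the size arithmetic behind that explicit, a small sketch of the element counts each manager hands to the expression code (plain arithmetic, no pandas internals):

rows, cols = 20000, 100

# ArrayManager: each op sees one 1-D column at a time
per_column = rows        # 20_000 elements: above the current 1e4 cut-off, below 1e5/1e6
# BlockManager: the whole consolidated float64 block is passed in one go
per_block = rows * cols  # 2_000_000 elements: above any of the proposed cut-offs

print(per_column, per_block)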

@jbrockmendel
Member

If these results are representative, I'd question the value of using numexpr at all.

@jbrockmendel
Member

Following the same code used to create the plot in the OP:

df2 = df.set_index(["size", "op", "engine"])
df3 = df2['timing']
df4 = df3.unstack('engine')

In [32]: df4['numexpr'] / df4['numpy']
Out[32]: 
size       op
1000       *     11.550891
           +     11.219127
           <=    13.992761
           ==    14.566631
10000      *     12.232232
           +     12.482489
           <=    16.391351
           ==    16.528711
100000     *      1.901619
           +      2.050886
           <=     2.245838
           ==     2.364996
1000000    *      0.980222
           +      0.978871
           <=     0.379410
           ==     0.398617
10000000   *      0.937947
           +      0.931268
           <=     0.719618
           ==     0.698704
100000000  *      0.513396
           +      0.555649
           <=     0.759315
           ==     0.688430
dtype: float64

These make numexpr look better than it did in the OP, though that might just be the log scale.

@jorisvandenbossche
Member Author

jorisvandenbossche commented Mar 18, 2021

Thanks for running it as well! Numbers from different environments are useful.

These make numexpr look better than it did in the OP, though that might just be the log scale.

Yeah, the log scale was mainly there to make the difference at the smaller sizes visible (without it that wouldn't show up). There is indeed still an advantage for numexpr at the larger sizes, although the differences I see locally are smaller.

But to conclude, I think your numbers support the same conclusion: 1e4 is too small a threshold, and it should be at least 1e5 or even 1e6.

@rhshadrach
Member

Results from my laptop (Core i7-10850H) are about the same:
size       op
1000       *     11.379373
           +     12.133413
           <=    14.257864
           ==    14.337322
10000      *      9.552697
           +      9.197450
           <=    12.201087
           ==    12.175155
100000     *      1.483704
           +      1.404112
           <=     1.854857
           ==     1.892871
1000000    *      0.822717
           +      0.822774
           <=     0.464316
           ==     0.484865
10000000   *      1.166277
           +      1.155721
           <=     0.663739
           ==     0.630829
100000000  *      0.767271
           +      0.781812
           <=     0.635647
           ==     0.644891
dtype: float64

@jreback
Contributor

jreback commented Mar 19, 2021

the original benchmarks for using numexpr were a number of years ago

it's certainly possible that numpy has improved in the interim

so +1 on raising the min elements

@jorisvandenbossche
Member Author

OK, based on the numbers above, 1e6 seems a safer minimum than 1e5. I updated the PR to reflect that: #40502

@rhshadrach
Member

Is this good to close @jorisvandenbossche?

@rhshadrach
Member

The issue in #40502 appears to be in test_expressions. Some of the DataFrames tested there used to be 300KiB but were changed to 30MiB. It looks like many copies are being made in setup_method, resulting in large memory usage.

@jorisvandenbossche
Member Author

Yeah, I checked that at the time of doing the PR and thought 30MB wouldn't be a big deal, but of course it gets created and copied multiple times (and it is also already created during test discovery and kept alive for the full test run), so I underestimated the impact.

Next attempt: #40609

@jreback jreback added this to the 1.3 milestone Mar 24, 2021