PERF: short-circuit (left == right).all() comparisons #32339

jbrockmendel · 2020-02-28T19:20:46Z

In places like equals methods and array_equivalent, we do things like (left == right).all() or ((left == right) | (isna(left) & isna(right))).all(). For large arrays that are not equal, we can do much better with something like:

from libc.stdint cimport int64_t


cdef bint _all_match_i8(const int64_t[:] left, const int64_t[:] right):
    # Compare element by element and return at the first mismatch.
    cdef:
        Py_ssize_t i, n = len(left)

    for i in range(n):
        if left[i] != right[i]:
            return False

    return True


def all_match(left, right) -> bool:
    if left.dtype.kind != "i":
        # viewing as i8 will make NaNs be treated as equal
        # (assumes 8-byte items, e.g. float64 or datetime64)
        return _all_match_i8(left.view("i8"), right.view("i8"))

    return _all_match_i8(left, right)

Some profiling results:

In [2]: arr = np.arange(10**6)
In [3]: arr2 = arr.copy()
In [4]: arr2[0] = -1

In [5]: %timeit np.array_equal(arr, arr2)
831 µs ± 42.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit all_match(arr, arr2)
1.27 µs ± 58.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [7]: %timeit np.array_equal(arr, arr)
416 µs ± 16.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [8]: %timeit all_match(arr, arr)
812 µs ± 5.84 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So in cases that short-circuit early, we can get massive speedups, but this implementation is actually about 2x slower in cases that don't short-circuit (for reasons that are not clear to me).
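One way to keep NumPy's vectorised comparison while still stopping early is to compare in fixed-size chunks. This is a different technique from the Cython loop above, shown only as a minimal sketch: `all_match_chunked` and its `chunk` default are hypothetical names and values, the arrays are assumed to be 1-D, and nothing here has been benchmarked.

```python
import numpy as np


def all_match_chunked(left, right, chunk=4096):
    """Return True if two 1-D arrays are elementwise equal.

    Hypothetical helper: compares one chunk at a time with NumPy and
    returns as soon as a chunk contains a mismatch. The chunk size is
    an untuned, illustrative default.
    """
    if left.shape != right.shape:
        return False

    for start in range(0, left.shape[0], chunk):
        stop = start + chunk
        # Each chunk comparison is a regular vectorised NumPy operation.
        if not (left[start:stop] == right[start:stop]).all():
            return False

    return True


arr = np.arange(10**6)
arr2 = arr.copy()
arr2[0] = -1

print(all_match_chunked(arr, arr2))  # False, decided after the first chunk
print(all_match_chunked(arr, arr))   # True, after scanning every chunk
```

An early mismatch only costs one chunk comparison, while the full-scan case pays some Python-level loop overhead per chunk, so the chunk size is a trade-off that would need measuring.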


jbrockmendel · 2020-02-28T19:22:56Z

cc @seberg any idea why this is so much slower in the not-short-circuit case?

seberg · 2020-02-28T19:51:35Z

@jbrockmendel my guess is: We are using SIMD loops in this case (at least sometimes), and cython probably does not manage to do that.

Are you sure what you are doing is a good idea though? First, -0 and 0 are not different normally. Second, there are many many NaNs, and you are making some of them return True.
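Both concerns can be checked directly in NumPy. The sketch below uses three hypothetical helpers (`float_equal`, `nan_aware_equal`, `bits_equal`) and two hand-built NaN bit patterns chosen purely for illustration. It assumes float64 input.

```python
import numpy as np


def float_equal(a, b):
    # Plain floating-point comparison: NaN != NaN, 0.0 == -0.0.
    return bool((a == b).all())


def nan_aware_equal(a, b):
    # The ((left == right) | (isna(left) & isna(right))).all() pattern,
    # using np.isnan as a stand-in for isna on float64 data.
    return bool(((a == b) | (np.isnan(a) & np.isnan(b))).all())


def bits_equal(a, b):
    # Compare the raw 64-bit patterns, as the i8 view does.
    return bool((a.view("i8") == b.view("i8")).all())


pos_zero = np.array([0.0])
neg_zero = np.array([-0.0])

# Two NaNs that differ only in their payload bits (illustrative values).
nan_a = np.array([0x7FF8000000000000], dtype="u8").view("f8")
nan_b = np.array([0x7FF8000000000001], dtype="u8").view("f8")

print(float_equal(pos_zero, neg_zero), bits_equal(pos_zero, neg_zero))  # True False
print(nan_aware_equal(nan_a, nan_a), bits_equal(nan_a, nan_a))          # True True
print(nan_aware_equal(nan_a, nan_b), bits_equal(nan_a, nan_b))          # True False
```

So the bit-level comparison disagrees with the float comparison on signed zeros, and it agrees with the NaN-aware comparison only when the two NaNs happen to share a bit pattern.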

jbrockmendel · 2020-02-28T21:06:40Z

> We are using SIMD loops in this case (at least sometimes), and cython probably does not manage to do that.

@scoder thoughts?

scoder · 2020-02-29T08:40:13Z

> SIMD

Might make a difference here, yes. Make sure your CFLAGS allow for auto-vectorisation, and that your C compiler is modern enough to figure it out for you.

It sometimes helps the compiler if you make your algorithm redundant: instead of checking just a[i] != b[i], make sure the arrays are long enough, take two steps at once, and write a[i] != b[i] or a[i+1] != b[i+1], or even use the uglier | instead of or (try to avoid that if you don't need it). That way, the code makes it clear to the C compiler that it can safely read ahead far enough to use wider SIMD instructions.
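The control flow of that two-steps-at-once loop might look like the sketch below. It is written in plain Python so it can be run as-is, which means it shows only the loop structure and gives no speedup by itself. In Cython the same body would use typed memoryviews and a `Py_ssize_t` index, and whether the C compiler then vectorises it depends on the compiler and CFLAGS. The name `all_match_pairs` is hypothetical.

```python
def all_match_pairs(left, right):
    """Elementwise equality check that reads two elements per iteration.

    Hypothetical pure-Python illustration of the loop shape only.
    """
    n = len(left)
    if n != len(right):
        return False

    i = 0
    # Main loop: i + 1 < n guarantees both reads are in bounds.
    while i + 1 < n:
        # `|` evaluates both comparisons unconditionally, unlike `or`.
        if (left[i] != right[i]) | (left[i + 1] != right[i + 1]):
            return False
        i += 2

    # Tail: handle the last element when n is odd.
    if i < n and left[i] != right[i]:
        return False

    return True


print(all_match_pairs([1, 2, 3], [1, 2, 3]))  # True
print(all_match_pairs([1, 2, 3], [1, 2, 4]))  # False
```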

jbrockmendel added the Performance (Memory or execution speed performance) label Mar 1, 2020

jbrockmendel mentioned this issue Jul 8, 2020

REF: dont consolidate in BlockManager.equals #34962

Merged

jbrockmendel mentioned this issue May 20, 2021

[Enhancement] Add functions that extend anynan for non-finite values pydata/bottleneck#374

Open

mroeschke added the Numeric Operations (Arithmetic, Comparison, and Logical operations) label Jul 29, 2021

lithomas1 mentioned this issue Jan 6, 2023

ENH: Add fast array equal function for indexers #50592

Merged
