ENH: Add fast array equal function for indexers #50592

Merged (6 commits) on Jan 6, 2023

@phofl (Member) commented Jan 5, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

The all-equal case is as fast as NumPy on 2 million entries; the early-exit case is, as expected, significantly faster.

@lithomas1 (Member) commented

Scrolling through issues, it looks like this is basically #32339.
(Takeaway from there was that perf for non short-circuit case needed to be checked. It's pretty hard to beat numpy's SIMD extensions.)

@phofl (Member, Author) commented Jan 6, 2023

I commented on this in the initial post. Both run in around 700 microseconds if the whole array is checked (2 million entries). Ours is actually a bit faster, but not by much.

NumPy does some things that are more expensive, e.g. allocating a Boolean array and then calling all on it.
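The two strategies mentioned here can be sketched in plain Python. The helper names `array_equal_via_mask` and `array_equal_early_exit` are hypothetical and only illustrate the idea: the first roughly mirrors the mask-then-reduce approach attributed to NumPy above, the second is a slow pure-Python stand-in for the early-exit loop that the PR writes in Cython.

```python
import numpy as np


def array_equal_via_mask(left, right):
    # Mask-then-reduce: compare everything, then collapse the result.
    if left.shape != right.shape:
        return False
    mask = left == right  # allocates a Boolean array as long as the inputs
    return bool(mask.all())


def array_equal_early_exit(left, right):
    # Early exit: stop at the first mismatch, no temporary array needed.
    # (Assumes 1-D inputs, like the indexers discussed in this PR.)
    if left.shape[0] != right.shape[0]:
        return False
    for i in range(left.shape[0]):
        if left[i] != right[i]:
            return False
    return True


left = np.arange(1_000, dtype=np.int64)
right = left.copy()
right[0] = -1  # mismatch in the very first position

print(array_equal_via_mask(left, right))    # False, after scanning all entries
print(array_equal_early_exit(left, right))  # False, after one comparison
```

Both functions return the same answer; they differ only in how much work is done before a mismatch is reported.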

Edit: Looks like NumPy is faster with fewer elements, but not by much.

With 10**6 entries:
%timeit np.array_equal(left, right)
269 µs ± 5.64 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit array_equal_fast(left, right)
303 µs ± 2.44 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

And with 2 million entries:

%timeit array_equal_fast(left, right)
701 µs ± 79.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit np.array_equal(left, right)
729 µs ± 7.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
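For anyone who wants to reproduce the NumPy side of these numbers outside IPython, a rough harness could look like the sketch below. The thread does not show how `left` and `right` were created, so the two identical int64 ranges are an assumption, and `array_equal_fast` is left out because it is only available in a pandas build that contains this PR. Absolute timings will differ by machine.

```python
import timeit

import numpy as np

timings = {}

for n in (10**6, 2 * 10**6):
    # Assumption: identical int64 ranges stand in for the "all equal" case.
    left = np.arange(n, dtype=np.int64)
    right = left.copy()

    # Full scan: every element has to be compared.
    full = min(
        timeit.repeat(lambda: np.array_equal(left, right), number=10, repeat=3)
    ) / 10

    # Early mismatch: the arrays differ at index 0, but np.array_equal
    # still computes the complete comparison before reducing it.
    right[0] = -1
    early = min(
        timeit.repeat(lambda: np.array_equal(left, right), number=10, repeat=3)
    ) / 10

    timings[n] = (full, early)
    print(f"n={n}: all equal {full * 1e6:.0f} µs, early mismatch {early * 1e6:.0f} µs")
```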

Edit 2: I'd start by using this only for indexers under Copy-on-Write (CoW). The advantage is that if an indexer is equal, we avoid a copy, which brings a big speedup. And if it's not equal, we should find differing values early on.
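A minimal sketch of the indexer idea, under stated assumptions: `take_maybe_no_copy` is a hypothetical helper, not pandas' actual code path, and `np.array_equal` stands in for `array_equal_fast`. If the indexer selects every row in its original order, the input can be handed back as-is; otherwise a real take (and therefore a copy) is needed.

```python
import numpy as np


def take_maybe_no_copy(values, indexer):
    # Hypothetical helper for illustration only.
    # An indexer equal to 0..n-1 selects everything in the original order,
    # so the result would be identical to the input and no copy is needed.
    identity = np.arange(len(values), dtype=indexer.dtype)
    if np.array_equal(indexer, identity):
        return values
    # Any other indexer reorders or subsets the data, which requires a copy.
    return values.take(indexer)


values = np.array([10.0, 20.0, 30.0, 40.0])

same = take_maybe_no_copy(values, np.array([0, 1, 2, 3]))
reordered = take_maybe_no_copy(values, np.array([3, 0, 1, 2]))

print(same is values)       # True: the original object is returned
print(reordered.tolist())   # [40.0, 10.0, 20.0, 30.0]
```

In the non-equal case a typical reordering indexer already differs within the first few positions, which is where an early-exit comparison pays off.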

@cython.wraparound(False)
@cython.boundscheck(False)
def array_equal_fast(
ndarray[int6432_t, ndim=1] left, ndarray[int6432_t, ndim=1] right,

I checked locally a bit, and using a memoryview (int[:]) instead of ndarray doesn't seem to give a significant speedup here (I seem to remember that those could sometimes be faster).

@jorisvandenbossche (Member) left a comment

Looks good to me. Locally, it also gives more or less the same timing as np.array_equal for me (for fully equal arrays).

@lithomas1 (Member) left a comment

LGTM. Nice speedup!

@phofl added this to the 2.0 milestone Jan 6, 2023
@phofl merged commit 857bf37 into pandas-dev:main Jan 6, 2023
@phofl deleted the enh_array_equal_fast branch January 6, 2023 18:40