-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: array_equivalent #35256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: array_equivalent #35256
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this!
# TODO: pass strict_nane to the .equals methods? | ||
if isinstance(left, ABCExtensionArray): | ||
return left.equals(right) | ||
elif isinstance(right, ABCExtensionArray): | ||
return right.equals(left) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? array_equivalent
is supposed to operate on ndarrays (and the BlockManager.equals already handles dispatching the EA.equals when needed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the PR that removed Block.equals I thought you agreed that array_equivalent should dispatch for EAs.
I'm OK with removing this check for now; its orthogonal to the perf goal.
|
||
# empty | ||
if not (np.prod(left.shape) and np.prod(right.shape)): | ||
return True | ||
return ((left == right) | (isna(left) & isna(right))).all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compared to the original fastpath in Block.equals (#34962 (comment)), I think an important factor might also be to use np.isnan
here instead of pd.isnan
(which should be equivalent for float dtype).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, except for things like inf_as_na
... But which we were not using in Block.equals before anyway ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point (two, really). Because of the inf_as_na thing, we should probably consider it as a bugfix and address it separately
if ldtype == rdtype and ldtype.itemsize != 16: | ||
ileft = left.view(f"i{ldtype.itemsize}") | ||
iright = right.view(f"i{rdtype.itemsize}") | ||
return (ileft == iright).all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure this is always going to work correctly? It's certainly a minor corner case, but there are ways to get different NaNs, see eg https://stackoverflow.com/a/33371882/653364
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm yeah i agree here, nan's would compare not equal, this is eactly why we need the nan mask (you can use np.isnan here though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
darn, fair point. if we cant compare after int-casting, then i think we lose pretty much all of the perf gain. closing.
Indeed, EA.equals doesn't support that (and not sure it should). But see also my inline comment: is it actually needed to handle EAs here? |
maybe merge #35231 first since that PR adds additional tests? |
if ldtype == rdtype and ldtype.itemsize != 16: | ||
ileft = left.view(f"i{ldtype.itemsize}") | ||
iright = right.view(f"i{rdtype.itemsize}") | ||
return (ileft == iright).all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm yeah i agree here, nan's would compare not equal, this is eactly why we need the nan mask (you can use np.isnan here though)
good idea |
xref #35249, this gives about a 3x speedup in the asv reported there
cc @jorisvandenbossche EA.equals doesn't support a strict_nan arg, is there anything we need to do to account for that here?