-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: array_equivalent #35256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: array_equivalent #35256
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -385,15 +385,23 @@ def array_equivalent(left, right, strict_nan: bool = False) -> bool: | |
... np.array([1, 2, np.nan])) | ||
False | ||
""" | ||
# TODO: pass strict_nane to the .equals methods? | ||
if isinstance(left, ABCExtensionArray): | ||
return left.equals(right) | ||
elif isinstance(right, ABCExtensionArray): | ||
return right.equals(left) | ||
|
||
left, right = np.asarray(left), np.asarray(right) | ||
ldtype = left.dtype | ||
rdtype = right.dtype | ||
|
||
# shape compat | ||
if left.shape != right.shape: | ||
return False | ||
|
||
# Object arrays can contain None, NaN and NaT. | ||
# string dtypes must be come to this path for NumPy 1.7.1 compat | ||
if is_string_dtype(left.dtype) or is_string_dtype(right.dtype): | ||
if is_string_dtype(ldtype) or is_string_dtype(rdtype): | ||
|
||
if not strict_nan: | ||
# isna considers NaN and None to be equivalent. | ||
|
@@ -425,28 +433,29 @@ def array_equivalent(left, right, strict_nan: bool = False) -> bool: | |
return True | ||
|
||
# NaNs can occur in float and complex arrays. | ||
if is_float_dtype(left.dtype) or is_complex_dtype(left.dtype): | ||
if is_float_dtype(ldtype) or is_complex_dtype(ldtype): | ||
if ldtype == rdtype and ldtype.itemsize != 16: | ||
ileft = left.view(f"i{ldtype.itemsize}") | ||
iright = right.view(f"i{rdtype.itemsize}") | ||
return (ileft == iright).all() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are we sure this is always going to work correctly? It's certainly a minor corner case, but there are ways to get different NaNs, see eg https://stackoverflow.com/a/33371882/653364 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hmm yeah i agree here, nan's would compare not equal, this is eactly why we need the nan mask (you can use np.isnan here though) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. darn, fair point. if we cant compare after int-casting, then i think we lose pretty much all of the perf gain. closing. |
||
|
||
# empty | ||
if not (np.prod(left.shape) and np.prod(right.shape)): | ||
return True | ||
return ((left == right) | (isna(left) & isna(right))).all() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Compared to the original fastpath in Block.equals (#34962 (comment)), I think an important factor might also be to use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, except for things like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a good point (two, really). Because of the inf_as_na thing, we should probably consider it as a bugfix and address it separately |
||
|
||
elif is_datetimelike_v_numeric(left, right): | ||
# GH#29553 avoid numpy deprecation warning | ||
return False | ||
|
||
elif needs_i8_conversion(left.dtype) or needs_i8_conversion(right.dtype): | ||
elif needs_i8_conversion(ldtype) or needs_i8_conversion(rdtype): | ||
# datetime64, timedelta64, Period | ||
if not is_dtype_equal(left.dtype, right.dtype): | ||
if not is_dtype_equal(ldtype, rdtype): | ||
return False | ||
|
||
left = left.view("i8") | ||
right = right.view("i8") | ||
|
||
# if we have structured dtypes, compare first | ||
if left.dtype.type is np.void or right.dtype.type is np.void: | ||
if left.dtype != right.dtype: | ||
if ldtype.type is np.void or rdtype.type is np.void: | ||
if ldtype != rdtype: | ||
return False | ||
|
||
return np.array_equal(left, right) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed?
array_equivalent
is supposed to operate on ndarrays (and the BlockManager.equals already handles dispatching the EA.equals when needed).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the PR that removed Block.equals I thought you agreed that array_equivalent should dispatch for EAs.
I'm OK with removing this check for now; its orthogonal to the perf goal.