-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: .equals for Extension Arrays #30652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 18 commits
36c8b88
786963c
6800315
a3e7b7f
860013f
fc3d2c2
3da5726
c5027dd
375664c
b6ad2fb
365362a
aae2f94
8d052ad
9ee034e
38501e6
dccec7f
0b1255f
b8be858
4c7273f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -58,6 +58,7 @@ class ExtensionArray: | |
dropna | ||
factorize | ||
fillna | ||
equals | ||
isna | ||
ravel | ||
repeat | ||
|
@@ -84,6 +85,7 @@ class ExtensionArray: | |
* _from_factorized | ||
* __getitem__ | ||
* __len__ | ||
* __eq__ | ||
* dtype | ||
* nbytes | ||
* isna | ||
|
@@ -333,6 +335,24 @@ def __iter__(self): | |
for i in range(len(self)): | ||
yield self[i] | ||
|
||
def __eq__(self, other: Any) -> ArrayLike: | ||
""" | ||
Return for `self == other` (element-wise equality). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we should have some guidance here that if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. could use ops.common.unpack_zerodim_and_defer There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This method is to be implemented by EA authors, so those can't use that helper (unless we expose somewhere a public version of this). (we could of course use that for our own EAs, but this PR is not changing any existing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Added a comment about that |
||
""" | ||
# Implementer note: this should return a boolean numpy ndarray or | ||
# a boolean ExtensionArray. | ||
# When `other` is one of Series, Index, or DataFrame, this method should | ||
# return NotImplemented (to ensure that those objects are responsible for | ||
# first unpacking the arrays, and then dispatch the operation to the | ||
# underlying arrays) | ||
raise AbstractMethodError(self) | ||
|
||
def __ne__(self, other: Any) -> ArrayLike: | ||
""" | ||
Return for `self != other` (element-wise in-equality). | ||
""" | ||
return ~(self == other) | ||
|
||
def to_numpy( | ||
self, dtype=None, copy: bool = False, na_value=lib.no_default | ||
) -> np.ndarray: | ||
|
@@ -682,6 +702,35 @@ def searchsorted(self, value, side="left", sorter=None): | |
arr = self.astype(object) | ||
return arr.searchsorted(value, side=side, sorter=sorter) | ||
|
||
def equals(self, other: "ExtensionArray") -> bool: | ||
""" | ||
Return if another array is equivalent to this array. | ||
|
||
Parameters | ||
---------- | ||
other: ExtensionArray | ||
Array to compare to this Array. | ||
|
||
Returns | ||
------- | ||
boolean | ||
Whether the arrays are equivalent. | ||
|
||
""" | ||
if not type(self) == type(other): | ||
return False | ||
Comment on lines
+723
to
+724
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should verify that this is the behavior we want. Namely
The first seems fine. Not sure about the second. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was planning to open an issue about this after this PR (and keep it strict here), because this is right now a bit inconsistent within pandas, and might require a more general discussion / clean-up (eg Series.equals is more strict (requires same dtype) than Index.equals ...) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Happy to be strict for now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I made it even more strict (same dtype, not just same class), and added a test for that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I opened an issue for this at #33940 In the end, it seems mainly to come to whether the dtype should exactly be equal or not. Since for EAs, the dtype is right now tied to the array class, using equal dtype for now also implies the same class (no additional check to allow sublcasses). |
||
elif not self.dtype == other.dtype: | ||
return False | ||
elif not len(self) == len(other): | ||
return False | ||
else: | ||
equal_values = self == other | ||
if isinstance(equal_values, ExtensionArray): | ||
# boolean array with NA -> fill with False | ||
equal_values = equal_values.fillna(False) | ||
equal_na = self.isna() & other.isna() | ||
return (equal_values | equal_na).all().item() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It's to have a python bool, instead of a numpy bool, as result. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. thanks for clarifying. i was recently reminded that |
||
|
||
def _values_for_factorize(self) -> Tuple[np.ndarray, Any]: | ||
""" | ||
Return an array and missing value suitable for factorization. | ||
|
@@ -1134,7 +1183,7 @@ class ExtensionScalarOpsMixin(ExtensionOpsMixin): | |
""" | ||
|
||
@classmethod | ||
def _create_method(cls, op, coerce_to_dtype=True): | ||
def _create_method(cls, op, coerce_to_dtype=True, result_dtype=None): | ||
""" | ||
A class method that returns a method that will correspond to an | ||
operator for an ExtensionArray subclass, by dispatching to the | ||
|
@@ -1202,7 +1251,7 @@ def _maybe_convert(arr): | |
# exception raised in _from_sequence; ensure we have ndarray | ||
res = np.asarray(arr) | ||
else: | ||
res = np.asarray(arr) | ||
res = np.asarray(arr, dtype=result_dtype) | ||
return res | ||
|
||
if op.__name__ in {"divmod", "rdivmod"}: | ||
|
@@ -1220,4 +1269,4 @@ def _create_arithmetic_method(cls, op): | |
|
||
@classmethod | ||
def _create_comparison_method(cls, op): | ||
return cls._create_method(op, coerce_to_dtype=False) | ||
return cls._create_method(op, coerce_to_dtype=False, result_dtype=bool) |
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
@@ -1864,6 +1864,11 @@ def where( | |||||||||
|
||||||||||
return [self.make_block_same_class(result, placement=self.mgr_locs)] | ||||||||||
|
||||||||||
def equals(self, other) -> bool: | ||||||||||
if self.dtype != other.dtype or self.shape != other.shape: | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why do you need this check here? (its already done on the .values no?) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, good question. I just copied that from the base Block.equals method (so it's done there as well). There, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. At least from the docstring of pandas/pandas/core/dtypes/missing.py Lines 360 to 363 in 3ed7dff
But so, for EA.equals that is not needed, so will remove that check here. |
||||||||||
return False | ||||||||||
return self.values.equals(other.values) | ||||||||||
|
||||||||||
def _unstack(self, unstacker, fill_value, new_placement): | ||||||||||
# ExtensionArray-safe unstack. | ||||||||||
# We override ObjectBlock._unstack, which unstacks directly on the | ||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can add the doc reference here