-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN: deduplicate __setitem__ and _reduce on masked arrays #34187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: deduplicate __setitem__ and _reduce on masked arrays #34187
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@@ -77,6 +84,23 @@ def __getitem__(self, item): | |||
|
|||
return type(self)(self._data[item], self._mask[item]) | |||
|
|||
def _coerce_to_array(self, values) -> Tuple[np.ndarray, np.ndarray]: | |||
raise AbstractMethodError(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
orthogranal to this PR but wondering if we even need this any more or could just use abc.abstractmethod decorator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally use this instead of the decorator in the EA base class, because the decorator requires it to actually be an abstract base class.
See
pandas/pandas/core/arrays/base.py
Lines 128 to 131 in 45fee32
This class does not inherit from 'abc.ABCMeta' for performance reasons. | |
Methods and properties required by the interface raise | |
``pandas.errors.AbstractMethodError`` and no ``register`` method is | |
provided for registering virtual subclasses. |
result = op(data, axis=0, skipna=skipna, mask=mask, **kwargs) | ||
|
||
if np.isnan(result): | ||
return libmissing.NA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a follow up, should this be self.dtype.na_value instead of hardcoded in the base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally like having it explicit as well (all subclasses use NA
as missing value indicator, this method is not shared with ones where dtype.na_value
is something different), but can go either way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm,
we have
@property
def _na_value(self):
return self.dtype.na_value
so this is not consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also remove that property, from a quick search, that is only used once in the arrays code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, if not generally on an array class, I think self.dtype.na_value is generally clearer.
This is not the only case where libmissing.NA is used, so we should use one ore the other. I accept your argument about being explicit but also prefer the the case of allowing the class to be extensible. YAGNI favours your proposal.
For
__setitem__
, the only difference was that they both call to their own "coercing" function, so I made this available as a helper method, so it the rest of the implementation of__setitem__
can be shared.For
_reduce
, everything was the same, BooleanArray just has an additional check forany
andall
.