REF: share astype code in MaskedArray #38490
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,16 +5,18 @@ | |
import numpy as np | ||
|
||
from pandas._libs import lib, missing as libmissing | ||
from pandas._typing import Scalar | ||
from pandas._typing import ArrayLike, Dtype, Scalar | ||
from pandas.errors import AbstractMethodError | ||
from pandas.util._decorators import cache_readonly, doc | ||
|
||
from pandas.core.dtypes.base import ExtensionDtype | ||
from pandas.core.dtypes.common import ( | ||
is_dtype_equal, | ||
is_integer, | ||
is_object_dtype, | ||
is_scalar, | ||
is_string_dtype, | ||
pandas_dtype, | ||
) | ||
from pandas.core.dtypes.missing import isna, notna | ||
|
||
|
@@ -229,6 +231,30 @@ def to_numpy( | |
data = self._data.astype(dtype, copy=copy) | ||
return data | ||
|
||
def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike: | ||
dtype = pandas_dtype(dtype) | ||
|
||
if is_dtype_equal(dtype, self.dtype): | ||
Review comment: here could in theory actually use |
Reply: I'm fine with it either way. This branch was broken off from one that was trying to push some astype boilerplate into a decorator, so it used the is_dtype_equal version. |
||
if copy: | ||
return self.copy() | ||
return self | ||
|
||
# if we are astyping to another nullable masked dtype, we can fastpath | ||
if isinstance(dtype, BaseMaskedDtype): | ||
# TODO deal with NaNs for FloatingArray case | ||
data = self._data.astype(dtype.numpy_dtype, copy=copy) | ||
# mask is copied depending on whether the data was copied, and | ||
# not directly depending on the `copy` keyword | ||
mask = self._mask if data is self._data else self._mask.copy() | ||
Review comment: I wrote this code myself in the past, but I am not actually sure now that this is needed. AFAIK the (now since this is copying existing code, fine to leave this for another issue) |
Reply: Makes sense, and I think you're right about the NumPy behavior. |
||
cls = dtype.construct_array_type() | ||
return cls(data, mask, copy=False) | ||
|
||
if isinstance(dtype, ExtensionDtype): | ||
eacls = dtype.construct_array_type() | ||
return eacls._from_sequence(self, dtype=dtype, copy=copy) | ||
|
||
return super().astype(dtype, copy=copy) | ||
Review comment: This is in theory not reachable? |
Reply: Not with the existing subclasses. This still seemed like the idiomatic thing to do. |
Reply: I would personally remove it then, if it's not reachable at the moment. |
Reply: I'm not wild about this method relying on the current behavior of the existing subclasses. Better to conform to the standard patterns, even if it means a couple of extra checks that turn out to be unnecessary. |
Reply: Could also call it |
||
|
||
__array_priority__ = 1000 # higher than ndarray so ops dispatch to us | ||
|
||
def __array__(self, dtype=None) -> np.ndarray: | ||
|
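The mask-copy rule in the fastpath above ("mask is copied depending on whether the data was copied") leans on NumPy's documented `ndarray.astype` behavior: with `copy=False`, the input array itself is returned when the dtype already matches, and a new array is allocated otherwise. A minimal sketch of that rule follows. The helper `cast_with_mask` is a hypothetical name used only for illustration and is not part of pandas.

```python
import numpy as np


def cast_with_mask(data, mask, numpy_dtype, copy):
    """Hypothetical helper mirroring the data/mask handling in the diff."""
    new_data = data.astype(numpy_dtype, copy=copy)
    # Share the mask only when NumPy handed back the original data array.
    new_mask = mask if new_data is data else mask.copy()
    return new_data, new_mask


data = np.array([1, 2, 3], dtype="int64")
mask = np.array([False, True, False])

# Same dtype, copy=False: NumPy returns the input array, so the mask is shared.
d1, m1 = cast_with_mask(data, mask, "int64", copy=False)

# Different dtype, copy=False: a new array is allocated, so the mask is copied.
d2, m2 = cast_with_mask(data, mask, "float64", copy=False)

# Same dtype, copy=True: both data and mask are copied.
d3, m3 = cast_with_mask(data, mask, "int64", copy=True)

print(d1 is data, m1 is mask)  # True True
print(d2 is data, m2 is mask)  # False False
print(d3 is data, m3 is mask)  # False False
```

Checking `new_data is data` rather than the `copy` flag keeps data and mask consistent: they are either both shared with the source or both independent.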
Review comment: If we know this method is only called from the subclasses, this line is not needed (each of the subclass methods already does it as well). The type annotation also indicates we already have a dtype object.
Reply: Heads up: "Dtype" includes strings; "DtypeObj" means a dtype object.
Reply: Ah, yes. Can you then update the annotation to DtypeObj? Or actually, to ExtensionDtype? Or will mypy complain about that because the methods in the subclasses have a less strict annotation?
BTW, I would still remove this line.
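
The distinction raised in this thread (a "Dtype" may be a string alias, while a "DtypeObj" is already a dtype object) can be seen through the public `pandas.api.types.pandas_dtype` function. The sketch below assumes a reasonably recent pandas and shows only that strings are normalized into dtype objects, and that an existing dtype object passes through unchanged. It does not show the PR's own code.

```python
import numpy as np
import pandas as pd
from pandas.api.types import pandas_dtype

# A string alias is a valid "Dtype" but is not yet a dtype object.
from_string = pandas_dtype("Int64")

# An already-constructed dtype object is passed through unchanged.
dtype_obj = pd.Int64Dtype()
from_object = pandas_dtype(dtype_obj)

# NumPy aliases are normalized to np.dtype instances.
numpy_dtype = pandas_dtype("float64")

print(isinstance(from_string, pd.Int64Dtype))
print(from_object is dtype_obj)
print(numpy_dtype == np.dtype("float64"))
```

Because the call leaves dtype objects unchanged, repeating it in a base-class method is redundant when callers have already normalized, but it does no harm when a caller passes a string.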