-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TYP: Typing changes for ExtensionArray.astype #41251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/arrays/base.py
Outdated
@@ -511,7 +513,15 @@ def nbytes(self) -> int: | |||
# Additional Methods | |||
# ------------------------------------------------------------------------ | |||
|
|||
def astype(self, dtype, copy=True): | |||
@overload | |||
def astype(self, dtype: NpDtype, copy: bool = True) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dt64/td64 generally go to DTA/TDA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think so. When you pass a numpy dtype, you get a np.ndarray
:
>>> a=pd.array([np.datetime64("2021-03-15"), np.datetime64("2021-04-05")])
>>> a
<DatetimeArray>
['2021-03-15 00:00:00', '2021-04-05 00:00:00']
Length: 2, dtype: datetime64[ns]
>>> b=a.astype(np.datetime64)
>>> b
array(['2021-03-15T00:00:00.000000000', '2021-04-05T00:00:00.000000000'],
dtype='datetime64[ns]')
>>> type(b)
<class 'numpy.ndarray'>
@simonjayhawkins can you review, please? |
This is now failing mypy with master merged. can you merge master and fixup. |
pandas/core/arrays/base.py
Outdated
... | ||
|
||
@overload | ||
def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could add an overload for ExtensionDtype and this overload could be str for the union return type if type_t[Union[str, float, int, complex, bool, object] is moved as suggested?
the accepted strings for numpy is already known? so we should eventually be able to remove the union return type.
@simonjayhawkins pushed changes a week ago, so looking forward to a new review. |
pandas/core/dtypes/dtypes.py
Outdated
@@ -1291,7 +1290,7 @@ class PandasDtype(ExtensionDtype): | |||
|
|||
_metadata = ("_dtype",) | |||
|
|||
def __init__(self, dtype: NpDtype | PandasDtype | None): | |||
def __init__(self, dtype: str | np.dtype | PandasDtype | None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change? what prevents me from passing int
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fails mypy for some reason. Might have to do with the typing of how the function np.dtype
is typed by numpy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PandasDtype(int)
returns PandasDtype('int64')
pls let the question-asker do the "mark as resolved"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does npt.DTypeLike work?
(from @Dr-Irv): Yes, it does. In next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch is 260 commits behind master and if merged would cause a CI failure
pandas/core/arrays/period.py:346: error: unused "type: ignore" comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Dr-Irv
is it worth posting the results of a test file with reveal_types for this?
pandas/_testing/asserters.py
Outdated
np.sum((left._values != right._values).astype(int)) * 100.0 / len(left) | ||
) | ||
thesum = np.sum((left._values != right._values).astype(int)) | ||
diff = thesum * 100.0 / len(left) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must have had to change it in a previous commit with an earlier version of numpy. Reverted in next commit.
NpDtype = Union[str, np.dtype, type_t[Union[str, float, int, complex, bool, object]]] | ||
Dtype = Union["ExtensionDtype", NpDtype] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this PR need to go in before #41203?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, they should be independent. Either can go first, and once one goes in, I can do the mypy check on the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to use NpDtype = npt.DTypeLike
to be consistent between numpy and pandas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to use
NpDtype = npt.DTypeLike
to be consistent between numpy and pandas?
That's a big change, as it affects a lot more than just the astype()
signature. The types are different:
>>> import numpy.typing as npt
>>> from pandas._typing import NpDtype
>>> NpDtype
typing.Union[str, numpy.dtype, typing.Type[typing.Union[str, float, int, complex, bool, object]]]
>>> npt.DTypeLike
typing.Union[numpy.dtype, NoneType, type, numpy.typing._dtype_like._SupportsDType[numpy.dtype], str, typing.Tuple[typing.Any, int], typing.Tuple[typing.Any, typing.Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]], typing.List[typing.Any], numpy.typing._dtype_like._DTypeDict, typing.Tuple[typing.Any, typing.Any]]
pandas/core/arrays/boolean.py
Outdated
@@ -392,7 +396,16 @@ def reconstruct(x): | |||
def _coerce_to_array(self, value) -> tuple[np.ndarray, np.ndarray]: | |||
return coerce_to_array(value) | |||
|
|||
def astype(self, dtype, copy: bool = True) -> ArrayLike: | |||
@overload | |||
def astype(self, dtype: NpDtype, copy: bool = True) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def astype(self, dtype: NpDtype, copy: bool = True) -> np.ndarray: | |
def astype(self, dtype: NpDtype, copy: bool = ...) -> np.ndarray: |
and elsewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in next commit
pandas/core/arrays/base.py
Outdated
@@ -515,9 +517,17 @@ def nbytes(self) -> int: | |||
# Additional Methods | |||
# ------------------------------------------------------------------------ | |||
|
|||
def astype(self, dtype, copy=True): | |||
@overload | |||
def astype(self, dtype: NpDtype, copy: bool = ...) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this use npt.DtypeLike?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this use npt.DtypeLike?
No. The following code is fine with mypy
import numpy.typing as npt
class Foo:
pass
def myfun(x: npt.DTypeLike):
pass
myfun(type(Foo))
myfun("abcd")
myfun(str)
myfun(object)
myfun("str")
That's because npt.DTypeLike
accepts any type or any str. We want to separate out the ExtensionDtype
that are passed as strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind. Was able to get this to work by making astype()
accept npt.DTypeLike
. Created a new type AstypeArg
to mean ExtensionDtype | npt.DTypeLike
so that I don't have to change lots of other parts where NpDtype
is used.
pandas/core/arrays/base.py
Outdated
... | ||
|
||
@overload | ||
def astype(self, dtype: Dtype, copy: bool = ...) -> ArrayLike: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we pass a EA dtype object, we get a union return type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Fixed in next commit
pandas/core/dtypes/dtypes.py
Outdated
@@ -1291,7 +1290,7 @@ class PandasDtype(ExtensionDtype): | |||
|
|||
_metadata = ("_dtype",) | |||
|
|||
def __init__(self, dtype: NpDtype | PandasDtype | None): | |||
def __init__(self, dtype: str | np.dtype | PandasDtype | None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does npt.DTypeLike work?
(from @Dr-Irv): Yes, it does. In next commit.
Can you point me to an example? |
@simonjayhawkins ping... |
pinging @simonjayhawkins |
@Dr-Irv thanks for the patience. can you rebase @simonjayhawkins if you can have a look |
weekly ping @simonjayhawkins |
another ping @simonjayhawkins |
biweekly ping @simonjayhawkins |
@Dr-Irv ok if you can merge master and make sure still passing (i think it was last) time, will merge these. |
@jreback all green except for a Windows timeout on one of the tests |
can you rebase again just to be sure (as merged the other) cc @twoertwein |
@jreback all green after merge with the delete/searchsorted changes |
thanks @Dr-Irv |
Minimal (hopefully) typing changes for
ExtensionArray.astype
.