TYP: Typing changes for ExtensionArray.astype #41251
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -124,10 +124,8 @@ | |
Axes = Collection[Any] | ||
|
||
# dtypes | ||
NpDtype = Union[str, np.dtype] | ||
Dtype = Union[ | ||
"ExtensionDtype", NpDtype, type_t[Union[str, float, int, complex, bool, object]] | ||
] | ||
NpDtype = Union[str, np.dtype, type_t[Union[str, float, int, complex, bool, object]]] | ||
Dtype = Union["ExtensionDtype", NpDtype] | ||
Comment on lines +129 to +130
does this PR need to go in before #41203?
No, they should be independent. Either can go first, and once one goes in, I can do the mypy check on the other.
Would it make sense to use
That's a big change, as it affects a lot more than just the
>>> import numpy.typing as npt
>>> from pandas._typing import NpDtype
>>> NpDtype
typing.Union[str, numpy.dtype, typing.Type[typing.Union[str, float, int, complex, bool, object]]]
>>> npt.DTypeLike
typing.Union[numpy.dtype, NoneType, type, numpy.typing._dtype_like._SupportsDType[numpy.dtype], str, typing.Tuple[typing.Any, int], typing.Tuple[typing.Any, typing.Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]], typing.List[typing.Any], numpy.typing._dtype_like._DTypeDict, typing.Tuple[typing.Any, typing.Any]] |
||
# DtypeArg specifies all allowable dtypes in a functions its dtype argument | ||
DtypeArg = Union[Dtype, Dict[Hashable, Dtype]] | ||
DtypeObj = Union[np.dtype, "ExtensionDtype"] | ||
|
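The alias change in this hunk can be sketched without NumPy or pandas installed. This is only an illustration: `FakeNpDtype`, `FakeExtensionDtype`, `PyTypes`, and the `Old`/`New` suffixes are hypothetical names introduced here, not identifiers from the PR. It suggests that moving the builtin-type member from `Dtype` into `NpDtype` widens `NpDtype` while leaving the members of `Dtype` unchanged, because nested unions flatten.

```python
from typing import Type, Union, get_args


class FakeNpDtype:
    """Hypothetical stand-in for np.dtype (not the real class)."""


class FakeExtensionDtype:
    """Hypothetical stand-in for pandas' ExtensionDtype."""


# Mirrors type_t[Union[str, float, int, complex, bool, object]] from the diff.
PyTypes = Type[Union[str, float, int, complex, bool, object]]

# Before the change: the builtin-type member is attached to Dtype only.
NpDtypeOld = Union[str, FakeNpDtype]
DtypeOld = Union[FakeExtensionDtype, NpDtypeOld, PyTypes]

# After the change: the builtin-type member moves into NpDtype.
NpDtypeNew = Union[str, FakeNpDtype, PyTypes]
DtypeNew = Union[FakeExtensionDtype, NpDtypeNew]

# Nested unions flatten, so Dtype ends up with the same members either way.
same_dtype_members = set(get_args(DtypeOld)) == set(get_args(DtypeNew))
```

Under these assumptions, only annotations written against `NpDtype` itself see a wider type after the change.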
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,6 +17,7 @@ | |
Sequence, | ||
TypeVar, | ||
cast, | ||
overload, | ||
) | ||
|
||
import numpy as np | ||
|
@@ -26,6 +27,7 @@ | |
ArrayLike, | ||
Dtype, | ||
FillnaOptions, | ||
NpDtype, | ||
PositionalIndexer, | ||
Shape, | ||
) | ||
|
@@ -515,9 +517,17 @@ def nbytes(self) -> int: | |
# Additional Methods | ||
# ------------------------------------------------------------------------ | ||
|
||
def astype(self, dtype, copy=True): | ||
@overload | ||
def astype(self, dtype: NpDtype, copy: bool = ...) -> np.ndarray: | ||
could this use npt.DtypeLike?
No. The following code is fine with mypy
That's because
Never mind. Was able to get this to work by making |
||
... | ||
|
||
@overload | ||
def astype(self, dtype: Dtype, copy: bool = ...) -> ArrayLike: | ||
if we pass an EA dtype object, we get a union return type?
Good point. Fixed in next commit |
||
... | ||
|
||
def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike: | ||
Dr-Irv marked this conversation as resolved.
simonjayhawkins marked this conversation as resolved.
|
||
""" | ||
Cast to a NumPy array with 'dtype'. | ||
Cast to a NumPy array or ExtensionArray with 'dtype'. | ||
|
||
Parameters | ||
---------- | ||
|
@@ -530,8 +540,10 @@ def astype(self, dtype, copy=True): | |
|
||
Returns | ||
------- | ||
array : ndarray | ||
NumPy ndarray with 'dtype' for its dtype. | ||
array : np.ndarray or ExtensionArray | ||
An ExtensionArray if dtype is StringDtype, | ||
or same as that of underlying array. | ||
Otherwise a NumPy ndarray with 'dtype' for its dtype. | ||
""" | ||
from pandas.core.arrays.string_ import StringDtype | ||
|
||
|
@@ -547,7 +559,11 @@ def astype(self, dtype, copy=True): | |
# allow conversion to StringArrays | ||
return dtype.construct_array_type()._from_sequence(self, copy=False) | ||
|
||
return np.array(self, dtype=dtype, copy=copy) | ||
# error: Argument "dtype" to "array" has incompatible type | ||
# "Union[ExtensionDtype, dtype[Any]]"; expected "Union[dtype[Any], None, type, | ||
# _SupportsDType, str, Union[Tuple[Any, int], Tuple[Any, Union[int, | ||
# Sequence[int]]], List[Any], _DTypeDict, Tuple[Any, Any]]]" | ||
return np.array(self, dtype=dtype, copy=copy) # type: ignore[arg-type] | ||
|
||
def isna(self) -> np.ndarray | ExtensionArraySupportsAnyAll: | ||
""" | ||
|
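The overload pattern discussed in this hunk, including the reviewer's point about passing an EA dtype object, can be sketched with stand-in classes. This is a minimal sketch under stated assumptions, not the PR's actual fix: `FakeNdarray`, `FakeExtensionDtype`, and `FakeExtensionArray` are hypothetical, and the second overload returning the array type directly is one possible way to avoid a union return, not necessarily what the next commit did.

```python
from __future__ import annotations

from typing import Type, Union, overload


class FakeNdarray(list):
    """Hypothetical stand-in for np.ndarray."""


class FakeExtensionDtype:
    """Hypothetical stand-in for pandas' ExtensionDtype."""


class FakeExtensionArray:
    """Hypothetical stand-in for ExtensionArray."""

    def __init__(self, values: list) -> None:
        self._values = list(values)

    # NumPy-style dtype (string or builtin type): checker sees an ndarray.
    @overload
    def astype(
        self, dtype: Union[str, Type[object]], copy: bool = ...
    ) -> FakeNdarray: ...

    # Extension dtype object: checker sees an extension array, not a union.
    @overload
    def astype(
        self, dtype: FakeExtensionDtype, copy: bool = ...
    ) -> FakeExtensionArray: ...

    def astype(self, dtype, copy=True):
        # Runtime dispatch mirrors the overloads; copy is ignored in this sketch.
        if isinstance(dtype, FakeExtensionDtype):
            return FakeExtensionArray(self._values)
        return FakeNdarray(self._values)


arr = FakeExtensionArray([1, 2, 3])
as_numpy_like = arr.astype("int64")
as_extension = arr.astype(FakeExtensionDtype())
```

The sketch simplifies one detail of the real method: per the docstring in the diff, a dtype that resolves to `StringDtype` yields an ExtensionArray, so a string argument does not always map to an ndarray in pandas itself.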
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -1,7 +1,10 @@ | ||||||
from __future__ import annotations | ||||||
|
||||||
import numbers | ||||||
from typing import TYPE_CHECKING | ||||||
from typing import ( | ||||||
TYPE_CHECKING, | ||||||
overload, | ||||||
) | ||||||
import warnings | ||||||
|
||||||
import numpy as np | ||||||
|
@@ -13,6 +16,7 @@ | |||||
from pandas._typing import ( | ||||||
ArrayLike, | ||||||
Dtype, | ||||||
NpDtype, | ||||||
type_t, | ||||||
) | ||||||
from pandas.compat.numpy import function as nv | ||||||
|
@@ -392,7 +396,16 @@ def reconstruct(x): | |||||
def _coerce_to_array(self, value) -> tuple[np.ndarray, np.ndarray]: | ||||||
return coerce_to_array(value) | ||||||
|
||||||
def astype(self, dtype, copy: bool = True) -> ArrayLike: | ||||||
@overload | ||||||
def astype(self, dtype: NpDtype, copy: bool = True) -> np.ndarray: | ||||||
Suggested change
and elsewhere
fixed in next commit |
||||||
... | ||||||
|
||||||
@overload | ||||||
def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike: | ||||||
... | ||||||
|
||||||
def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike: | ||||||
|
||||||
""" | ||||||
Cast to a NumPy array or ExtensionArray with 'dtype'. | ||||||
|
||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,6 +11,7 @@ | |
TypeVar, | ||
Union, | ||
cast, | ||
overload, | ||
) | ||
from warnings import ( | ||
catch_warnings, | ||
|
@@ -481,6 +482,14 @@ def _constructor(self) -> type[Categorical]: | |
def _from_sequence(cls, scalars, *, dtype: Dtype | None = None, copy=False): | ||
return Categorical(scalars, dtype=dtype, copy=copy) | ||
|
||
@overload | ||
def astype(self, dtype: NpDtype, copy: bool = True) -> np.ndarray: | ||
i think with dt64/td64 could return DTA/TDA?
Don't see that:
>>> s = pd.Series(pd.to_datetime(["2021-03-15 12:05", "2021-04-05 5:10"]), dtype="category")
>>> s
0 2021-03-15 12:05:00
1 2021-04-05 05:10:00
dtype: category
Categories (2, datetime64[ns]): [2021-03-15 12:05:00, 2021-04-05 05:10:00]
>>> s.dtype.categories.dtype
dtype('<M8[ns]')
>>> type(s.astype(s.dtype.categories.dtype).values)
<class 'numpy.ndarray'> |
||
... | ||
|
||
@overload | ||
def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike: | ||
... | ||
|
||
def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike: | ||
""" | ||
Coerce this type to another dtype | ||
|
@@ -2464,11 +2473,7 @@ def _str_get_dummies(self, sep="|"): | |
# sep may not be in categories. Just bail on this. | ||
from pandas.core.arrays import PandasArray | ||
|
||
# error: Argument 1 to "PandasArray" has incompatible type | ||
# "ExtensionArray"; expected "Union[ndarray, PandasArray]" | ||
return PandasArray(self.astype(str))._str_get_dummies( # type: ignore[arg-type] | ||
sep | ||
) | ||
return PandasArray(self.astype(str))._str_get_dummies(sep) | ||
|
||
|
||
# The Series.cat accessor | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -29,7 +29,6 @@ | |
from pandas._typing import ( | ||
Dtype, | ||
DtypeObj, | ||
NpDtype, | ||
Ordered, | ||
type_t, | ||
) | ||
|
@@ -1291,7 +1290,7 @@ class PandasDtype(ExtensionDtype): | |
|
||
_metadata = ("_dtype",) | ||
|
||
def __init__(self, dtype: NpDtype | PandasDtype | None): | ||
def __init__(self, dtype: str | np.dtype | PandasDtype | None): | ||
why this change? what prevents me from passing
Fails mypy for some reason. Might have to do with the typing of how the function
pls let the question-asker do the "mark as resolved"
(from @Dr-Irv): Yes, it does. In next commit. |
||
if isinstance(dtype, PandasDtype): | ||
# make constructor univalent | ||
dtype = dtype.numpy_dtype | ||
|
why is this changed?
Must have had to change it in a previous commit with an earlier version of numpy. Reverted in next commit.