TYP: Typing changes for ExtensionArray.astype #41251

Merged 20 commits on Sep 6, 2021
Changes from 4 commits
5 changes: 2 additions & 3 deletions pandas/_testing/asserters.py
@@ -399,9 +399,8 @@ def _get_ilevel_values(index, level):
# skip exact index checking when `check_categorical` is False
if check_exact and check_categorical:
if not left.equals(right):
diff = (
np.sum((left._values != right._values).astype(int)) * 100.0 / len(left)
)
thesum = np.sum((left._values != right._values).astype(int))
diff = thesum * 100.0 / len(left)
Member

why is this changed?

Contributor Author

I must have had to change it in an earlier commit to satisfy an older version of numpy. Reverted in the next commit.
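For readers unfamiliar with this assertion helper, the quantity being computed here is the percentage of positions at which the two indexes differ. A minimal pure-Python sketch of that arithmetic follows; the name `percent_different` is hypothetical, and the real code operates on NumPy arrays with `np.sum` rather than on lists.

```python
def percent_different(left, right):
    """Return the percentage of positions where left and right differ."""
    if len(left) != len(right):
        raise ValueError("inputs must have the same length")
    # Count mismatching positions, mirroring np.sum((l != r).astype(int)).
    mismatches = sum(int(a != b) for a, b in zip(left, right))
    return mismatches * 100.0 / len(left)


# One of the four positions differs.
print(percent_different([1, 2, 3, 4], [1, 2, 0, 4]))
```

Splitting the sum into a separate variable, as the diff briefly did, does not change this result; it only changes how the expression is laid out.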

msg = f"{obj} values are different ({np.round(diff, 5)} %)"
raise_assert_detail(obj, msg, left, right)
else:
6 changes: 2 additions & 4 deletions pandas/_typing.py
@@ -124,10 +124,8 @@
Axes = Collection[Any]

# dtypes
NpDtype = Union[str, np.dtype]
Dtype = Union[
"ExtensionDtype", NpDtype, type_t[Union[str, float, int, complex, bool, object]]
]
NpDtype = Union[str, np.dtype, type_t[Union[str, float, int, complex, bool, object]]]
Dtype = Union["ExtensionDtype", NpDtype]
Comment on lines +129 to +130
Member

does this PR need to go in before #41203?

Contributor Author

No, they should be independent. Either can go first, and once one goes in, I can do the mypy check on the other.

Member

Would it make sense to use NpDtype = npt.DTypeLike to be consistent between numpy and pandas?

Contributor Author

> Would it make sense to use NpDtype = npt.DTypeLike to be consistent between numpy and pandas?

That's a big change, as it affects a lot more than just the astype() signature. The types are different:

>>> import numpy.typing as npt
>>> from pandas._typing import NpDtype
>>> NpDtype
typing.Union[str, numpy.dtype, typing.Type[typing.Union[str, float, int, complex, bool, object]]]
>>> npt.DTypeLike
typing.Union[numpy.dtype, NoneType, type, numpy.typing._dtype_like._SupportsDType[numpy.dtype], str, typing.Tuple[typing.Any, int], typing.Tuple[typing.Any, typing.Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]], typing.List[typing.Any], numpy.typing._dtype_like._DTypeDict, typing.Tuple[typing.Any, typing.Any]]
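To make the difference between the two unions concrete, here is a small sketch that uses only the standard library. `FakeNumpyDtype`, `NpDtypeSketch`, and `fits_npdtype_sketch` are hypothetical stand-ins introduced for illustration; they are not pandas or numpy names, and the runtime check is only a rough approximation of what a static checker would accept.

```python
from typing import Any, Type, Union


class FakeNumpyDtype:
    """Stand-in for numpy.dtype so the sketch needs no third-party imports."""

    def __init__(self, name: str) -> None:
        self.name = name


# Mirrors the shape of the NpDtype alias in this diff, with FakeNumpyDtype
# substituted for np.dtype.
NpDtypeSketch = Union[
    str, FakeNumpyDtype, Type[Union[str, float, int, complex, bool, object]]
]


def fits_npdtype_sketch(value: Any) -> bool:
    """Rough runtime approximation of what NpDtypeSketch admits.

    Because the Type[...] member includes ``object``, any class qualifies.
    """
    return isinstance(value, (str, FakeNumpyDtype, type))


# The last three are forms that the npt.DTypeLike repr above lists
# (NoneType, Tuple[Any, int], List[Any]) but that NpDtypeSketch does not.
candidates = ["int64", FakeNumpyDtype("float64"), float, None, ("f8", 4), [("x", "i4")]]
for candidate in candidates:
    print(type(candidate).__name__, fits_npdtype_sketch(candidate))
```

The point of the sketch is only that swapping one alias for the other widens what callers may pass, which is why the change would ripple beyond `astype()`.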

# DtypeArg specifies all allowable dtypes in a functions its dtype argument
DtypeArg = Union[Dtype, Dict[Hashable, Dtype]]
DtypeObj = Union[np.dtype, "ExtensionDtype"]
25 changes: 20 additions & 5 deletions pandas/core/arrays/base.py
@@ -17,6 +17,7 @@
Sequence,
TypeVar,
cast,
overload,
)

import numpy as np
@@ -26,6 +27,7 @@
ArrayLike,
Dtype,
FillnaOptions,
NpDtype,
PositionalIndexer,
Shape,
)
@@ -515,9 +517,17 @@ def nbytes(self) -> int:
# Additional Methods
# ------------------------------------------------------------------------

def astype(self, dtype, copy=True):
@overload
def astype(self, dtype: NpDtype, copy: bool = ...) -> np.ndarray:
Member

could this use npt.DTypeLike?

Contributor Author

> could this use npt.DTypeLike?

No. The following code is fine with mypy:

import numpy.typing as npt

class Foo:
    pass

def myfun(x: npt.DTypeLike):
    pass

myfun(type(Foo))
myfun("abcd")
myfun(str)
myfun(object)
myfun("str")

That's because npt.DTypeLike accepts any type or any str. We want to separate out the ExtensionDtypes that are passed as strings.
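The string problem can be illustrated with a small runtime sketch: to a type checker both `"int64"` and `"Int64"` are just `str`, yet one names a NumPy dtype and the other a pandas extension dtype. Everything below (`NumpyDtypeSketch`, `ExtensionDtypeSketch`, `EXTENSION_NAMES`, `resolve`) is a hypothetical stand-in, and the lookup is not how pandas actually resolves dtype strings.

```python
from typing import Union


class NumpyDtypeSketch:
    """Stand-in for a NumPy dtype."""

    def __init__(self, name: str) -> None:
        self.name = name


class ExtensionDtypeSketch:
    """Stand-in for a pandas extension dtype."""

    def __init__(self, name: str) -> None:
        self.name = name


# Assumed, partial list of strings that name extension dtypes.
EXTENSION_NAMES = {"category", "Int64", "string", "boolean"}


def resolve(dtype: str) -> Union[NumpyDtypeSketch, ExtensionDtypeSketch]:
    """Decide at runtime which kind of dtype a string refers to."""
    if dtype in EXTENSION_NAMES:
        return ExtensionDtypeSketch(dtype)
    return NumpyDtypeSketch(dtype)


# Same static type (str), different kinds of dtype after resolution.
for name in ["int64", "Int64"]:
    print(name, type(resolve(name)).__name__)
```

Because the distinction only appears at runtime, an annotation of plain `str` cannot by itself select between an `np.ndarray` and an `ExtensionArray` return type.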

Contributor Author

Never mind. I was able to get this to work by making astype() accept npt.DTypeLike. I created a new type, AstypeArg, to mean ExtensionDtype | npt.DTypeLike so that I don't have to change the many other places where NpDtype is used.

...

@overload
def astype(self, dtype: Dtype, copy: bool = ...) -> ArrayLike:
Member

If we pass an EA dtype object, do we get a union return type?

Contributor Author

Good point. Fixed in next commit
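One way to avoid the union return type for an extension dtype object is to give it its own overload. The sketch below uses hypothetical stand-in classes and is an assumption about the general shape of such a fix, not necessarily the signatures that were merged; in particular, routing strings to the union fallback is a simplification made for this example.

```python
from typing import Union, overload


class NumpyDtypeSketch:
    """Stand-in for numpy.dtype."""


class NdarraySketch:
    """Stand-in for numpy.ndarray."""


class ExtensionDtypeSketch:
    """Stand-in for pandas' ExtensionDtype."""


class ExtensionArraySketch:
    """Stand-in for pandas' ExtensionArray."""

    # A NumPy dtype object or a builtin type yields an ndarray.
    @overload
    def astype(
        self, dtype: Union[NumpyDtypeSketch, type], copy: bool = ...
    ) -> NdarraySketch:
        ...

    # An extension dtype object yields an extension array.
    @overload
    def astype(
        self, dtype: ExtensionDtypeSketch, copy: bool = ...
    ) -> "ExtensionArraySketch":
        ...

    # A string could name either kind, so only the union is known statically.
    @overload
    def astype(
        self, dtype: str, copy: bool = ...
    ) -> Union["ExtensionArraySketch", NdarraySketch]:
        ...

    def astype(self, dtype, copy=True):
        # Runtime dispatch matching the overloads above.
        if isinstance(dtype, ExtensionDtypeSketch):
            return ExtensionArraySketch()
        if isinstance(dtype, str) and dtype in {"category", "Int64"}:
            return ExtensionArraySketch()
        return NdarraySketch()


arr = ExtensionArraySketch()
print(type(arr.astype(ExtensionDtypeSketch())).__name__)
print(type(arr.astype(float)).__name__)
```

With this layout a checker can report the precise array type whenever the caller passes a dtype object, and falls back to the union only for the ambiguous string case.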

...

def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
"""
Cast to a NumPy array with 'dtype'.
Cast to a NumPy array or ExtensionArray with 'dtype'.

Parameters
----------
@@ -530,8 +540,9 @@ def astype(self, dtype, copy=True):

Returns
-------
array : ndarray
NumPy ndarray with 'dtype' for its dtype.
array : ArrayLike
An ExtensionArray if dtype StringDtype or same as that of underlying array.
Otherwise a NumPy ndarray with 'dtype' for its dtype.
"""
from pandas.core.arrays.string_ import StringDtype

@@ -547,7 +558,11 @@ def astype(self, dtype, copy=True):
# allow conversion to StringArrays
return dtype.construct_array_type()._from_sequence(self, copy=False)

return np.array(self, dtype=dtype, copy=copy)
# error: Argument "dtype" to "array" has incompatible type
# "Union[ExtensionDtype, dtype[Any]]"; expected "Union[dtype[Any], None, type,
# _SupportsDType, str, Union[Tuple[Any, int], Tuple[Any, Union[int,
# Sequence[int]]], List[Any], _DTypeDict, Tuple[Any, Any]]]"
return np.array(self, dtype=dtype, copy=copy) # type: ignore[arg-type]

def isna(self) -> np.ndarray | ExtensionArraySupportsAnyAll:
"""
17 changes: 15 additions & 2 deletions pandas/core/arrays/boolean.py
@@ -1,7 +1,10 @@
from __future__ import annotations

import numbers
from typing import TYPE_CHECKING
from typing import (
TYPE_CHECKING,
overload,
)
import warnings

import numpy as np
@@ -13,6 +16,7 @@
from pandas._typing import (
ArrayLike,
Dtype,
NpDtype,
type_t,
)
from pandas.compat.numpy import function as nv
@@ -392,7 +396,16 @@ def reconstruct(x):
def _coerce_to_array(self, value) -> tuple[np.ndarray, np.ndarray]:
return coerce_to_array(value)

def astype(self, dtype, copy: bool = True) -> ArrayLike:
@overload
def astype(self, dtype: NpDtype, copy: bool = True) -> np.ndarray:
Member

Suggested change
- def astype(self, dtype: NpDtype, copy: bool = True) -> np.ndarray:
+ def astype(self, dtype: NpDtype, copy: bool = ...) -> np.ndarray:

and elsewhere

Contributor Author

fixed in next commit
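The convention behind this suggestion is that an `@overload` stub writes `...` for a default to signal that a default exists without restating it, so the real value lives only in the implementation. A minimal sketch, using a hypothetical function `convert` that is unrelated to pandas:

```python
import inspect
from typing import Union, overload


@overload
def convert(value: int, copy: bool = ...) -> int:
    ...


@overload
def convert(value: str, copy: bool = ...) -> str:
    ...


def convert(value: Union[int, str], copy: bool = True) -> Union[int, str]:
    """The implementation is the only place the real default is written."""
    return value


# At runtime only the implementation exists, so its default is what callers get.
print(inspect.signature(convert).parameters["copy"].default)
```

Keeping the literal default out of the stubs means there is a single place to update if the default ever changes.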

...

@overload
def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
...

def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:

"""
Cast to a NumPy array or ExtensionArray with 'dtype'.

15 changes: 10 additions & 5 deletions pandas/core/arrays/categorical.py
@@ -11,6 +11,7 @@
TypeVar,
Union,
cast,
overload,
)
from warnings import (
catch_warnings,
@@ -487,6 +488,14 @@ def _constructor(self) -> type[Categorical]:
def _from_sequence(cls, scalars, *, dtype: Dtype | None = None, copy=False):
return Categorical(scalars, dtype=dtype, copy=copy)

@overload
def astype(self, dtype: NpDtype, copy: bool = True) -> np.ndarray:
Member

I think with dt64/td64 this could return DTA/TDA (DatetimeArray/TimedeltaArray)?

Contributor Author

Don't see that:

>>> s = pd.Series(pd.to_datetime(["2021-03-15 12:05", "2021-04-05 5:10"]), dtype="category")
>>> s
0   2021-03-15 12:05:00
1   2021-04-05 05:10:00
dtype: category
Categories (2, datetime64[ns]): [2021-03-15 12:05:00, 2021-04-05 05:10:00]
>>> s.dtype.categories.dtype
dtype('<M8[ns]')
>>> type(s.astype(s.dtype.categories.dtype).values)
<class 'numpy.ndarray'>

...

@overload
def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
...

def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
"""
Coerce this type to another dtype
@@ -2468,11 +2477,7 @@ def _str_get_dummies(self, sep="|"):
# sep may not be in categories. Just bail on this.
from pandas.core.arrays import PandasArray

# error: Argument 1 to "PandasArray" has incompatible type
# "ExtensionArray"; expected "Union[ndarray, PandasArray]"
return PandasArray(self.astype(str))._str_get_dummies( # type: ignore[arg-type]
sep
)
return PandasArray(self.astype(str))._str_get_dummies(sep)


# The Series.cat accessor
13 changes: 12 additions & 1 deletion pandas/core/arrays/floating.py
@@ -1,5 +1,6 @@
from __future__ import annotations

from typing import overload
import warnings

import numpy as np
@@ -10,7 +11,9 @@
)
from pandas._typing import (
ArrayLike,
Dtype,
DtypeObj,
NpDtype,
)
from pandas.compat.numpy import function as nv
from pandas.util._decorators import cache_readonly
@@ -271,7 +274,15 @@ def _from_sequence_of_strings(
def _coerce_to_array(self, value) -> tuple[np.ndarray, np.ndarray]:
return coerce_to_array(value, dtype=self.dtype)

def astype(self, dtype, copy: bool = True) -> ArrayLike:
@overload
def astype(self, dtype: NpDtype, copy: bool = True) -> np.ndarray:
...

@overload
def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
...

def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
"""
Cast to a NumPy array or ExtensionArray with 'dtype'.

12 changes: 11 additions & 1 deletion pandas/core/arrays/integer.py
@@ -1,5 +1,6 @@
from __future__ import annotations

from typing import overload
import warnings

import numpy as np
@@ -13,6 +14,7 @@
ArrayLike,
Dtype,
DtypeObj,
NpDtype,
)
from pandas.compat.numpy import function as nv
from pandas.util._decorators import cache_readonly
@@ -335,7 +337,15 @@ def _from_sequence_of_strings(
def _coerce_to_array(self, value) -> tuple[np.ndarray, np.ndarray]:
return coerce_to_array(value, dtype=self.dtype)

def astype(self, dtype, copy: bool = True) -> ArrayLike:
@overload
def astype(self, dtype: NpDtype, copy: bool = True) -> np.ndarray:
...

@overload
def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
...

def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
"""
Cast to a NumPy array or ExtensionArray with 'dtype'.

13 changes: 10 additions & 3 deletions pandas/core/arrays/masked.py
@@ -5,6 +5,7 @@
Any,
Sequence,
TypeVar,
overload,
)

import numpy as np
@@ -280,9 +281,7 @@ def to_numpy( # type: ignore[override]
if na_value is lib.no_default:
na_value = libmissing.NA
if dtype is None:
# error: Incompatible types in assignment (expression has type
# "Type[object]", variable has type "Union[str, dtype[Any], None]")
dtype = object # type: ignore[assignment]
dtype = object
if self._hasna:
if (
not is_object_dtype(dtype)
@@ -301,6 +300,14 @@ def to_numpy( # type: ignore[override]
data = self._data.astype(dtype, copy=copy)
return data

@overload
def astype(self, dtype: NpDtype, copy: bool = True) -> np.ndarray:
...

@overload
def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
...

def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
dtype = pandas_dtype(dtype)

4 changes: 1 addition & 3 deletions pandas/core/arrays/sparse/array.py
@@ -523,9 +523,7 @@ def __array__(self, dtype: NpDtype | None = None) -> np.ndarray:
try:
dtype = np.result_type(self.sp_values.dtype, type(fill_value))
except TypeError:
# error: Incompatible types in assignment (expression has type
# "Type[object]", variable has type "Union[str, dtype[Any], None]")
dtype = object # type: ignore[assignment]
dtype = object

out = np.full(self.shape, fill_value, dtype=dtype)
out[self.sp_index.to_int_index().indices] = self.sp_values
7 changes: 1 addition & 6 deletions pandas/core/common.py
@@ -229,12 +229,7 @@ def asarray_tuplesafe(values, dtype: NpDtype | None = None) -> np.ndarray:
# expected "ndarray")
return values._values # type: ignore[return-value]

# error: Non-overlapping container check (element type: "Union[str, dtype[Any],
# None]", container item type: "type")
if isinstance(values, list) and dtype in [ # type: ignore[comparison-overlap]
np.object_,
object,
]:
if isinstance(values, list) and dtype in [np.object_, object]:
return construct_1d_object_array_from_listlike(values)

result = np.asarray(values, dtype=dtype)
4 changes: 1 addition & 3 deletions pandas/core/construction.py
@@ -588,9 +588,7 @@ def _sanitize_ndim(
if is_object_dtype(dtype) and isinstance(dtype, ExtensionDtype):
# i.e. PandasDtype("O")

# error: Argument "dtype" to "asarray_tuplesafe" has incompatible type
# "Type[object]"; expected "Union[str, dtype[Any], None]"
result = com.asarray_tuplesafe(data, dtype=object) # type: ignore[arg-type]
result = com.asarray_tuplesafe(data, dtype=object)
cls = dtype.construct_array_type()
result = cls._from_sequence(result, dtype=dtype)
else:
3 changes: 1 addition & 2 deletions pandas/core/dtypes/dtypes.py
@@ -29,7 +29,6 @@
from pandas._typing import (
Dtype,
DtypeObj,
NpDtype,
Ordered,
type_t,
)
@@ -1291,7 +1290,7 @@ class PandasDtype(ExtensionDtype):

_metadata = ("_dtype",)

def __init__(self, dtype: NpDtype | PandasDtype | None):
def __init__(self, dtype: str | np.dtype | PandasDtype | None):
Member

why this change? what prevents me from passing int?

Contributor Author

It fails mypy for some reason. It might have to do with how numpy types the np.dtype function.

Member

PandasDtype(int) returns PandasDtype('int64')

Please let the question-asker do the "mark as resolved".

Member (@simonjayhawkins, Jul 4, 2021)

does npt.DTypeLike work?

(from @Dr-Irv): Yes, it does. In next commit.

if isinstance(dtype, PandasDtype):
# make constructor univalent
dtype = dtype.numpy_dtype
2 changes: 1 addition & 1 deletion pandas/core/internals/blocks.py
@@ -1348,7 +1348,7 @@ def get_values(self, dtype: DtypeObj | None = None) -> np.ndarray:
"""
return object dtype as boxed values, such as Timestamps/Timedelta
"""
values = self.values
values: ArrayLike = self.values
if dtype == _dtype_obj:
values = values.astype(object)
# TODO(EA2D): reshape not needed with 2D EAs
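The added `values: ArrayLike` annotation is presumably there because a checker would otherwise infer the narrower declared type of `self.values` and then reject reassigning the result of `astype(object)`, which is now typed as a different array type. A minimal sketch of that pattern follows; `NdarraySketch`, `ExtensionArraySketch`, `ArrayLikeSketch`, and the free function `get_values` are hypothetical stand-ins, not pandas code.

```python
from typing import Union


class NdarraySketch:
    """Stand-in for numpy.ndarray."""


class ExtensionArraySketch:
    """Stand-in for pandas' ExtensionArray."""

    def astype(self, dtype: type) -> NdarraySketch:
        # Casting to a builtin type produces the ndarray stand-in.
        return NdarraySketch()


ArrayLikeSketch = Union[ExtensionArraySketch, NdarraySketch]


def get_values(stored: ExtensionArraySketch, as_object: bool) -> ArrayLikeSketch:
    # Without the explicit annotation, a checker infers ExtensionArraySketch
    # from this first assignment and flags the reassignment below.
    values: ArrayLikeSketch = stored
    if as_object:
        values = values.astype(object)
    return values


print(type(get_values(ExtensionArraySketch(), as_object=True)).__name__)
print(type(get_values(ExtensionArraySketch(), as_object=False)).__name__)
```

Declaring the wider type up front keeps the single variable name while letting both branches type-check.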