Skip to content

TYP: Typing changes for ExtensionArray.astype #41251

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Sep 6, 2021

Conversation

Dr-Irv
Copy link
Contributor

@Dr-Irv Dr-Irv commented May 1, 2021

Minimal (hopefully) typing changes for ExtensionArray.astype .

@Dr-Irv Dr-Irv requested a review from simonjayhawkins May 1, 2021 19:37
@@ -511,7 +513,15 @@ def nbytes(self) -> int:
# Additional Methods
# ------------------------------------------------------------------------

def astype(self, dtype, copy=True):
@overload
def astype(self, dtype: NpDtype, copy: bool = True) -> np.ndarray:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dt64/td64 generally go to DTA/TDA

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think so. When you pass a numpy dtype, you get a np.ndarray:

>>> a=pd.array([np.datetime64("2021-03-15"), np.datetime64("2021-04-05")])
>>> a
<DatetimeArray>
['2021-03-15 00:00:00', '2021-04-05 00:00:00']
Length: 2, dtype: datetime64[ns]
>>> b=a.astype(np.datetime64)
>>> b
array(['2021-03-15T00:00:00.000000000', '2021-04-05T00:00:00.000000000'],
      dtype='datetime64[ns]')
>>> type(b)
<class 'numpy.ndarray'>

@simonjayhawkins simonjayhawkins added ExtensionArray Extending pandas with custom dtypes or arrays. Typing type annotations, mypy/pyright type checking labels May 2, 2021
@Dr-Irv Dr-Irv requested a review from simonjayhawkins May 10, 2021 12:17
@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented May 16, 2021

@simonjayhawkins can you review, please?

@simonjayhawkins
Copy link
Member

This is now failing mypy with master merged. can you merge master and fixup.

...

@overload
def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could add an overload for ExtensionDtype and this overload could be str for the union return type if type_t[Union[str, float, int, complex, bool, object] is moved as suggested?

the accepted strings for numpy is already known? so we should eventually be able to remove the union return type.

@simonjayhawkins simonjayhawkins added this to the 1.3 milestone May 22, 2021
@simonjayhawkins simonjayhawkins removed this from the 1.3 milestone May 24, 2021
@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented May 30, 2021

@simonjayhawkins pushed changes a week ago, so looking forward to a new review.

@@ -1291,7 +1290,7 @@ class PandasDtype(ExtensionDtype):

_metadata = ("_dtype",)

def __init__(self, dtype: NpDtype | PandasDtype | None):
def __init__(self, dtype: str | np.dtype | PandasDtype | None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change? what prevents me from passing int?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fails mypy for some reason. Might have to do with the typing of how the function np.dtype is typed by numpy

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PandasDtype(int) returns PandasDtype('int64')

pls let the question-asker do the "mark as resolved"

Copy link
Member

@simonjayhawkins simonjayhawkins Jul 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does npt.DTypeLike work?

(from @Dr-Irv): Yes, it does. In next commit.

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch is 260 commits behind master and if merged would cause a CI failure

pandas/core/arrays/period.py:346: error: unused "type: ignore" comment

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Dr-Irv

is it worth posting the results of a test file with reveal_types for this?

np.sum((left._values != right._values).astype(int)) * 100.0 / len(left)
)
thesum = np.sum((left._values != right._values).astype(int))
diff = thesum * 100.0 / len(left)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must have had to change it in a previous commit with an earlier version of numpy. Reverted in next commit.

Comment on lines +127 to +128
NpDtype = Union[str, np.dtype, type_t[Union[str, float, int, complex, bool, object]]]
Dtype = Union["ExtensionDtype", NpDtype]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this PR need to go in before #41203?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they should be independent. Either can go first, and once one goes in, I can do the mypy check on the other.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to use NpDtype = npt.DTypeLike to be consistent between numpy and pandas?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to use NpDtype = npt.DTypeLike to be consistent between numpy and pandas?

That's a big change, as it affects a lot more than just the astype() signature. The types are different:

>>> import numpy.typing as npt
>>> from pandas._typing import NpDtype
>>> NpDtype
typing.Union[str, numpy.dtype, typing.Type[typing.Union[str, float, int, complex, bool, object]]]
>>> npt.DTypeLike
typing.Union[numpy.dtype, NoneType, type, numpy.typing._dtype_like._SupportsDType[numpy.dtype], str, typing.Tuple[typing.Any, int], typing.Tuple[typing.Any, typing.Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]], typing.List[typing.Any], numpy.typing._dtype_like._DTypeDict, typing.Tuple[typing.Any, typing.Any]]

@@ -392,7 +396,16 @@ def reconstruct(x):
def _coerce_to_array(self, value) -> tuple[np.ndarray, np.ndarray]:
return coerce_to_array(value)

def astype(self, dtype, copy: bool = True) -> ArrayLike:
@overload
def astype(self, dtype: NpDtype, copy: bool = True) -> np.ndarray:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def astype(self, dtype: NpDtype, copy: bool = True) -> np.ndarray:
def astype(self, dtype: NpDtype, copy: bool = ...) -> np.ndarray:

and elsewhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in next commit

@@ -515,9 +517,17 @@ def nbytes(self) -> int:
# Additional Methods
# ------------------------------------------------------------------------

def astype(self, dtype, copy=True):
@overload
def astype(self, dtype: NpDtype, copy: bool = ...) -> np.ndarray:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this use npt.DtypeLike?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this use npt.DtypeLike?

No. The following code is fine with mypy

import numpy.typing as npt

class Foo:
    pass

def myfun(x: npt.DTypeLike):
    pass

myfun(type(Foo))
myfun("abcd")
myfun(str)
myfun(object)
myfun("str")

That's because npt.DTypeLike accepts any type or any str. We want to separate out the ExtensionDtype that are passed as strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind. Was able to get this to work by making astype() accept npt.DTypeLike. Created a new type AstypeArg to mean ExtensionDtype | npt.DTypeLike so that I don't have to change lots of other parts where NpDtype is used.

...

@overload
def astype(self, dtype: Dtype, copy: bool = ...) -> ArrayLike:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we pass a EA dtype object, we get a union return type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Fixed in next commit

@@ -1291,7 +1290,7 @@ class PandasDtype(ExtensionDtype):

_metadata = ("_dtype",)

def __init__(self, dtype: NpDtype | PandasDtype | None):
def __init__(self, dtype: str | np.dtype | PandasDtype | None):
Copy link
Member

@simonjayhawkins simonjayhawkins Jul 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does npt.DTypeLike work?

(from @Dr-Irv): Yes, it does. In next commit.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jul 4, 2021

Thanks @Dr-Irv

is it worth posting the results of a test file with reveal_types for this?

Can you point me to an example?

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jul 14, 2021

@simonjayhawkins ping...

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jul 26, 2021

pinging @simonjayhawkins

@jreback
Copy link
Contributor

jreback commented Jul 28, 2021

@Dr-Irv thanks for the patience. can you rebase

@simonjayhawkins if you can have a look

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Aug 3, 2021

weekly ping @simonjayhawkins

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Aug 23, 2021

another ping @simonjayhawkins

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Sep 6, 2021

biweekly ping @simonjayhawkins

@jreback
Copy link
Contributor

jreback commented Sep 6, 2021

@Dr-Irv ok if you can merge master and make sure still passing (i think it was last) time, will merge these.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Sep 6, 2021

@jreback all green except for a Windows timeout on one of the tests

@jreback jreback added this to the 1.4 milestone Sep 6, 2021
@jreback
Copy link
Contributor

jreback commented Sep 6, 2021

can you rebase again just to be sure (as merged the other)

cc @twoertwein

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Sep 6, 2021

@jreback all green after merge with the delete/searchsorted changes

@jreback jreback merged commit 40274ac into pandas-dev:master Sep 6, 2021
@jreback
Copy link
Contributor

jreback commented Sep 6, 2021

thanks @Dr-Irv

@Dr-Irv Dr-Irv deleted the type_astype branch September 6, 2021 19:48
feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants