Skip to content

TYP: Typing for ExtensionArray.__getitem__ #41258

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Sep 8, 2021
Merged

Conversation

Dr-Irv
Copy link
Contributor

@Dr-Irv Dr-Irv commented May 2, 2021

Changes for typing of ExtensionArray.__getitem__().

Required changes to DateTimeLikeArrayMixin for correct compatibility.

@simonjayhawkins simonjayhawkins added ExtensionArray Extending pandas with custom dtypes or arrays. Typing type annotations, mypy/pyright type checking labels May 2, 2021
@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented May 16, 2021

@simonjayhawkins can you review, please?

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Dr-Irv generally lgtm. issue with api regarding using Sequence that may require some discussion

@simonjayhawkins
Copy link
Member

Thanks @Dr-Irv generally lgtm. issue with api regarding using Sequence that may require some discussion

Thinking some more, probably best to remove Sequence altogether here.

A tuple of ints is a Sequence[int] and returns a Scalar on our 1d arrays in some cases and errors in others, so we need to add an extra overload for tuple to make the the annotations correct. This will then likely create LSP issues with the subclasses that can be backed by a 2d array as the return type of and tuple of ints will be inconsistent will the base implementation.

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dr-Irv can you also add annotations to the other array subclasses, Categorical, Interval, Masked, ArrowStringArray and Sparse before catering for the 2d array subclasses

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented May 23, 2021

@Dr-Irv can you also add annotations to the other array subclasses, Categorical, Interval, Masked, ArrowStringArray and Sparse before catering for the 2d array subclasses

Done in next commit. Created issues in the block and array managers where now the type ignore comments have to change, as we are strictly defining EA's to be 1D. So I changed those comments.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented May 30, 2021

@simonjayhawkins pushed changes a week ago, so looking forward to a new review.

@@ -204,6 +206,17 @@ def __setitem__(self, key, value):
def _validate_setitem_value(self, value):
return value

@overload
def __getitem__(self, key: int | np.integer) -> Any:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not for this PR, but could/should we do a Generic think to tighten this Any?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly. Should I create an issue for discussion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't tihs a PositionScalar ?

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jul 14, 2021

@simonjayhawkins ping...

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jul 26, 2021

pinging @simonjayhawkins

@@ -204,6 +206,17 @@ def __setitem__(self, key, value):
def _validate_setitem_value(self, value):
return value

@overload
def __getitem__(self, key: int | np.integer) -> Any:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't tihs a PositionScalar ?

@overload
def __getitem__(
self: NDArrayBackedExtensionArrayT,
key: slice | np.ndarray | list[int] | PositionalIndexerTuple,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this PostionalIndexer2D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, there is no PositionScalar, so I'm not sure what you are referring to.

For the NDArrayBackedExtensionArray, which extends ExtensionArray, __getitem__() needs to handle the __getitem__() arguments for ExtensionArray, which is PositionalIndexer , and then NDArrayBackedExtensionArray can widen the accepted arguments. We then have

PositionalIndexer = Union[int, np.integer, slice, List[int], np.ndarray]
PositionalIndexerTuple = Tuple[PositionalIndexer, PositionalIndexer]
PositionalIndexer2D = Union[PositionalIndexer, PositionalIndexerTuple]

For NDArrayBackedExtensionArray.__getitem__(), it accepts PositionalIndexer2D. The overloads then indicate what are the return types for the various subtypes of PositionalIndexer2D. This has separate return types if the arguments are int | np.integer versus slice | np.ndarray | list[int] | PositionalIndexerTuple .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i had edited to PositionalIndexer2D which looks like the correct return type here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i had edited to PositionalIndexer2D which looks like the correct return type here

You mean "argument type", not "return type" ??

Here's the full code:

    @overload
    def __getitem__(self, key: int | np.integer) -> Any:
        ...

    @overload
    def __getitem__(
        self: NDArrayBackedExtensionArrayT,
        key: slice | np.ndarray | list[int] | PositionalIndexerTuple,
    ) -> NDArrayBackedExtensionArrayT:
        ...

    def __getitem__(
        self: NDArrayBackedExtensionArrayT,
        key: PositionalIndexer2D,
    ) -> NDArrayBackedExtensionArrayT | Any:

The way it works is as follows with respect to the type of key:

  • if the type is int | np.integer, return type is Any
  • if the type is slice | np.ndarray | list[int] | PositionalIndexerTuple, return type is definitely NDArrayBackedExtensionArrayT
  • If the type is PositionalIndexer, return type is either NDArrayBackedExtensionArrayT | Any

This makes some downstream typing work correctly by creating the separation this way.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Aug 3, 2021

weekly ping @simonjayhawkins

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Aug 23, 2021

another ping @simonjayhawkins

@jreback jreback added this to the 1.4 milestone Sep 6, 2021
@jreback
Copy link
Contributor

jreback commented Sep 6, 2021

same @Dr-Irv

cc @twoertwein if you can have a look as you are looking at similar things

PositionalIndexer2D = Union[
PositionalIndexer, Tuple[PositionalIndexer, PositionalIndexer]
]
# Using List[int] here rather than Sequence[int] to disallow tuples.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you followup and give some hints on where these should be used (a.g. a line of description for each; example....)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you followup and give some hints on where these should be used (a.g. a line of description for each; example....)

Done in a new commit - added comments - let me know if you want more

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, looks good.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Sep 7, 2021

@jreback all green now (except for the one test on Windows that always seems to time out)

@jreback jreback merged commit bc94e28 into pandas-dev:master Sep 8, 2021
@jreback
Copy link
Contributor

jreback commented Sep 8, 2021

thanks @Dr-Irv

@Dr-Irv Dr-Irv deleted the extgetitem branch September 8, 2021 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants