-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TYP: Typing for ExtensionArray.__getitem__ #41258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@simonjayhawkins can you review, please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Dr-Irv generally lgtm. issue with api regarding using Sequence that may require some discussion
Thinking some more, probably best to remove Sequence altogether here. A tuple of ints is a Sequence[int] and returns a Scalar on our 1d arrays in some cases and errors in others, so we need to add an extra overload for tuple to make the the annotations correct. This will then likely create LSP issues with the subclasses that can be backed by a 2d array as the return type of and tuple of ints will be inconsistent will the base implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Dr-Irv can you also add annotations to the other array subclasses, Categorical, Interval, Masked, ArrowStringArray and Sparse before catering for the 2d array subclasses
Done in next commit. Created issues in the block and array managers where now the |
@simonjayhawkins pushed changes a week ago, so looking forward to a new review. |
pandas/core/arrays/_mixins.py
Outdated
@@ -204,6 +206,17 @@ def __setitem__(self, key, value): | |||
def _validate_setitem_value(self, value): | |||
return value | |||
|
|||
@overload | |||
def __getitem__(self, key: int | np.integer) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not for this PR, but could/should we do a Generic think to tighten this Any?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly. Should I create an issue for discussion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't tihs a PositionScalar ?
@simonjayhawkins ping... |
pinging @simonjayhawkins |
pandas/core/arrays/_mixins.py
Outdated
@@ -204,6 +206,17 @@ def __setitem__(self, key, value): | |||
def _validate_setitem_value(self, value): | |||
return value | |||
|
|||
@overload | |||
def __getitem__(self, key: int | np.integer) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't tihs a PositionScalar ?
pandas/core/arrays/_mixins.py
Outdated
@overload | ||
def __getitem__( | ||
self: NDArrayBackedExtensionArrayT, | ||
key: slice | np.ndarray | list[int] | PositionalIndexerTuple, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this PostionalIndexer2D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, there is no PositionScalar
, so I'm not sure what you are referring to.
For the NDArrayBackedExtensionArray
, which extends ExtensionArray
, __getitem__()
needs to handle the __getitem__()
arguments for ExtensionArray
, which is PositionalIndexer
, and then NDArrayBackedExtensionArray
can widen the accepted arguments. We then have
PositionalIndexer = Union[int, np.integer, slice, List[int], np.ndarray]
PositionalIndexerTuple = Tuple[PositionalIndexer, PositionalIndexer]
PositionalIndexer2D = Union[PositionalIndexer, PositionalIndexerTuple]
For NDArrayBackedExtensionArray.__getitem__()
, it accepts PositionalIndexer2D
. The overloads then indicate what are the return types for the various subtypes of PositionalIndexer2D
. This has separate return types if the arguments are int | np.integer
versus slice | np.ndarray | list[int] | PositionalIndexerTuple
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i had edited to PositionalIndexer2D
which looks like the correct return type here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i had edited to
PositionalIndexer2D
which looks like the correct return type here
You mean "argument type", not "return type" ??
Here's the full code:
@overload
def __getitem__(self, key: int | np.integer) -> Any:
...
@overload
def __getitem__(
self: NDArrayBackedExtensionArrayT,
key: slice | np.ndarray | list[int] | PositionalIndexerTuple,
) -> NDArrayBackedExtensionArrayT:
...
def __getitem__(
self: NDArrayBackedExtensionArrayT,
key: PositionalIndexer2D,
) -> NDArrayBackedExtensionArrayT | Any:
The way it works is as follows with respect to the type of key
:
- if the type is
int | np.integer
, return type isAny
- if the type is
slice | np.ndarray | list[int] | PositionalIndexerTuple
, return type is definitelyNDArrayBackedExtensionArrayT
- If the type is
PositionalIndexer
, return type is eitherNDArrayBackedExtensionArrayT | Any
This makes some downstream typing work correctly by creating the separation this way.
weekly ping @simonjayhawkins |
another ping @simonjayhawkins |
same @Dr-Irv cc @twoertwein if you can have a look as you are looking at similar things |
PositionalIndexer2D = Union[ | ||
PositionalIndexer, Tuple[PositionalIndexer, PositionalIndexer] | ||
] | ||
# Using List[int] here rather than Sequence[int] to disallow tuples. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you followup and give some hints on where these should be used (a.g. a line of description for each; example....)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you followup and give some hints on where these should be used (a.g. a line of description for each; example....)
Done in a new commit - added comments - let me know if you want more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, looks good.
@jreback all green now (except for the one test on Windows that always seems to time out) |
thanks @Dr-Irv |
Changes for typing of
ExtensionArray.__getitem__()
.Required changes to
DateTimeLikeArrayMixin
for correct compatibility.