TYP: Typing for ExtensionArray.__getitem__ #41258

Merged
merged 26 commits into from
Sep 8, 2021
Changes from 6 commits
6c83511
TYP: ExtensionArray.__getitem__
Dr-Irv May 1, 2021
0c8d648
make base class use PositionalIndexer2D
Dr-Irv May 2, 2021
c43eadc
merge with master
Dr-Irv May 16, 2021
94ced76
fix up getitem typing for DateTimeOps
Dr-Irv May 16, 2021
4772f8e
Merge remote-tracking branch 'upstream/master' into extgetitem
Dr-Irv May 23, 2021
2773c8b
Make getitem on EA accept 1D, and change declaration for 2D arrays
Dr-Irv May 23, 2021
b7f2485
Merge branch 'master' into extgetitem
Dr-Irv May 31, 2021
01c0cf5
casting in datetimelike, allow NA in string arrow
Dr-Irv May 31, 2021
3b38de2
fix string arrow NA type
Dr-Irv May 31, 2021
b25d5a3
Merge remote-tracking branch 'upstream/master' into extgetitem
Dr-Irv Jun 13, 2021
d7c545d
change an overload in mixins to use NDArrayBackedExtensionArrayT
Dr-Irv Jun 13, 2021
076e434
categorical returns Any, interval for NA, put back libmissing in stri…
Dr-Irv Jun 14, 2021
01c1a3f
Merge remote-tracking branch 'upstream/master' into extgetitem
Dr-Irv Jul 6, 2021
738ec89
merge with master
Dr-Irv Jul 8, 2021
c5e300c
Merge remote-tracking branch 'upstream/master' into extgetitem
Dr-Irv Jul 8, 2021
1dbb668
change ignore messages
Dr-Irv Jul 8, 2021
3e19841
Merge remote-tracking branch 'upstream/master' into extgetitem
Dr-Irv Jul 14, 2021
73680ac
WIP: merge with master
Dr-Irv Jul 26, 2021
adf3a73
resolve conflicts in core/internals/blocks.py
Dr-Irv Jul 26, 2021
6068976
Merge remote-tracking branch 'upstream/master' into extgetitem
Dr-Irv Jul 28, 2021
0f36e5f
merge with master 0803
Dr-Irv Aug 3, 2021
9557cac
merge with master
Dr-Irv Sep 6, 2021
9a8550d
create types for split of getitem arguments
Dr-Irv Sep 6, 2021
10c454d
merge in delete/searchsorted typing changes
Dr-Irv Sep 6, 2021
a5318bc
merge with astype changes
Dr-Irv Sep 6, 2021
b941732
comments on various indexers in _typing.py
Dr-Irv Sep 6, 2021
7 changes: 3 additions & 4 deletions pandas/_typing.py
@@ -211,7 +211,6 @@
# TODO: add Ellipsis, see
# https://github.com/python/typing/issues/684#issuecomment-548203158
# https://bugs.python.org/issue41810
PositionalIndexer = Union[int, np.integer, slice, Sequence[int], np.ndarray]
PositionalIndexer2D = Union[
PositionalIndexer, Tuple[PositionalIndexer, PositionalIndexer]
]
PositionalIndexer = Union[int, np.integer, slice, List[int], np.ndarray]
PositionalIndexerTuple = Tuple[PositionalIndexer, PositionalIndexer]
PositionalIndexer2D = Union[PositionalIndexer, PositionalIndexerTuple]
13 changes: 13 additions & 0 deletions pandas/core/arrays/_mixins.py
@@ -6,6 +6,7 @@
Sequence,
TypeVar,
cast,
overload,
)

import numpy as np
@@ -15,6 +16,7 @@
from pandas._typing import (
F,
PositionalIndexer2D,
PositionalIndexerTuple,
Shape,
type_t,
)
@@ -204,6 +206,17 @@ def __setitem__(self, key, value):
def _validate_setitem_value(self, value):
return value

@overload
def __getitem__(self, key: int | np.integer) -> Any:
Member

not for this PR, but could/should we do a Generic thing to tighten this Any?

Contributor Author

Possibly. Should I create an issue for discussion?
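
For readers wondering what a Generic-based tightening might look like, here is a minimal sketch. It is not pandas code: `TypedArray`, `StrArray`, `ScalarT` and `ArrT` are hypothetical names invented for illustration, and whether such a design would suit ExtensionArray is left open in this thread.

```python
from __future__ import annotations

from typing import Generic, TypeVar, overload

ScalarT = TypeVar("ScalarT")  # hypothetical: the scalar type an array holds
ArrT = TypeVar("ArrT", bound="TypedArray")  # hypothetical self type


class TypedArray(Generic[ScalarT]):
    """Toy container used only for this sketch; not a pandas class."""

    def __init__(self, values: list[ScalarT]) -> None:
        self._values = list(values)

    @overload
    def __getitem__(self, key: int) -> ScalarT:
        ...

    @overload
    def __getitem__(self: ArrT, key: slice) -> ArrT:
        ...

    def __getitem__(self, key):
        if isinstance(key, slice):
            # A slice keeps the concrete array type.
            return type(self)(self._values[key])
        # An integer returns one element, typed as ScalarT rather than Any.
        return self._values[key]


class StrArray(TypedArray[str]):
    """Hypothetical subclass whose scalars are str."""


words = StrArray(["a", "b", "c"])
first = words[0]  # a type checker can infer str here
tail = words[1:]  # and StrArray here
```

The scalar overload returns `ScalarT` where the overload in this PR returns `Any`, so the element type would follow from the subclass declaration.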

Contributor

isn't this a PositionScalar?

...

@overload
def __getitem__(
self: NDArrayBackedExtensionArray,
key: slice | np.ndarray | list[int] | PositionalIndexerTuple,
Contributor

isn't this PositionalIndexer2D?

Contributor Author

First, there is no PositionScalar, so I'm not sure what you are referring to.

NDArrayBackedExtensionArray extends ExtensionArray, so its __getitem__() must handle every argument that ExtensionArray.__getitem__() accepts, which is PositionalIndexer. NDArrayBackedExtensionArray can then widen the accepted arguments. We then have

PositionalIndexer = Union[int, np.integer, slice, List[int], np.ndarray]
PositionalIndexerTuple = Tuple[PositionalIndexer, PositionalIndexer]
PositionalIndexer2D = Union[PositionalIndexer, PositionalIndexerTuple]

NDArrayBackedExtensionArray.__getitem__() accepts PositionalIndexer2D. The overloads then indicate the return type for each subtype of PositionalIndexer2D: one return type when the argument is int | np.integer, and another when it is slice | np.ndarray | list[int] | PositionalIndexerTuple.
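
To illustrate the widening described above, here is a small sketch. It assumes simplified stand-ins for the aliases (the NumPy members are left out so it runs without NumPy), and `Base1D`, `Wide2D` and `_pick` are hypothetical names, not pandas classes.

```python
from __future__ import annotations

from typing import List, Tuple, Union

# Simplified stand-ins for the pandas aliases (NumPy members left out).
Indexer1D = Union[int, slice, List[int]]
IndexerTuple = Tuple[Indexer1D, Indexer1D]
Indexer2D = Union[Indexer1D, IndexerTuple]


def _pick(seq: list, key: Indexer1D):
    """Apply a 1-D indexer to a plain list."""
    if isinstance(key, (int, slice)):
        return seq[key]
    return [seq[i] for i in key]


class Base1D:
    """Hypothetical base class: __getitem__ accepts 1-D indexers only."""

    def __init__(self, rows: list) -> None:
        self.rows = rows

    def __getitem__(self, key: Indexer1D):
        picked = _pick(self.rows, key)
        # An integer selects one row; anything else keeps the container type.
        return picked if isinstance(key, int) else type(self)(picked)


class Wide2D(Base1D):
    """Hypothetical subclass: still accepts every Indexer1D, plus a tuple."""

    def __getitem__(self, key: Indexer2D):
        if not isinstance(key, tuple):
            return super().__getitem__(key)
        row_key, col_key = key
        picked = super().__getitem__(row_key)
        if isinstance(row_key, int):
            return _pick(picked, col_key)  # one row, then index into it
        return [_pick(row, col_key) for row in picked.rows]


grid = Wide2D([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Because `Indexer2D` is a superset of `Indexer1D`, the override only widens the parameter type, so anything that works on the base class keeps working on the subclass.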

Contributor

oh i had edited to PositionalIndexer2D which looks like the correct return type here

Contributor Author

> oh i had edited to PositionalIndexer2D which looks like the correct return type here

You mean "argument type", not "return type" ??

Here's the full code:

    @overload
    def __getitem__(self, key: int | np.integer) -> Any:
        ...

    @overload
    def __getitem__(
        self: NDArrayBackedExtensionArrayT,
        key: slice | np.ndarray | list[int] | PositionalIndexerTuple,
    ) -> NDArrayBackedExtensionArrayT:
        ...

    def __getitem__(
        self: NDArrayBackedExtensionArrayT,
        key: PositionalIndexer2D,
    ) -> NDArrayBackedExtensionArrayT | Any:

The way it works is as follows with respect to the type of key:

  • if the type is int | np.integer, return type is Any
  • if the type is slice | np.ndarray | list[int] | PositionalIndexerTuple, return type is definitely NDArrayBackedExtensionArrayT
  • if the type is PositionalIndexer2D (the implementation signature), return type is NDArrayBackedExtensionArrayT | Any

Separating the overloads this way makes some downstream typing work correctly.
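
A minimal, self-contained sketch of the same overload pattern, for readers who want to try it with a type checker. `ToyArray`, `ToyDates`, `ToyArrayT` and `flipped` are hypothetical names, and the key types are simplified (no NumPy, no tuple indexers).

```python
from __future__ import annotations

from typing import Any, TypeVar, overload

ToyArrayT = TypeVar("ToyArrayT", bound="ToyArray")  # hypothetical self type


class ToyArray:
    """Toy stand-in for an extension array; not a pandas class."""

    def __init__(self, values: list[Any]) -> None:
        self._values = list(values)

    @overload
    def __getitem__(self, key: int) -> Any:
        ...

    @overload
    def __getitem__(self: ToyArrayT, key: slice | list[int]) -> ToyArrayT:
        ...

    def __getitem__(
        self: ToyArrayT, key: int | slice | list[int]
    ) -> ToyArrayT | Any:
        if isinstance(key, int):
            return self._values[key]  # scalar case
        if isinstance(key, slice):
            return type(self)(self._values[key])
        return type(self)([self._values[i] for i in key])


class ToyDates(ToyArray):
    def flipped(self) -> ToyDates:
        # The slice overload returns the caller's own type, so a checker
        # should accept this without a cast or a type: ignore comment.
        return self[::-1]


dates = ToyDates([1, 2, 3])
```

This is the kind of downstream code the separation helps with: a slice of a subclass is typed as that subclass, which is what allows the casts and the ignore comment to be dropped in datetimes.py and datetimelike.py in this PR.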

) -> NDArrayBackedExtensionArray:
...

def __getitem__(
self: NDArrayBackedExtensionArrayT,
key: PositionalIndexer2D,
18 changes: 16 additions & 2 deletions pandas/core/arrays/base.py
@@ -17,6 +17,7 @@
Sequence,
TypeVar,
cast,
overload,
)

import numpy as np
@@ -291,8 +292,19 @@ def _from_factorized(cls, values, original):
# ------------------------------------------------------------------------
# Must be a Sequence
# ------------------------------------------------------------------------
@overload
def __getitem__(self, item: int | np.integer) -> Any:
...

def __getitem__(self, item: PositionalIndexer) -> ExtensionArray | Any:
@overload
def __getitem__(
self: ExtensionArrayT, item: slice | np.ndarray | list[int]
) -> ExtensionArrayT:
...

def __getitem__(
self: ExtensionArrayT, item: PositionalIndexer
) -> ExtensionArrayT | Any:
"""
Select a subset of self.

@@ -306,6 +318,8 @@ def __getitem__(self, item: PositionalIndexer) -> ExtensionArray | Any:

* ndarray: A 1-d boolean NumPy ndarray the same length as 'self'

* list[int]: A list of int

Returns
-------
item : scalar or ExtensionArray
@@ -736,7 +750,7 @@ def fillna(
new_values = self.copy()
return new_values

def dropna(self):
def dropna(self: ExtensionArrayT) -> ExtensionArrayT:
"""
Return ExtensionArray without NA values.

18 changes: 17 additions & 1 deletion pandas/core/arrays/categorical.py
@@ -11,6 +11,7 @@
TypeVar,
Union,
cast,
overload,
)
from warnings import (
catch_warnings,
@@ -34,6 +35,8 @@
Dtype,
NpDtype,
Ordered,
PositionalIndexer2D,
PositionalIndexerTuple,
Scalar,
Shape,
type_t,
@@ -2015,7 +2018,20 @@ def __repr__(self) -> str:

# ------------------------------------------------------------------

def __getitem__(self, key):
@overload
def __getitem__(self, key: int | np.integer) -> object:
...

@overload
def __getitem__(
self: CategoricalT,
key: slice | np.ndarray | list[int] | PositionalIndexerTuple,
) -> CategoricalT:
...

def __getitem__(
self: CategoricalT, key: PositionalIndexer2D
) -> CategoricalT | object:
"""
Return an item.
"""
25 changes: 17 additions & 8 deletions pandas/core/arrays/datetimelike.py
@@ -48,6 +48,7 @@
DtypeObj,
NpDtype,
PositionalIndexer2D,
PositionalIndexerTuple,
)
from pandas.compat.numpy import function as nv
from pandas.errors import (
@@ -308,14 +309,26 @@ def __array__(self, dtype: NpDtype | None = None) -> np.ndarray:
return np.array(list(self), dtype=object)
return self._ndarray

@overload
def __getitem__(self, item: int | np.integer) -> DTScalarOrNaT:
...

@overload
def __getitem__(
self: DatetimeLikeArrayT,
item: slice | np.ndarray | list[int] | PositionalIndexerTuple,
) -> DatetimeLikeArrayT:
...

def __getitem__(
self, key: PositionalIndexer2D
) -> DatetimeLikeArrayMixin | DTScalarOrNaT:
self: DatetimeLikeArrayT, key: PositionalIndexer2D
) -> DatetimeLikeArrayT | DTScalarOrNaT:
"""
This getitem defers to the underlying array, which by-definition can
only handle list-likes, slices, and integer scalars
"""
result = super().__getitem__(key)
# Use cast as we know we will get back a DatetimeLikeArray
result = cast(DatetimeLikeArrayT, super().__getitem__(key))
if lib.is_scalar(result):
return result

@@ -1787,11 +1800,7 @@ def factorize(self, na_sentinel=-1, sort: bool = False):
uniques = self.copy() # TODO: copy or view?
if sort and self.freq.n < 0:
codes = codes[::-1]
# TODO: overload __getitem__, a slice indexer returns same type as self
# error: Incompatible types in assignment (expression has type
# "Union[DatetimeLikeArrayMixin, Union[Any, Any]]", variable
# has type "TimelikeOps")
uniques = uniques[::-1] # type: ignore[assignment]
uniques = uniques[::-1]
return codes, uniques
# FIXME: shouldn't get here; we are ignoring sort
return super().factorize(na_sentinel=na_sentinel)
7 changes: 2 additions & 5 deletions pandas/core/arrays/datetimes.py
@@ -8,7 +8,6 @@
)
from typing import (
TYPE_CHECKING,
cast,
overload,
)
import warnings
@@ -476,11 +475,9 @@ def _generate_range(
index = cls._simple_new(arr, freq=None, dtype=dtype)

if not left_closed and len(index) and index[0] == start:
# TODO: overload DatetimeLikeArrayMixin.__getitem__
index = cast(DatetimeArray, index[1:])
index = index[1:]
if not right_closed and len(index) and index[-1] == end:
# TODO: overload DatetimeLikeArrayMixin.__getitem__
index = cast(DatetimeArray, index[:-1])
index = index[:-1]

dtype = tz_to_dtype(tz)
return cls._simple_new(index._ndarray, freq=freq, dtype=dtype)
16 changes: 15 additions & 1 deletion pandas/core/arrays/interval.py
@@ -10,6 +10,7 @@
Sequence,
TypeVar,
cast,
overload,
)

import numpy as np
@@ -28,6 +29,7 @@
ArrayLike,
Dtype,
NpDtype,
PositionalIndexer,
)
from pandas.compat.numpy import function as nv
from pandas.util._decorators import Appender
@@ -629,7 +631,19 @@ def __iter__(self):
def __len__(self) -> int:
return len(self._left)

def __getitem__(self, key):
@overload
def __getitem__(self, key: int | np.integer) -> Interval:
...

@overload
def __getitem__(
self: IntervalArrayT, key: slice | np.ndarray | list[int]
) -> IntervalArrayT:
...

def __getitem__(
self: IntervalArrayT, key: PositionalIndexer
) -> IntervalArrayT | Interval:
key = check_array_indexer(self, key)
left = self._left[key]
right = self._right[key]
15 changes: 14 additions & 1 deletion pandas/core/arrays/masked.py
@@ -5,6 +5,7 @@
Any,
Sequence,
TypeVar,
overload,
)

import numpy as np
@@ -135,7 +136,19 @@ def __init__(self, values: np.ndarray, mask: np.ndarray, copy: bool = False):
def dtype(self) -> BaseMaskedDtype:
raise AbstractMethodError(self)

def __getitem__(self, item: PositionalIndexer) -> BaseMaskedArray | Any:
@overload
def __getitem__(self, item: int | np.integer) -> Any:
...

@overload
def __getitem__(
self: BaseMaskedArrayT, item: slice | np.ndarray | list[int]
) -> BaseMaskedArrayT:
...

def __getitem__(
self: BaseMaskedArrayT, item: PositionalIndexer
) -> BaseMaskedArrayT | Any:
if is_integer(item):
if self._mask[item]:
return self.dtype.na_value
35 changes: 33 additions & 2 deletions pandas/core/arrays/sparse/array.py
@@ -7,10 +7,13 @@
import numbers
import operator
from typing import (
TYPE_CHECKING,
Any,
Callable,
Sequence,
TypeVar,
cast,
overload,
)
import warnings

@@ -27,6 +30,7 @@
from pandas._typing import (
Dtype,
NpDtype,
PositionalIndexer,
Scalar,
)
from pandas.compat.numpy import function as nv
@@ -77,6 +81,17 @@

import pandas.io.formats.printing as printing

# See https://github.com/python/typing/issues/684
if TYPE_CHECKING:
from enum import Enum

class ellipsis(Enum):
Ellipsis = "..."

Ellipsis = ellipsis.Ellipsis
else:
ellipsis = type(Ellipsis)

# ----------------------------------------------------------------------------
# Array

@@ -810,8 +825,21 @@ def value_counts(self, dropna: bool = True):
# --------
# Indexing
# --------
@overload
def __getitem__(self, key: int | np.integer) -> Any:
...

@overload
def __getitem__(
self: SparseArrayT,
key: slice | np.ndarray | list[int] | tuple[int | ellipsis, ...],
) -> SparseArrayT:
...

def __getitem__(self, key):
def __getitem__(
self: SparseArrayT,
key: PositionalIndexer | tuple[int | ellipsis, ...],
) -> SparseArrayT | Any:

if isinstance(key, tuple):
if len(key) > 1:
@@ -821,6 +849,8 @@ def __getitem__(self, key):
key = key[:-1]
if len(key) > 1:
raise IndexError("too many indices for array.")
if key[0] is Ellipsis:
raise ValueError("Cannot slice with Ellipsis")
key = key[0]

if is_integer(key):
@@ -849,7 +879,8 @@ def __getitem__(self, key):
key = check_array_indexer(self, key)

if com.is_bool_indexer(key):

# mypy doesn't know we have an array here
key = cast(np.ndarray, key)
return self.take(np.arange(len(key), dtype=np.int32)[key])
elif hasattr(key, "__len__"):
return self.take(key)
15 changes: 14 additions & 1 deletion pandas/core/arrays/string_arrow.py
@@ -6,6 +6,7 @@
Any,
Sequence,
cast,
overload,
)

import numpy as np
@@ -341,7 +342,19 @@ def _concat_same_type(cls, to_concat) -> ArrowStringArray:
)
)

def __getitem__(self, item: PositionalIndexer) -> Any:
@overload
def __getitem__(self, item: int | np.integer) -> str:
...

@overload
def __getitem__(
self: ArrowStringArray, item: slice | np.ndarray | list[int]
) -> ArrowStringArray:
...

def __getitem__(
self: ArrowStringArray, item: PositionalIndexer
) -> ArrowStringArray | str:
"""Select a subset of self.

Parameters