-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TYP: ExtensionArray delete() and searchsorted() #41513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Dr-Irv . now that the EA methods are being annotated individually, it may make sense to add the annotations to all EA classes for that method to ensure consistency. (e.g. SparseArray looks iffy def searchsorted(self, v, side="left", sorter=None):
@@ -1308,7 +1313,7 @@ def __hash__(self) -> int: | |||
# ------------------------------------------------------------------------ | |||
# Non-Optimized Default Methods | |||
|
|||
def delete(self: ExtensionArrayT, loc) -> ExtensionArrayT: | |||
def delete(self: ExtensionArrayT, loc: PositionalIndexer) -> ExtensionArrayT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in the 'public' base EA class. it was added in #39405. @jbrockmendel is this method public? Is it part of the EA interface? does it need a docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch ill make a note to add a docstring
Next commit does this. Had to create some types to support the bridge from our |
pandas/core/base.py
Outdated
value: NumpyValueArrayLike, | ||
side: Literal["left", "right"] = "left", | ||
sorter: NumpySorter = None, | ||
) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can sometimes return an int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
searchsorted
returns an np.ndarray
according to the numpy docs and the signature in numpy 1.20.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like that doc/signature is inaccurate then
>>> arr = np.arange(5)
>>> type(arr.searchsorted(3))
<class 'numpy.int64'>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this seems to be an issue with numpy
. This is using numpy 1.20.2:
import numpy as np
a = np.array([1, 3, 5])
# reveal_type(a)
s = a.searchsorted(4)
# reveal_type(s)
print(s, type(s), isinstance(s, np.ndarray))
Result of running this is:
2 <class 'numpy.int64'> False
If I uncomment the reveal_type
lines, the types of a
and s
are both np.ndarray
.
So that gives us this question:
Should the pandas searchsorted
API match what numpy
documents (only array elements allowed), or what numpy
implements?
I've submitted a numpy issue here: numpy/numpy#19160
@simonjayhawkins @jbrockmendel pushed changes 5 days ago, so looking forward to a new review. |
pandas/core/arrays/_mixins.py
Outdated
return self._ndarray.searchsorted(value, side=side, sorter=sorter) | ||
def searchsorted( | ||
self, | ||
value: ArrayLike | object, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
object -> Scalar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would violate the Liskov substitution principle. Our searchsorted
needs to have the EA's reject an incompatible type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that would be bc you've typed EA.searchsorted with object, so why not change it there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because in EA
, we have to allow any type to be inside the EA
, including non-EA types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArrayLike is a subtype of object so not needed? Why does this not use the NumpyValueArrayLike
alias?
pandas/core/arrays/period.py
Outdated
value = self._validate_searchsorted_value(value).view("M8[ns]") | ||
def searchsorted( | ||
self, | ||
value: ArrayLike | object, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think de-facto the scalar needs to be either Period or Period-castable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but the ExtensionArray
type has to allow any object
, so the subtype does as well, and then it should raise if someone passes an incompatible object type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has to allow any object
aren't we supposed to use Any
for that?
so the subtype does as well, and then it should raise if someone passes an incompatible object type
so if a subclass is more restrictive in what it accepts (without raising), we can't have that be reflected in the annotation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has to allow any object
aren't we supposed to use
Any
for that?
If we use Any
, then that tells the type checker to ignore type checking on the result. If we use object
, then it will continue type checking.
so the subtype does as well, and then it should raise if someone passes an incompatible object type
so if a subclass is more restrictive in what it accepts (without raising), we can't have that be reflected in the annotation?
Yes and no. The full signature has to accept the same type as the parent, and can widen that argument type, but not narrow it. But, I think we can use an overload
signature to indicate that if a specific type is passed, it creates a result of that type. So the master signature could accept object
, and then you can have an overload
to correspond to the narrower type.
all the |
I think we should go ahead as I proposed it. The issue with |
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
@jbrockmendel Awaiting your response for 6 weeks to my last response here: #41513 (comment) |
You explained here why you are using
IIUC "has to" here is in order to satisfy Liskov. Is that right? Liskov is nice, but in this context it is conflicting with accuracy. e.g. in the PeriodArray case it is inaccurately suggesting that it will accept a timedelta. If we have to choose between Liskov and accuracy, my inclination is to choose accuracy. And if I'm right above that putting "object" is equivalent to leaving it unannotated, then this is introducing inaccuracy with no upside AFAICT. That said, this is not a hill I want to die on. I'll defer to @simonjayhawkins |
pinging @simonjayhawkins |
@Dr-Irv this looks pretty good to me, can you rebase and hopefully @simonjayhawkins can have a look |
weekly ping @simonjayhawkins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you fix conflicts and a mypy error pandas/core/indexes/datetimes.py:772: error: Argument "side" to "get_slice_bound" of "Index" has incompatible type "str"; expected "Union[Literal['left'], Literal['right']]" [arg-type]
Thanks @Dr-Irv for being patient. In future maybe worth starting at the lower level functions and work up to the public api and maybe tackle a subset of the method arguments. Gradual typing facilitates this.
pandas/core/arrays/_mixins.py
Outdated
return self._ndarray.searchsorted(value, side=side, sorter=sorter) | ||
def searchsorted( | ||
self, | ||
value: ArrayLike | object, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArrayLike is a subtype of object so not needed? Why does this not use the NumpyValueArrayLike
alias?
pandas/core/arrays/_mixins.py
Outdated
return self._ndarray.searchsorted(npvalue, side=side, sorter=sorter) | ||
|
||
def _validate_searchsorted_value( | ||
self, value: ArrayLike | object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, ArrayLike is a subtype of object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed this to now correspond better to the numpy signature. No longer refer to object
pandas/core/arrays/_mixins.py
Outdated
else: | ||
# While we also can support DatetimeLikeScalar, numpy.searchsorted | ||
# does not accept that as an argument | ||
return cast(PythonScalar, value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are code changes here from master, this is a typing PR. Is this is fixing a bug? It looks like this prevents passing an EA to numpy? Does numpy currently raise a TypeError
or does it use the __array__ protocol? Does this need a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mypy complains when you pass an ExtensionArray to searchsorted
. E.g.:
import numpy as np
import pandas as pd
from pandas.core.arrays.base import ExtensionArray
from typing import cast
a = pd.array([1,3,4])
b = cast(ExtensionArray, a)
na = np.array([1,2,3,4,5])
r = na.searchsorted(b)
print(r)
produces from mypy:
irv.py:14: error: No overload variant of "searchsorted" of "ndarray" matches argument type "ExtensionArray" [call-overload]
irv.py:14: note: Possible overload variants:
irv.py:14: note: def searchsorted(self, v: Union[int, float, complex, str, bytes, generic], side: Union[Literal['left'], Literal['right']] = ..., sorter: Union[_SupportsArray[dtype[Union[bool_, integer[Any]]]], Sequence[_SupportsArray[dtype[Union[bool_, integer[Any]]]]], Sequence[Sequence[_SupportsArray[dtype[Union[bool_, integer[Any]]]]]], Sequence[Sequence[Sequence[_SupportsArray[dtype[Union[bool_, integer[Any]]]]]]], Sequence[Sequence[Sequence[Sequence[_SupportsArray[dtype[Union[bool_, integer[Any]]]]]]]], int, Sequence[int], Sequence[Sequence[int]], Sequence[Sequence[Sequence[int]]], Sequence[Sequence[Sequence[Sequence[int]]]], None] = ...) -> signedinteger[Any]
irv.py:14: note: def searchsorted(self, v: Union[Sequence[Sequence[Sequence[Sequence[Sequence[Any]]]]], Union[Union[_SupportsArray[dtype[Any]], Sequence[_SupportsArray[dtype[Any]]], Sequence[Sequence[_SupportsArray[dtype[Any]]]], Sequence[Sequence[Sequence[_SupportsArray[dtype[Any]]]]], Sequence[Sequence[Sequence[Sequence[_SupportsArray[dtype[Any]]]]]]], Union[bool, int, float, complex, str, bytes, Sequence[Union[bool, int, float, complex, str, bytes]], Sequence[Sequence[Union[bool, int, float, complex, str, bytes]]], Sequence[Sequence[Sequence[Union[bool, int, float, complex, str, bytes]]]], Sequence[Sequence[Sequence[Sequence[Union[bool, int, float, complex, str, bytes]]]]]]]], side: Union[Literal['left'], Literal['right']] = ..., sorter: Union[_SupportsArray[dtype[Union[bool_, integer[Any]]]], Sequence[_SupportsArray[dtype[Union[bool_, integer[Any]]]]], Sequence[Sequence[_SupportsArray[dtype[Union[bool_, integer[Any]]]]]], Sequence[Sequence[Sequence[_SupportsArray[dtype[Union[bool_, integer[Any]]]]]]], Sequence[Sequence[Sequence[Sequence[_SupportsArray[dtype[Union[bool_, integer[Any]]]]]]]], int, Sequence[int], Sequence[Sequence[int]], Sequence[Sequence[Sequence[int]]], Sequence[Sequence[Sequence[Sequence[int]]]], None] = ...) -> ndarray[Any, dtype[signedinteger[Any]]]
So while mypy accepts the ExtensionArray
argument, the typing that is present within mypy does not.
pandas/core/arrays/_mixins.py
Outdated
# While we also can support DatetimeLikeScalar, numpy.searchsorted | ||
# does not accept that as an argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy will raise TypeError for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is removed in next commit.
pandas/core/arrays/_mixins.py
Outdated
if isinstance(value, ExtensionArray): | ||
return value.to_numpy() | ||
elif isinstance(value, np.ndarray): | ||
return value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this condition and the following condition both return value
, the cast is a noop at runtime. can this be simplified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, fixed in next commit
pandas/core/arrays/_mixins.py
Outdated
value: ArrayLike | object, | ||
side: Literal["left", "right"] = "left", | ||
sorter: NumpySorter = None, | ||
) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, np.intp
in next commit
pandas/core/arrays/base.py
Outdated
value: ArrayLike | object, | ||
side: Literal["left", "right"] = "left", | ||
sorter: NumpySorter = None, | ||
) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, np.intp
in next commit.
pandas/core/arrays/period.py
Outdated
npvalue = cast( | ||
np.ndarray, self._validate_searchsorted_value(value).view("M8[ns]") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we avoid the cast with the typing of _validate_searchsorted_value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed in next commit.
pandas/core/base.py
Outdated
value: NumpyValueArrayLike, | ||
side: Literal["left", "right"] = "left", | ||
sorter: NumpySorter = None, | ||
) -> npt.NDArray[np.intp]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for consistency can you update the return types of the other methods to also use the npt.NDArray[np.intp]
syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in next commit
pandas/core/computation/pytables.py
Outdated
# "Union[Any, ndarray[Any, Any]]", variable has type "int") | ||
result = metadata.searchsorted( | ||
v, side="left" | ||
) # type: ignore[assignment] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can avoid the ignore using a variable annotation for result
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in next commit
@simonjayhawkins Have just made a bunch of changes to use more numpy types in arguments and return types for One odd thing was that I put the overloads for I could put the same overloads throughout the code, but it would require the |
biweekly ping @simonjayhawkins |
@Dr-Irv ok if you can merge master and make sure still passing (i think it was last) time, will merge these. |
@jreback all green except for a Windows timeout on one of the tests |
thanks @Dr-Irv followup to consolidate any fixtures created here in other PRs |
Additional typing for
ExtensionArray
delete()
andsearchsorted()