-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: ExtensionArray.searchsorted #24350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
ac94414
58418ab
fee9c1a
ff8bbc3
e6adcd9
a91fcec
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -64,6 +64,7 @@ class ExtensionArray(object): | |
* unique | ||
* factorize / _values_for_factorize | ||
* argsort / _values_for_argsort | ||
* searchsorted | ||
|
||
The remaining methods implemented on this class should be performant, | ||
as they only compose abstract methods. Still, a more efficient | ||
|
@@ -505,6 +506,54 @@ def unique(self): | |
uniques = unique(self.astype(object)) | ||
return self._from_sequence(uniques, dtype=self.dtype) | ||
|
||
def searchsorted(self, v, side="left", sorter=None): | ||
""" | ||
Find indices where elements should be inserted to maintain order. | ||
|
||
.. versionadded:: 0.25.0 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 0.24.0 |
||
|
||
Find the indices into a sorted array `self` (a) such that, if the | ||
corresponding elements in `v` were inserted before the indices, the | ||
order of `self` would be preserved. | ||
|
||
Assuming that `a` is sorted: | ||
|
||
====== ============================ | ||
`side` returned index `i` satisfies | ||
====== ============================ | ||
left ``self[i-1] < v <= self[i]`` | ||
right ``self[i-1] <= v < self[i]`` | ||
====== ============================ | ||
|
||
Parameters | ||
---------- | ||
v : array_like | ||
Values to insert into `self`. | ||
side : {'left', 'right'}, optional | ||
If 'left', the index of the first suitable location found is given. | ||
If 'right', return the last such index. If there is no suitable | ||
index, return either 0 or N (where N is the length of `self`). | ||
sorter : 1-D array_like, optional | ||
Optional array of integer indices that sort array a into ascending | ||
order. They are typically the result of argsort. | ||
|
||
Returns | ||
------- | ||
indices : array of ints | ||
Array of insertion points with the same shape as `v`. | ||
|
||
See Also | ||
-------- | ||
numpy.searchsorted : Similar method from NumPy. | ||
""" | ||
# Note: the base tests provided by pandas only test the basics. | ||
# We do not test | ||
# 1. Values outside the range of the `data_for_sorting` fixture | ||
jreback marked this conversation as resolved.
Show resolved
Hide resolved
|
||
# 2. Values between the values in the `data_for_sorting` fixture | ||
# 3. Missing values. | ||
arr = self.astype(object) | ||
return arr.searchsorted(v, side=side, sorter=sorter) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we need to astype to object? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We need an ndarray. I suppose we could do So yes, I think we do need object. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is going to be very inefficient, so subclasses would almost certainly need to override. I would rather add this as an abstract method then. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is similar to other methods. #24433 |
||
|
||
def _values_for_factorize(self): | ||
# type: () -> Tuple[ndarray, Any] | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally don't like one-letter parameter names. Use
values
instead?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like it either, but theres value in matching NumPy here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, meant
value
, as that is the name used in Series.searchsorted and various othersearchsorted
implementations in pandas (Categorical.searchsorted
at least, haven’t checked all impl.).I think it`s inconsistent to follow numpy naming here, when pandas’s is better:-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we use
value
currently in both Index and Series. let's be consistent here (in fact I think we went thru a deprecation cycle on those a while back).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you see Stephan's comment in
#24350 (comment)?
Ideally,
np.searchsorted(extension_array)
, would always work. If we donp.searshsorted(v=extension_array)
I think a TypeError will be raised.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment above; this would be an inconsistency in naming
which is much worse that have np.searchedsorted(ea) not working which is just convenience anyhow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not just a convenience. @shoyer do you have thoughts here?
Subclasses implementing
__array_function__
will be allowed to override ExtensionArray.searchsorted with the correct function signature, but it'd be nice if things worked out of the box.