-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: ExtensionArray.searchsorted #24350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @TomAugspurger! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on December 28, 2018 at 19:54 Hours UTC |
Codecov Report
@@ Coverage Diff @@
## master #24350 +/- ##
==========================================
+ Coverage 92.29% 92.29% +<.01%
==========================================
Files 162 162
Lines 51806 51816 +10
==========================================
+ Hits 47815 47825 +10
Misses 3991 3991
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #24350 +/- ##
==========================================
- Coverage 92.3% 92.3% -0.01%
==========================================
Files 165 165
Lines 52176 52186 +10
==========================================
+ Hits 48161 48170 +9
- Misses 4015 4016 +1
Continue to review full report at Codecov.
|
pandas/core/arrays/base.py
Outdated
""" | ||
Find indices where elements should be inserted to maintain order. | ||
|
||
.. versionadded:: 0.25.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.24.0
pandas/core/arrays/base.py
Outdated
# 2. Values between the values in the `data_for_sorting` fixture | ||
# 3. Missing values. | ||
arr = self.astype(object) | ||
return arr.searchsorted(v, side=side, sorter=sorter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to astype to object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need an ndarray. I suppose we could do np.asarray(self)
, which will convert to the best possible ndarray? But that could be lossy and so you wouldn't get the correct answer.
So yes, I think we do need object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is going to be very inefficient, so subclasses would almost certainly need to override. I would rather add this as an abstract method then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is similar to other methods. #24433
pandas/core/arrays/base.py
Outdated
@@ -505,6 +506,54 @@ def unique(self): | |||
uniques = unique(self.astype(object)) | |||
return self._from_sequence(uniques, dtype=self.dtype) | |||
|
|||
def searchsorted(self, v, side="left", sorter=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally don't like one-letter parameter names. Use values
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like it either, but theres value in matching NumPy here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, meant value
, as that is the name used in Series.searchsorted and various other searchsorted
implementations in pandas (Categorical.searchsorted
at least, haven’t checked all impl.).
I think it`s inconsistent to follow numpy naming here, when pandas’s is better:-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we use value
currently in both Index and Series. let's be consistent here (in fact I think we went thru a deprecation cycle on those a while back).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you see Stephan's comment in
#24350 (comment)?
Ideally, np.searchsorted(extension_array)
, would always work. If we do np.searshsorted(v=extension_array)
I think a TypeError will be raised.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment above; this would be an inconsistency in naming
which is much worse that have np.searchedsorted(ea) not working which is just convenience anyhow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not just a convenience. @shoyer do you have thoughts here?
Subclasses implementing __array_function__
will be allowed to override ExtensionArray.searchsorted with the correct function signature, but it'd be nice if things worked out of the box.
Hmm, I'm not sure what to do then. ExtensionArrays are more at the level of
an ndarray than a Series.
@shoyer, if I were writing an ExtensionArray that also wanted to implement
`__array_function__`, is it OK for the name of positional arguments to
differ?
…On Thu, Dec 20, 2018 at 12:05 PM topper-123 ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pandas/core/arrays/base.py
<#24350 (comment)>:
> @@ -505,6 +506,54 @@ def unique(self):
uniques = unique(self.astype(object))
return self._from_sequence(uniques, dtype=self.dtype)
+ def searchsorted(self, v, side="left", sorter=None):
Sorry, meant value, as that is the name used in Series.searchsorted and
various other searchsorted implementations in pandas (
Categorical.searchsorted at least, haven’t checked all impl.).
I think it`s inconsistent to follow numpy naming here, when pandas’s is
better:-)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24350 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIsqBJDEwwMDdqbdFzQEupGtUfVF7ks5u69GFgaJpZM4ZaO7Y>
.
|
|
Thanks. Given that, it's probably best to follow NumPy's signature exactly
wherever possible.
…On Thu, Dec 20, 2018 at 1:01 PM Stephan Hoyer ***@***.***> wrote:
if I were writing an ExtensionArray that also wanted to implement
__array_function__, is it OK for the name of positional arguments to
differ?
__array_function__ passes on the exact positional and keyword argument
from how the function is called. So in practice this would mean that anyone
who uses keyword arguments with NumPy's names would get a TypeError, unless
you add keyword-only arguments for NumPy's names, too. Using positional
arguments would be fine, though.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24350 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIk9xLOo3D-VHj7MD8IcGKlWNfp31ks5u695ygaJpZM4ZaO7Y>
.
|
pandas/core/arrays/base.py
Outdated
@@ -505,6 +506,54 @@ def unique(self): | |||
uniques = unique(self.astype(object)) | |||
return self._from_sequence(uniques, dtype=self.dtype) | |||
|
|||
def searchsorted(self, v, side="left", sorter=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we use value
currently in both Index and Series. let's be consistent here (in fact I think we went thru a deprecation cycle on those a while back).
pandas/core/arrays/base.py
Outdated
# 2. Values between the values in the `data_for_sorting` fixture | ||
# 3. Missing values. | ||
arr = self.astype(object) | ||
return arr.searchsorted(v, side=side, sorter=sorter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is going to be very inefficient, so subclasses would almost certainly need to override. I would rather add this as an abstract method then.
Changed |
# 2. Values between the values in the `data_for_sorting` fixture | ||
# 3. Missing values. | ||
arr = self.astype(object) | ||
return arr.searchsorted(value, side=side, sorter=sorter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC you have an issue to show a warning here for EA's that don't redefined this I think?
thanks @TomAugspurger |
Yes (for developers). I'm hoping to do that later today.
…On Fri, Dec 28, 2018 at 3:15 PM Jeff Reback ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pandas/core/arrays/base.py
<#24350 (comment)>:
> + Returns
+ -------
+ indices : array of ints
+ Array of insertion points with the same shape as `value`.
+
+ See Also
+ --------
+ numpy.searchsorted : Similar method from NumPy.
+ """
+ # Note: the base tests provided by pandas only test the basics.
+ # We do not test
+ # 1. Values outside the range of the `data_for_sorting` fixture
+ # 2. Values between the values in the `data_for_sorting` fixture
+ # 3. Missing values.
+ arr = self.astype(object)
+ return arr.searchsorted(value, side=side, sorter=sorter)
IIRC you have an issue to show a warning here for EA's that don't
redefined this I think?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24350 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHInoAr2DXOO0ZNBQDm03hhR6q3aSrks5u9onegaJpZM4ZaO7Y>
.
|
No description provided.