-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Categorical.searchsorted(): use provided categorical order #14697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Current coverage is 84.57% (diff: 100%)@@ master #14697 diff @@
==========================================
Files 144 144
Lines 51057 51059 +2
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43180 43183 +3
+ Misses 7877 7876 -1
Partials 0 0
|
@@ -1548,50 +1548,47 @@ def test_memory_usage(self): | |||
|
|||
def test_searchsorted(self): | |||
# https://github.com/pandas-dev/pandas/issues/8420 | |||
s1 = pd.Series(['apple', 'bread', 'bread', 'cheese', 'milk']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check these for Series (as well as Categorical)
add a whatsnew note in 0.19.2 |
|
||
return self.codes.searchsorted(values_as_codes, sorter=sorter) | ||
if -1 in values_as_codes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do this check after you search otherwise you end up scanning the data twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, could you explain it please?
I don't see how it's possible to avoid this check and additional data scanning, if we want exception to be thrown for values not from categories list.
searchsorted() returnes 0 for -1 code (absent category), as well as for smallest category value present in the list. Thus, searchsorted() does not allow to differentiate between the value from categories which should be put on 0th position and value not from categories (0 is also returned).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see here: https://github.com/pandas-dev/pandas/blob/master/pandas/computation/pytables.py#L203. The idea IS to use searchsorted. Then check the 0's (only). If they are not actual categories, then you would raise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't searching for 0's in the result of searchsorted() generally the same as searching for -1 among codes to be inserted?
can you rebase |
Previously, it used lexical order instead of the provided categorical order. Tests updated accordingly. Closes pandas-dev#14522
@@ -96,3 +96,5 @@ Bug Fixes | |||
- Bug in ``.plot(kind='kde')`` which did not drop missing values to generate the KDE Plot, instead generating an empty plot. (:issue:`14821`) | |||
|
|||
- Bug in ``unstack()`` if called with a list of column(s) as an argument, regardless of the dtypes of all columns, they get coerced to ``object`` (:issue:`11847`) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move to 0.20.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and put under Other API changes
self.assert_numpy_array_equal(res, exp) | ||
self.assert_numpy_array_equal(res, chk) | ||
# https://github.com/pandas-dev/pandas/issues/14522 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a 1-line about what the guarantees are here
lgtm. minor doc-changes. pls also review categorical.rst if any changes are needed. |
thanks! |
Sorry I was unavailable for the last two weeks.. |
git diff upstream/master | flake8 --diff
Previously, it used lexical order instead of the provided categorical
order.
Tests updated accordingly.
Closes #14522
Documentation also needs to be updated.
Could you please review? If it's OK I'll append doc update to this pull request.
Thanks!