BUG: Categorical.searchsorted(): use provided categorical order #14697

nathalier · 2016-11-20T00:47:52Z

closes #14522 (Categorical.searchsorted() uses lexical order instead of the provided categorical order)
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

Previously, Categorical.searchsorted() used lexical order instead of the provided categorical
order.

Tests updated accordingly.
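A minimal illustration of the intended behavior follows. It assumes a recent pandas in which `Categorical.searchsorted` accepts list-like values, and the data is illustrative. The categories are declared as cheese < milk < apple < bread, so insertion points should follow that order rather than alphabetical order.

```python
import pandas as pd

# Categories in a deliberately non-alphabetical order: cheese < milk < apple < bread
cat = pd.Categorical(
    ["cheese", "milk", "apple", "bread", "bread"],
    categories=["cheese", "milk", "apple", "bread"],
    ordered=True,
)

# The underlying codes follow the declared category order
print(cat.codes.tolist())  # [0, 1, 2, 3, 3]

# Insertion points respect the categorical order, not lexical order
left = cat.searchsorted(["apple", "bread"])                # side="left" (default)
right = cat.searchsorted(["apple", "bread"], side="right")
print(left.tolist(), right.tolist())  # [2, 3] [3, 5]
```

Under lexical order "apple" would be the smallest value. Here "cheese" is the smallest category, so "apple" is inserted at position 2.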

The documentation also needs to be updated.
Could you please review? If it's OK, I'll add the doc update to this pull request.
Thanks!

codecov-io · 2016-11-20T10:28:58Z

Current coverage is 84.57% (diff: 100%)

Merging #14697 into master will increase coverage by <.01%

@@             master     #14697   diff @@
==========================================
  Files           144        144          
  Lines         51057      51059     +2   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43180      43183     +3   
+ Misses         7877       7876     -1   
  Partials          0          0

Powered by Codecov. Last update f1cfe5b...86b42d0

jreback · 2016-11-21T11:37:03Z

pandas/tests/test_categorical.py

@@ -1548,50 +1548,47 @@ def test_memory_usage(self):

    def test_searchsorted(self):
        # https://github.com/pandas-dev/pandas/issues/8420
-        s1 = pd.Series(['apple', 'bread', 'bread', 'cheese', 'milk'])


check these for Series (as well as Categorical)
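Here is a sketch of how the results could be cross-checked between a `Categorical` and a `Series` wrapping it. It assumes a recent pandas in which `Series.searchsorted` on categorical data delegates to the underlying `Categorical`. The data is illustrative and is not taken from the PR's test.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(
    ["cheese", "milk", "apple", "bread", "bread"],
    categories=["cheese", "milk", "apple", "bread"],
    ordered=True,
)
ser = pd.Series(cat)

results = {}
for side in ("left", "right"):
    res_cat = cat.searchsorted(["apple", "bread"], side=side)
    res_ser = ser.searchsorted(["apple", "bread"], side=side)
    # Both objects should return identical integer insertion positions
    np.testing.assert_array_equal(res_cat, res_ser)
    results[side] = res_ser.tolist()

print(results)  # {'left': [2, 3], 'right': [3, 5]}
```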

jreback · 2016-11-21T11:39:02Z

add a whatsnew note in 0.19.2

jreback · 2016-11-21T11:41:30Z

pandas/core/categorical.py


-        return self.codes.searchsorted(values_as_codes, sorter=sorter)
+        if -1 in values_as_codes:


do this check after you search; otherwise you end up scanning the data twice

Sorry, could you explain it, please?
I don't see how it's possible to avoid this check and the additional scan of the data if we want an exception to be raised for values that are not in the categories list.
searchsorted() returns 0 for the -1 code (absent category) as well as for the smallest category value present in the list. Thus, searchsorted() cannot distinguish between a value from the categories that should be put at position 0 and a value not from the categories (for which 0 is also returned).
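The ambiguity described above can be reproduced with plain NumPy. In this sketch, `to_codes` is a hypothetical helper, not a pandas function. It maps each value to its category position and uses -1 for values absent from the categories, mirroring how categorical codes mark missing entries.

```python
import numpy as np

categories = ["cheese", "milk", "apple", "bread"]
codes = np.array([0, 1, 2, 3, 3])  # data: cheese, milk, apple, bread, bread

def to_codes(values, categories):
    """Hypothetical helper: category position of each value, or -1 if absent."""
    lookup = {cat: i for i, cat in enumerate(categories)}
    return np.array([lookup.get(v, -1) for v in values])

values_as_codes = to_codes(["cheese", "pizza"], categories)
print(values_as_codes)  # [ 0 -1]

# Both the smallest category (code 0) and the unknown value (code -1)
# land at insertion position 0, so the result alone cannot tell them apart.
print(codes.searchsorted(values_as_codes))  # [0 0]
```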

see here: https://github.com/pandas-dev/pandas/blob/master/pandas/computation/pytables.py#L203. The idea IS to use searchsorted. Then check the 0's (only). If they are not actual categories, then you would raise.

Isn't searching for 0's in the result of searchsorted() generally the same as searching for -1 among codes to be inserted?
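The two validation orders discussed in this thread can be compared side by side. Both functions below are hypothetical sketches and not the code merged in this PR. `check_before` scans the converted codes for -1 before searching. `check_after` searches first and then inspects only the entries whose insertion position is 0, as the reviewer suggests. Codes are never smaller than -1, so under the default `side="left"` a -1 code always yields position 0, and both functions raise for exactly the same inputs.

```python
import numpy as np

def check_before(codes, values_as_codes):
    """Hypothetical sketch: reject unknown values (code -1) first, then search."""
    if (values_as_codes == -1).any():
        raise ValueError("value(s) not in categories")
    return codes.searchsorted(values_as_codes)

def check_after(codes, values_as_codes):
    """Hypothetical sketch: search first, then inspect only the 0 positions."""
    result = codes.searchsorted(values_as_codes)
    # With side='left', a -1 code always lands at position 0, so only those
    # entries can correspond to values missing from the categories.
    if (values_as_codes[result == 0] == -1).any():
        raise ValueError("value(s) not in categories")
    return result

codes = np.array([0, 1, 2, 3, 3])  # cheese, milk, apple, bread, bread
ok = np.array([2, 3])              # apple, bread
bad = np.array([2, -1])            # apple, value missing from categories

raised = {}
for check in (check_before, check_after):
    print(check.__name__, check(codes, ok).tolist())  # [2, 3] for both
    try:
        check(codes, bad)
        raised[check.__name__] = False
    except ValueError:
        raised[check.__name__] = True
print(raised)  # {'check_before': True, 'check_after': True}
```

In both sketches the extra pass runs over the converted search values, not over the categorical's own codes.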

jreback · 2016-12-16T23:51:12Z

can you rebase

Previously, it used lexical order instead of the provided categorical order. Tests updated accordingly. Closes pandas-dev#14522

jreback · 2016-12-20T17:34:32Z

doc/source/whatsnew/v0.19.2.txt

@@ -96,3 +96,5 @@ Bug Fixes
 - Bug in ``.plot(kind='kde')`` which did not drop missing values to generate the KDE Plot, instead generating an empty plot. (:issue:`14821`)

 - Bug in ``unstack()`` if called with a list of column(s) as an argument, regardless of the dtypes of all columns, they get coerced to ``object`` (:issue:`11847`)
+


can you move to 0.20.0

and put under Other API changes

jreback · 2016-12-20T17:35:30Z

pandas/tests/test_categorical.py

-        self.assert_numpy_array_equal(res, exp)
-        self.assert_numpy_array_equal(res, chk)
+        # https://github.com/pandas-dev/pandas/issues/14522
+


can you add a one-line comment about what the guarantees are here
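One possible shape for such a test is shown below, with a short comment stating the guarantee. This is an illustrative pytest-style sketch that assumes a recent pandas. The test name and data are hypothetical, and it is not the test added in this PR.

```python
import numpy as np
import pandas as pd

def test_searchsorted_uses_category_order():
    # Guarantee: insertion positions follow the declared category order
    # (cheese < milk < apple < bread), not the lexical order of the labels.
    cat = pd.Categorical(
        ["cheese", "milk", "apple", "bread", "bread"],
        categories=["cheese", "milk", "apple", "bread"],
        ordered=True,
    )
    for obj in (cat, pd.Series(cat)):
        res = obj.searchsorted(["cheese", "apple", "bread"], side="right")
        np.testing.assert_array_equal(res, np.array([1, 3, 5]))

test_searchsorted_uses_category_order()
```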

jreback · 2016-12-20T17:36:23Z

lgtm. minor doc-changes. pls also review categorical.rst if any changes are needed.

jreback · 2016-12-30T21:43:19Z

thanks!

nathalier · 2017-01-03T17:18:11Z

Sorry, I was unavailable for the last two weeks.
Thank you!

nathalier force-pushed the gh-14522 branch from 3f58c03 to 4f11625 November 20, 2016 10:28

jreback reviewed Nov 21, 2016


jreback added the Bug, Categorical, and Indexing labels Nov 21, 2016

jreback reviewed Nov 21, 2016


BUG: Categorical.searchsorted(): use provided categorical order

86b42d0


nathalier force-pushed the gh-14522 branch from 4f11625 to 86b42d0 December 18, 2016 23:24

jreback reviewed Dec 20, 2016


jreback added this to the 0.20.0 milestone Dec 20, 2016

jreback closed this in 0252385 Dec 30, 2016
