Skip to content

Bug in CategoricalIndex[Interval].__contains__ for non-interval data #23705

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
TomAugspurger opened this issue Nov 14, 2018 · 3 comments · Fixed by #27100
Closed

Bug in CategoricalIndex[Interval].__contains__ for non-interval data #23705

TomAugspurger opened this issue Nov 14, 2018 · 3 comments · Fixed by #27100
Labels
Categorical Categorical Data Type Index Related to the Index class or subclasses Interval Interval data type
Milestone

Comments

@TomAugspurger
Copy link
Contributor

In [10]: idx = pd.CategoricalIndex(pd.IntervalIndex.from_breaks([1, 2, 3]))

In [11]: 'a' in idx
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-50c294b3bf00> in <module>()
----> 1 'a' in idx

~/sandbox/pandas/pandas/core/indexes/category.py in __contains__(self, key)
    358             return self.hasnans
    359
--> 360         return contains(self, key, container=self._engine)
    361
    362     @Appender(_index_shared_docs['contains'] % _index_doc_kwargs)

~/sandbox/pandas/pandas/core/arrays/categorical.py in contains(cat, key, container)
    200     #  can't be in container either.
    201     try:
--> 202         loc = cat.categories.get_loc(key)
    203     except KeyError:
    204         return False

~/sandbox/pandas/pandas/core/indexes/interval.py in get_loc(self, key, method)
    716                 key = self._maybe_cast_slice_bound(key, 'left', None)
    717
--> 718             start, stop = self._find_non_overlapping_monotonic_bounds(key)
    719
    720             if start is None or stop is None:

~/sandbox/pandas/pandas/core/indexes/interval.py in _find_non_overlapping_monotonic_bounds(self, key)
    664
    665             start = self._searchsorted_monotonic(key, 'left')
--> 666             stop = self._searchsorted_monotonic(key, 'right')
    667         return start, stop
    668

~/sandbox/pandas/pandas/core/indexes/interval.py in _searchsorted_monotonic(self, label, side, exclude_label)
    625             sub_idx = self.left
    626             if self.open_left or exclude_label:
--> 627                 label = _get_prev_label(label)
    628
    629         return sub_idx._searchsorted_monotonic(label, side)

~/sandbox/pandas/pandas/core/indexes/interval.py in _get_prev_label(label)
     90     else:
     91         raise TypeError('cannot determine next label for type {typ!r}'
---> 92                         .format(typ=type(label)))
     93
     94

TypeError: cannot determine next label for type <class 'str'>

cc @jschendel

@TomAugspurger TomAugspurger added Indexing Related to indexing on series/frames, not to indexes themselves Categorical Categorical Data Type Interval Interval data type labels Nov 14, 2018
@jschendel
Copy link
Member

jschendel commented Nov 14, 2018

Aside from the example you posted returning False, would you expect this operation to return True for exact matches only, or for contained matches too?

Specifically, this is the following behavior on master:

In [2]: ci = pd.CategoricalIndex(pd.IntervalIndex.from_breaks([1, 2, 3]))

In [3]: 1.5 in ci
Out[3]: True

In [4]: pd.Interval(1.4, 1.6) in ci
Out[4]: True

Since these checks are going through get_loc, [4] will return False in the future when the new indexing specs are implemented for IntervalIndex, though [3] will still be True. What is your expected output for [3] and [4]?

@TomAugspurger
Copy link
Contributor Author

Hmm didn't think that far ahead :) I honestly am not sure. 1.5 clearly isn't in the categories, so in that sense I'd expect False. But on the other hand, should CategroicalIndex.__contains__ simply dispatch to .categories.__contains__?

@jschendel
Copy link
Member

should CategroicalIndex.__contains__ simply dispatch to .categories.__contains__?

This makes sense to me in terms of how IntervalIndex works. Note that with this implementation [4] is still True on master, but will be False after the new behavior spec. I'd expect this to be a good solution for other types of indexes as well, but I could be missing something there.

@toobaz toobaz added Index Related to the Index class or subclasses and removed Indexing Related to indexing on series/frames, not to indexes themselves labels Jun 29, 2019
@jreback jreback added this to the 0.25.0 milestone Jul 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Index Related to the Index class or subclasses Interval Interval data type
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants