Commit e616770

tp committed
make CategoricalIndex.__contains__ compatible with np<1.13
1 parent f856075 commit e616770

3 files changed: +22 -10 lines changed

doc/source/whatsnew/v0.23.2.txt

+3 -1
@@ -24,7 +24,9 @@ Fixed Regressions
 Performance Improvements
 ~~~~~~~~~~~~~~~~~~~~~~~~

--
+- Improved performance of membership checks in :class:`CategoricalIndex`
+  (i.e. ``x in ci``-style checks are much faster). :meth:`CategoricalIndex.contains`
+  is likewise much faster (:issue:`21369`)
 -

 Documentation Changes

doc/source/whatsnew/v0.24.0.txt

+0 -3
@@ -65,9 +65,6 @@ Performance Improvements
 ~~~~~~~~~~~~~~~~~~~~~~~~

 - Improved performance of :func:`Series.describe` in case of numeric dtpyes (:issue:`21274`)
-- Improved performance of membership checks in :class:`CategoricalIndex`
-  (i.e. ``x in ci``-style checks are much faster). :meth:`CategoricalIndex.contains`
-  is likewise much faster (:issue:`21369`)

 .. _whatsnew_0240.docs:


pandas/core/indexes/category.py

+19 -6
@@ -324,18 +324,31 @@ def _reverse_indexer(self):
     @Appender(_index_shared_docs['__contains__'] % _index_doc_kwargs)
     def __contains__(self, key):
         hash(key)
-        if isna(key):
+
+        if isna(key):  # is key NaN?
             return self.isna().any()
-        elif self.categories._defer_to_indexing:  # e.g. Interval values
+
+        # is key in self.categories? Then get its location.
+        # If not (i.e. KeyError), it logically can't be in self either
+        try:
             loc = self.categories.get_loc(key)
-            return np.isin(self.codes, loc).any()
-        elif key in self.categories:
-            return self.categories.get_loc(key) in self._engine
-        else:
+        except KeyError:
             return False

+        # loc is the location of key in self.categories, but also the value
+        # for key in self.codes and in self._engine. key may be in categories,
+        # but still not in self, check this. Example:
+        # 'b' in CategoricalIndex(['a'], categories=['a', 'b'])  # False
+        if is_scalar(loc):
+            return loc in self._engine
+        else:
+            # if self.categories is IntervalIndex, loc is an array
+            # check if any scalar of the array is in self._engine
+            return any(loc_ in self._engine for loc_ in loc)
+
     @Appender(_index_shared_docs['contains'] % _index_doc_kwargs)
     def contains(self, key):
+        hash(key)
         return key in self

     def __array__(self, dtype=None):
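
The lookup order in the new `__contains__` (hash check, missing-value check, category lookup, then code lookup) can be illustrated with a small pure-Python model. This is only a sketch under stated assumptions: `ToyCategoricalIndex` and `_used_codes` are hypothetical names, `None` stands in for NaN, a plain `set` stands in for `self._engine`, and the array-valued `loc` branch for `IntervalIndex` categories is left out. It is not the pandas implementation.

```python
class ToyCategoricalIndex:
    """Hypothetical, simplified stand-in for CategoricalIndex (not pandas code)."""

    def __init__(self, values, categories):
        self.categories = list(categories)
        # codes[i] is the position of values[i] in categories, or -1 if the
        # value is missing (None) or not one of the categories.
        self.codes = [
            self.categories.index(v) if v in self.categories else -1
            for v in values
        ]
        # A set of the codes actually in use plays the role of self._engine.
        self._used_codes = set(self.codes)

    def __contains__(self, key):
        hash(key)  # unhashable keys raise TypeError, as in the diff
        if key is None:  # toy version of isna(key)
            return -1 in self._used_codes
        try:
            # list.index raises ValueError where Index.get_loc raises KeyError
            loc = self.categories.index(key)
        except ValueError:
            return False  # not a category, so it cannot be a value either
        # key is a category; check whether its code is actually used
        return loc in self._used_codes


ci = ToyCategoricalIndex(["a"], categories=["a", "b"])
print("a" in ci)   # True: 'a' is a category and its code 0 is in use
print("b" in ci)   # False: 'b' is a category but no value carries its code
print("z" in ci)   # False: 'z' is not a category at all
print(None in ci)  # False: no missing values in this index
```

The second case mirrors the example in the code comment above: a key can be present in the categories and still be absent from the index itself, which is why the code lookup follows the category lookup.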
