Skip to content

BUG: CategoricalIndex.get_loc(np.nan) #41933

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 16, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1014,6 +1014,7 @@ Indexing
- Bug in :meth:`DataFrame.loc.__setitem__` when setting-with-expansion incorrectly raising when the index in the expanding axis contained duplicates (:issue:`40096`)
- Bug in :meth:`DataFrame.loc.__getitem__` with :class:`MultiIndex` casting to float when at least one index column has float dtype and we retrieve a scalar (:issue:`41369`)
- Bug in :meth:`DataFrame.loc` incorrectly matching non-Boolean index elements (:issue:`20432`)
- Bug in indexing with ``np.nan`` on a :class:`Series` or :class:`DataFrame` with a :class:`CategoricalIndex` incorrectly raising ``KeyError`` when ``np.nan`` keys are present (:issue:`41933`)
- Bug in :meth:`Series.__delitem__` with ``ExtensionDtype`` incorrectly casting to ``ndarray`` (:issue:`40386`)
- Bug in :meth:`DataFrame.loc` returning a :class:`MultiIndex` in the wrong order if an indexer has duplicates (:issue:`40978`)
- Bug in :meth:`DataFrame.__setitem__` raising a ``TypeError`` when using a ``str`` subclass as the column name with a :class:`DatetimeIndex` (:issue:`37366`)
Expand Down
9 changes: 8 additions & 1 deletion pandas/core/indexes/category.py
Original file line number Diff line number Diff line change
Expand Up @@ -483,7 +483,14 @@ def _reindex_non_unique( # type: ignore[override]
# Indexing Methods

def _maybe_cast_indexer(self, key) -> int:
return self._data._unbox_scalar(key)
# GH#41933: we have to do this instead of self._data._validate_scalar
# because this will correctly get partial-indexing on Interval categories
try:
return self._data._unbox_scalar(key)
except KeyError:
if is_valid_na_for_dtype(key, self.categories.dtype):
return -1
raise

def _get_indexer(
self,
Expand Down
7 changes: 7 additions & 0 deletions pandas/tests/indexes/categorical/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -198,6 +198,13 @@ def test_get_loc_nonmonotonic_nonunique(self):
expected = np.array([False, True, False, True], dtype=bool)
tm.assert_numpy_array_equal(result, expected)

def test_get_loc_nan(self):
# GH#41933
ci = CategoricalIndex(["A", "B", np.nan])
res = ci.get_loc(np.nan)

assert res == 2


class TestGetIndexer:
def test_get_indexer_base(self):
Expand Down
13 changes: 13 additions & 0 deletions pandas/tests/indexing/test_categorical.py
Original file line number Diff line number Diff line change
Expand Up @@ -540,3 +540,16 @@ def test_loc_getitem_with_non_string_categories(self, idx_values, ordered):
result.loc[sl, "A"] = ["qux", "qux2"]
expected = DataFrame({"A": ["qux", "qux2", "baz"]}, index=cat_idx)
tm.assert_frame_equal(result, expected)

def test_getitem_categorical_with_nan(self):
# GH#41933
ci = CategoricalIndex(["A", "B", np.nan])

ser = Series(range(3), index=ci)

assert ser[np.nan] == 2
assert ser.loc[np.nan] == 2

df = DataFrame(ser)
assert df.loc[np.nan, 0] == 2
assert df.loc[np.nan][0] == 2