
Commit fcfd19f

BUG: CategoricalIndex.get_indexer issue with NaNs (#45361) (#45373)
* BUG: CategoricalIndex.get_indexer issue with NaNs (#45361)

  categorical_index_obj.get_indexer(target) yields incorrect results when categorical_index_obj contains NaNs and target does not. The reason is that elements of target which do not match any category in categorical_index_obj are replaced by NaNs. If categorical_index_obj also has NaNs, the corresponding elements in target are then mapped to an index which is not -1, e.g.:

      ci = pd.CategoricalIndex([1, 2, np.nan, 3])
      other = pd.Index([2, 3, 4])
      ci.get_indexer(other)

  In the implementation of get_indexer, other becomes [2, 3, NaN], and the NaN is mapped to index 2 in ci.

* BUG: CategoricalIndex.get_indexer issue with NaNs (#45361)

  Update: np.isnan(target) was breaking the existing codebase. As a solution, I have enclosed this line in a try-except block.

* BUG: CategoricalIndex.get_indexer issue with NaNs (#45361)

  Update: Replaced `np.isnan` with `pandas.core.dtypes.missing.isna`

* Added a testcase to verify output behaviour
* Made pre-commit changes
* Added a test case without NaNs
* Moved NaN test to avoid unnecessary execution
* Re-aligned test cases
* Removed try-except block
* Cleaned up base.py
* Add GH#45361 comment to code
* Added whatsnew entry
* Resolved merge conflict
* Moved whatsnew entry to indexing section
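The scenario from the commit message can be reproduced with a short script. This is a minimal sketch that assumes a pandas release containing this fix (1.5.0 or later, going by the whatsnew entry in this commit); the variable names `result` and `result_with_nan` are illustrative only. The second call mirrors the first case in the test added by this commit.

```python
import numpy as np
import pandas as pd

# Index with a NaN entry at position 2, as in the commit message.
ci = pd.CategoricalIndex([1, 2, np.nan, 3])

# Target without NaNs; 4 is not a category of ci.
other = pd.Index([2, 3, 4])
result = ci.get_indexer(other)
print(result)  # with the fix applied: [ 1  3 -1]

# Target with a genuine NaN: it should still resolve to position 2.
result_with_nan = ci.get_indexer([2, 3, 4, np.nan])
print(result_with_nan)  # with the fix applied: [ 1  3 -1  2]
```

On a release without the fix, the commit message indicates that the unmatched value 4 would be reported at position 2 rather than -1.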
1 parent 4e30caa commit fcfd19f

3 files changed: +21 -1 lines changed

doc/source/whatsnew/v1.5.0.rst

+1
@@ -269,6 +269,7 @@ Indexing
 - Bug in :meth:`DataFrame.mask` with ``inplace=True`` and ``ExtensionDtype`` columns incorrectly raising (:issue:`45577`)
 - Bug in getting a column from a DataFrame with an object-dtype row index with datetime-like values: the resulting Series now preserves the exact object-dtype Index from the parent DataFrame (:issue:`42950`)
 - Bug in indexing on a :class:`DatetimeIndex` with a ``np.str_`` key incorrectly raising (:issue:`45580`)
+- Bug in :meth:`CategoricalIndex.get_indexer` when index contains ``NaN`` values, resulting in elements that are in target but not present in the index to be mapped to the index of the NaN element, instead of -1 (:issue:`45361`)
 -
 
 Missing

pandas/core/indexes/base.py

+8-1
@@ -3791,6 +3791,7 @@ def get_indexer(
         tolerance=None,
     ) -> npt.NDArray[np.intp]:
         method = missing.clean_reindex_fill_method(method)
+        orig_target = target
         target = self._maybe_cast_listlike_indexer(target)
 
         self._check_indexing_method(method, limit, tolerance)
@@ -3813,9 +3814,15 @@ def get_indexer(
 
             indexer = self._engine.get_indexer(target.codes)
             if self.hasnans and target.hasnans:
+                # After _maybe_cast_listlike_indexer, target elements which do not
+                # belong to some category are changed to NaNs
+                # Mask to track actual NaN values compared to inserted NaN values
+                # GH#45361
+                target_nans = isna(orig_target)
                 loc = self.get_loc(np.nan)
                 mask = target.isna()
-                indexer[mask] = loc
+                indexer[target_nans] = loc
+                indexer[mask & ~target_nans] = -1
             return indexer
 
         if is_categorical_dtype(target.dtype):
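The two-mask logic in this change can be illustrated outside pandas. The following is a toy NumPy model, not the pandas implementation: the function `sketch_get_indexer`, its `patched` flag, and the float-only inputs are assumptions made for illustration, and the cast step is a simplified stand-in for `_maybe_cast_listlike_indexer`. It separates NaNs the caller actually passed (`target_nans`) from NaNs introduced by the cast (`mask & ~target_nans`).

```python
import numpy as np


def sketch_get_indexer(index_values, target_values, patched=True):
    """Toy model of the NaN handling; hypothetical helper, not pandas code."""
    index_arr = np.asarray(index_values, dtype=float)
    orig_target = np.asarray(target_values, dtype=float)

    # Stand-in for the cast step: values that are not categories become NaN.
    categories = index_arr[~np.isnan(index_arr)]
    cast_target = np.where(np.isin(orig_target, categories), orig_target, np.nan)

    # Plain lookup: first matching position, or -1 (NaN never compares equal).
    indexer = np.full(len(cast_target), -1, dtype=np.intp)
    for i, value in enumerate(cast_target):
        hits = np.flatnonzero(index_arr == value)
        if hits.size:
            indexer[i] = hits[0]

    mask = np.isnan(cast_target)
    if np.isnan(index_arr).any() and mask.any():
        loc = int(np.flatnonzero(np.isnan(index_arr))[0])
        if patched:
            target_nans = np.isnan(orig_target)  # NaNs present before the cast
            indexer[target_nans] = loc           # genuine NaNs match the NaN slot
            indexer[mask & ~target_nans] = -1    # cast-introduced NaNs do not match
        else:
            indexer[mask] = loc                  # old behaviour: every NaN matches
    return indexer


index_values = [1, 2, np.nan, 3]
print(sketch_get_indexer(index_values, [2, 3, 4, np.nan]))         # [ 1  3 -1  2]
print(sketch_get_indexer(index_values, [2, 3, 4], patched=False))  # [1 3 2]
```

With `patched=False` the unmatched value 4 lands on position 2, which is the symptom described in the commit message; with the default it is reported as -1.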

pandas/tests/indexes/categorical/test_indexing.py

+12
@@ -298,6 +298,18 @@ def test_get_indexer_same_categories_different_order(self):
         expected = np.array([1, 1], dtype="intp")
         tm.assert_numpy_array_equal(result, expected)
 
+    def test_get_indexer_nans_in_index_and_target(self):
+        # GH 45361
+        ci = CategoricalIndex([1, 2, np.nan, 3])
+        other1 = [2, 3, 4, np.nan]
+        res1 = ci.get_indexer(other1)
+        expected1 = np.array([1, 3, -1, 2], dtype=np.intp)
+        tm.assert_numpy_array_equal(res1, expected1)
+        other2 = [1, 4, 2, 3]
+        res2 = ci.get_indexer(other2)
+        expected2 = np.array([0, -1, 1, 3], dtype=np.intp)
+        tm.assert_numpy_array_equal(res2, expected2)
+
 
 class TestWhere:
     def test_where(self, listlike_box):
