Commit 7ae8ebb

BUG: CategoricalIndex.get_indexer issue with NaNs (pandas-dev#45361)
categorical_index_obj.get_indexer(target) yields incorrect results when categorical_index_obj contains NaNs and target does not.

The reason is that elements of target which do not match any category in categorical_index_obj are replaced by NaNs. If categorical_index_obj also contains NaNs, the corresponding elements in target are then mapped to the position of that NaN instead of -1, e.g.:

    ci = pd.CategoricalIndex([1, 2, np.nan, 3])
    other = pd.Index([2, 3, 4])
    ci.get_indexer(other)

In the implementation of get_indexer, other becomes [2, 3, NaN], and the inserted NaN is mapped to index 2 in ci, so the result is [1, 3, 2] rather than the expected [1, 3, -1].
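The failure mode described above can be reproduced without pandas. The following is a minimal pure-Python sketch, not the pandas implementation: `cast_to_categories` and `naive_get_indexer` are hypothetical helpers that only model the two steps named in the commit message (non-category values become NaN, then every NaN is mapped to the NaN position of the index).

```python
import math

NAN = float("nan")


def is_nan(x):
    """True only for float NaN values."""
    return isinstance(x, float) and math.isnan(x)


def cast_to_categories(target, categories):
    """Toy stand-in for the cast step: non-category values become NaN."""
    return [x if x in categories else NAN for x in target]


def naive_get_indexer(index_values, target):
    """Model of the pre-fix behaviour: every NaN after the cast is
    mapped to the position of the NaN in the index."""
    categories = [v for v in index_values if not is_nan(v)]
    cast = cast_to_categories(target, categories)
    # Position of the first NaN in the index, or -1 if there is none.
    nan_loc = next((i for i, v in enumerate(index_values) if is_nan(v)), -1)
    return [nan_loc if is_nan(x) else index_values.index(x) for x in cast]


ci_values = [1, 2, NAN, 3]   # mirrors pd.CategoricalIndex([1, 2, np.nan, 3])
other = [2, 3, 4]            # mirrors pd.Index([2, 3, 4])
result = naive_get_indexer(ci_values, other)
print(result)  # [1, 3, 2]: the value 4 wrongly lands on the NaN slot
```

In this model the value 4 has no category, is turned into NaN by the cast, and then becomes indistinguishable from a genuine NaN lookup.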
1 parent e53bda9 commit 7ae8ebb

1 file changed, +6 -1 lines changed

pandas/core/indexes/base.py

+6 -1
@@ -3696,6 +3696,10 @@ def get_indexer(
         tolerance=None,
     ) -> npt.NDArray[np.intp]:
         method = missing.clean_reindex_fill_method(method)
+        # After _maybe_cast_listlike_indexer, target elements which do not
+        # belong to some category are changed to NaNs
+        # Mask to track actual NaN values compared to inserted NaN values
+        target_nans = np.isnan(target)
         target = self._maybe_cast_listlike_indexer(target)
 
         self._check_indexing_method(method, limit, tolerance)
@@ -3720,7 +3724,8 @@ def get_indexer(
             if self.hasnans and target.hasnans:
                 loc = self.get_loc(np.nan)
                 mask = target.isna()
-                indexer[mask] = loc
+                indexer[target_nans] = loc
+                indexer[mask & ~target_nans] = -1
             return indexer
 
         if is_categorical_dtype(target.dtype):
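The two added assignments in this hunk can be illustrated in isolation with NumPy. This is a sketch under stated assumptions, not the pandas code path: `fix_nan_positions` is a hypothetical helper, and the starting `indexer` values and the `cast_nans` mask are made up for illustration. Note that `np.isnan` only accepts numeric input, so the example uses a float array for the target.

```python
import numpy as np


def fix_nan_positions(indexer, target_nans, cast_nans, nan_loc):
    """Apply the two assignments from the diff to a copy of `indexer`.

    target_nans: True where the ORIGINAL target held a real NaN.
    cast_nans:   True where the target is NaN AFTER the cast
                 (real NaNs plus values that matched no category).
    nan_loc:     position of NaN in the index being searched.
    """
    out = indexer.copy()
    out[target_nans] = nan_loc          # real NaNs match the index's NaN
    out[cast_nans & ~target_nans] = -1  # inserted NaNs mean "not found"
    return out


# Hypothetical target: 2.0 is a category, NaN is a real NaN, 4.0 is not a category.
target = np.array([2.0, np.nan, 4.0])
target_nans = np.isnan(target)                   # [False, True, False]
cast_nans = np.array([False, True, True])        # 4.0 became NaN after the cast
indexer = np.array([1, -1, -1], dtype=np.intp)   # placeholder values for NaN slots

fixed = fix_nan_positions(indexer, target_nans, cast_nans, nan_loc=2)
print(fixed.tolist())  # [1, 2, -1]
```

The design point is that the NaN mask has to be captured before the cast, since afterwards real and inserted NaNs can no longer be told apart.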
