
Commit 8b3a671

BUG: CategoricalIndex.get_indexer issue with NaNs (pandas-dev#45361)
categorical_index_obj.get_indexer(target) returns incorrect results when categorical_index_obj contains NaNs and target does not.

The cause: elements of target that do not match any category in categorical_index_obj are replaced by NaNs. If categorical_index_obj also contains NaNs, those replaced elements are then mapped to the position of the NaN in categorical_index_obj instead of to -1. For example:

    ci = pd.CategoricalIndex([1, 2, np.nan, 3])
    other = pd.Index([2, 3, 4])
    ci.get_indexer(other)

Inside get_indexer, other becomes [2, 3, NaN], and that NaN is mapped to index 2 (the position of the NaN in ci), so 4 is reported as found at position 2 rather than as missing (-1).

Update: calling np.isnan(target) broke the existing codebase, because np.isnan raises a TypeError for non-numeric targets (e.g. strings). As a solution, I have enclosed this line in a try/except TypeError block that falls back to target_nans = False.
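The mechanism described above can be reproduced outside pandas. The following is a simplified NumPy model, not pandas' implementation: `cast_to_categories`, `lookup_positions`, and `get_indexer_sketch` are hypothetical helpers, with `cast_to_categories` standing in for `_maybe_cast_listlike_indexer`. It records which target entries were NaN before casting (the purpose of the `target_nans` mask in this commit) and then, as an assumption about how such a mask would be used, resets positions that matched the index's NaN slot only because the cast inserted a NaN.

```python
import numpy as np


def cast_to_categories(index_values, target):
    # Hypothetical stand-in for _maybe_cast_listlike_indexer: any target
    # value that is not one of the index's non-NaN categories becomes NaN.
    categories = index_values[~np.isnan(index_values)]
    return np.where(np.isin(target, categories), target, np.nan)


def lookup_positions(index_values, values):
    # Position of the first match in index_values, or -1 if absent.
    # NaN != NaN, so a NaN value is matched to the index's NaN slot explicitly.
    out = np.full(len(values), -1, dtype=np.intp)
    for i, v in enumerate(values):
        if np.isnan(v):
            hits = np.flatnonzero(np.isnan(index_values))
        else:
            hits = np.flatnonzero(index_values == v)
        if hits.size:
            out[i] = hits[0]
    return out


def get_indexer_sketch(index_values, target, fix=True):
    index_values = np.asarray(index_values, dtype=float)
    target = np.asarray(target, dtype=float)
    # Record which target entries were genuinely NaN *before* casting.
    target_nans = np.isnan(target)
    casted = cast_to_categories(index_values, target)
    result = lookup_positions(index_values, casted)
    if fix:
        # NaNs introduced by the cast must not match the index's own NaN.
        result[np.isnan(casted) & ~target_nans] = -1
    return result


# Buggy behaviour (no mask): 4 is cast to NaN and lands on position 2.
print(get_indexer_sketch([1, 2, np.nan, 3], [2, 3, 4], fix=False).tolist())  # [1, 3, 2]
# With the mask, the unmatched value is reported as missing.
print(get_indexer_sketch([1, 2, np.nan, 3], [2, 3, 4]).tolist())  # [1, 3, -1]
```

A genuine NaN in the target still maps to the index's NaN slot, because it is recorded in `target_nans` before the cast and is therefore not reset.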
1 parent 7ae8ebb commit 8b3a671


1 file changed (+4, -1 lines)


pandas/core/indexes/base.py (+4, -1)
@@ -3699,7 +3699,10 @@ def get_indexer(
         # After _maybe_cast_listlike_indexer, target elements which do not
         # belong to some category are changed to NaNs
         # Mask to track actual NaN values compared to inserted NaN values
-        target_nans = np.isnan(target)
+        try:
+            target_nans = np.isnan(target)
+        except TypeError as e:
+            target_nans = False
         target = self._maybe_cast_listlike_indexer(target)

         self._check_indexing_method(method, limit, tolerance)
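The guard in this hunk can be tried in isolation. The sketch below is hypothetical (`nan_mask` is not a pandas function) and deliberately differs from the commit in one respect: on TypeError it returns an all-False boolean array rather than the scalar `False`. In Python `~False` evaluates to `-1`, so a scalar fallback that is later inverted with `~` would not behave like a boolean mask; whether that matters here depends on how `target_nans` is used after this hunk, which the diff does not show.

```python
import numpy as np


def nan_mask(target):
    # Same guard as the commit: np.isnan only supports numeric dtypes and
    # raises TypeError for string or object-dtype input.
    try:
        return np.isnan(target)
    except TypeError:
        # Variation on the commit's scalar `False`: an all-False array of the
        # same shape, so `~mask` remains an element-wise boolean NOT.
        return np.zeros(np.shape(target), dtype=bool)


print(nan_mask(np.array([1.0, np.nan])).tolist())  # [False, True]
print(nan_mask(np.array(["a", "b"])).tolist())  # [False, False]
print(nan_mask(np.array([1.0, None], dtype=object)).tolist())  # [False, False]
```

Note that object-dtype arrays take the fallback path even when they hold floats, since `np.isnan` does not accept object dtype at all.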
