This repository was archived by the owner on Jul 10, 2024. It is now read-only.

Commit 7e32f0c

BUG: fix get_indexer_non_unique() with 'object' targets with NaNs (pandas-dev#44482)
numpy.searchsorted() does not handle NaNs in 'object' arrays as expected (numpy/numpy#15499), so NaNs cannot be located reliably with binary search. Binary search is therefore used only for targets that contain no NaNs.
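The failure mode can be illustrated without NumPy. The sketch below is not the NumPy implementation; it uses Python's standard `bisect` module as a stand-in for a comparison-based binary search over Python objects, and the helper name `equal_range` is hypothetical. Because every ordering comparison against NaN is false, the "left" search collapses to the start of the array and the "right" search runs to the end, so NaN appears to match every position.

```python
import bisect

values = [1.0, 2.0, 3.0]  # sorted, NaN-free values standing in for the index
nan = float("nan")


def equal_range(sorted_values, target):
    """Return (start, end) so that sorted_values[start:end] should equal target.

    Hypothetical helper: it mirrors a left/right binary-search pair.
    """
    start = bisect.bisect_left(sorted_values, target)
    end = bisect.bisect_right(sorted_values, target)
    return start, end


# An ordinary target yields a tight range around its single occurrence.
range_for_two = equal_range(values, 2.0)

# NaN compares False against everything, so the range spans the whole list
# even though no element is NaN.
range_for_nan = equal_range(values, nan)

print(range_for_two, range_for_nan)
```

Under this assumption, any caller that treats `start != end` as "found" would report NaN at every position, which is why a null check before the binary-search path is a reasonable guard.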
1 parent 5ee8cf1 commit 7e32f0c

3 files changed: +18 -2 lines changed


doc/source/whatsnew/v1.4.0.rst (+3 -1)

@@ -592,9 +592,9 @@ Strings
 
 Interval
 ^^^^^^^^
-- Bug in :meth:`IntervalIndex.get_indexer_non_unique` returning boolean mask instead of array of integers for a non unique and non monotonic index (:issue:`44084`)
 - Bug in :meth:`Series.where` with ``IntervalDtype`` incorrectly raising when the ``where`` call should not replace anything (:issue:`44181`)
 -
+-
 
 Indexing
 ^^^^^^^^
@@ -625,6 +625,8 @@ Indexing
 - Bug in :meth:`DataFrame.loc.__setitem__` and :meth:`DataFrame.iloc.__setitem__` with mixed dtypes sometimes failing to operate in-place (:issue:`44345`)
 - Bug in :meth:`DataFrame.loc.__getitem__` incorrectly raising ``KeyError`` when selecting a single column with a boolean key (:issue:`44322`).
 - Bug in indexing on columns with ``loc`` or ``iloc`` using a slice with a negative step with ``ExtensionDtype`` columns incorrectly raising (:issue:`44551`)
+- Bug in :meth:`IntervalIndex.get_indexer_non_unique` returning boolean mask instead of array of integers for a non unique and non monotonic index (:issue:`44084`)
+- Bug in :meth:`IntervalIndex.get_indexer_non_unique` not handling targets of ``dtype`` 'object' with NaNs correctly (:issue:`44482`)
 -
 
 Missing

pandas/_libs/index.pyx (+6 -1)

@@ -338,7 +338,12 @@ cdef class IndexEngine:
         missing = np.empty(n_t, dtype=np.intp)
 
         # map each starget to its position in the index
-        if stargets and len(stargets) < 5 and self.is_monotonic_increasing:
+        if (
+                stargets and
+                len(stargets) < 5 and
+                not any([checknull(t) for t in stargets]) and
+                self.is_monotonic_increasing
+        ):
             # if there are few enough stargets and the index is monotonically
             # increasing, then use binary search for each starget
             remaining_stargets = set()
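The guard in this hunk can be sketched in plain Python. This is a simplified illustration, not the Cython code in `index.pyx`: `is_null` is a hypothetical stand-in for `checknull`, `locate_targets` and its `small` parameter are invented names, the input is assumed to be sorted already (the real code checks `is_monotonic_increasing`), and the fallback is a plain linear scan in which NaN never matches anything.

```python
import bisect
import math


def is_null(value):
    """Hypothetical stand-in for checknull: treats None and float NaN as null."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def locate_targets(sorted_values, targets, small=5):
    """Map each distinct target to the positions it occupies in sorted_values.

    Assumes sorted_values is already sorted in increasing order.
    """
    unique_targets = set(targets)
    use_binary_search = (
        bool(unique_targets)
        and len(unique_targets) < small
        and not any(is_null(t) for t in unique_targets)  # the added guard
    )

    positions = {}
    if use_binary_search:
        # Few, null-free targets: one left/right binary search per target.
        for target in unique_targets:
            start = bisect.bisect_left(sorted_values, target)
            end = bisect.bisect_right(sorted_values, target)
            if start != end:
                positions[target] = list(range(start, end))
    else:
        # Simplified fallback: linear scan using ordinary equality.
        for pos, value in enumerate(sorted_values):
            for target in unique_targets:
                if value == target:
                    positions.setdefault(target, []).append(pos)
    return positions


fast_path = locate_targets([1.0, 2.0, 2.0, 3.0], [2.0, 5.0])
guarded = locate_targets([1.0, 2.0], [1, float("nan")])
print(fast_path, guarded)
```

With the guard in place, a target list containing NaN skips the binary-search branch entirely, so NaN is not reported as matching every position.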

pandas/tests/indexes/test_indexing.py (+9)

@@ -332,3 +332,12 @@ def test_get_indexer_non_unique_multiple_nans(idx, target, expected):
     axis = Index(idx)
     actual = axis.get_indexer_for(target)
     tm.assert_numpy_array_equal(actual, expected)
+
+
+def test_get_indexer_non_unique_nans_in_object_dtype_target(nulls_fixture):
+    idx = Index([1.0, 2.0])
+    target = Index([1, nulls_fixture], dtype="object")
+
+    result_idx, result_missing = idx.get_indexer_non_unique(target)
+    tm.assert_numpy_array_equal(result_idx, np.array([0, -1], dtype=np.intp))
+    tm.assert_numpy_array_equal(result_missing, np.array([1], dtype=np.intp))
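To read the expected arrays in this test, it may help to see the return contract spelled out. The pure-Python sketch below is an illustration only, not pandas code: `indexer_non_unique_sketch`, `is_null`, and `matches` are hypothetical names, and treating every null as equal to every other null is a simplifying assumption. The first result lists index positions for each target, with -1 where a target is absent; the second lists the positions within the target that were not found.

```python
import math


def is_null(value):
    """Hypothetical null check: None or float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def matches(index_value, target):
    """Null-aware equality (assumption: nulls only match other nulls)."""
    if is_null(index_value) or is_null(target):
        return is_null(index_value) and is_null(target)
    return index_value == target


def indexer_non_unique_sketch(index_values, targets):
    """Return (indexer, missing) as plain lists of integers."""
    indexer, missing = [], []
    for target_pos, target in enumerate(targets):
        hits = [i for i, v in enumerate(index_values) if matches(v, target)]
        if hits:
            indexer.extend(hits)  # every matching index position
        else:
            indexer.append(-1)  # placeholder for a target with no match
            missing.append(target_pos)
    return indexer, missing


# Same shape as the test above: index [1.0, 2.0], target [1, <null>].
result_idx, result_missing = indexer_non_unique_sketch(
    [1.0, 2.0], [1, float("nan")]
)
print(result_idx, result_missing)
```

Here the value 1 is found at index position 0, while the null at target position 1 is absent from the index, giving `[0, -1]` and `[1]`.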
