BUG: fix get_indexer_non_unique() with 'object' targets with NaNs (#4… #44483

Merged
4 changes: 3 additions & 1 deletion doc/source/whatsnew/v1.4.0.rst
@@ -592,9 +592,9 @@ Strings

 Interval
 ^^^^^^^^
-- Bug in :meth:`IntervalIndex.get_indexer_non_unique` returning boolean mask instead of array of integers for a non unique and non monotonic index (:issue:`44084`)
 - Bug in :meth:`Series.where` with ``IntervalDtype`` incorrectly raising when the ``where`` call should not replace anything (:issue:`44181`)
 -
+-

 Indexing
 ^^^^^^^^
@@ -625,6 +625,8 @@ Indexing
 - Bug in :meth:`DataFrame.loc.__setitem__` and :meth:`DataFrame.iloc.__setitem__` with mixed dtypes sometimes failing to operate in-place (:issue:`44345`)
 - Bug in :meth:`DataFrame.loc.__getitem__` incorrectly raising ``KeyError`` when selecting a single column with a boolean key (:issue:`44322`).
 - Bug in indexing on columns with ``loc`` or ``iloc`` using a slice with a negative step with ``ExtensionDtype`` columns incorrectly raising (:issue:`44551`)
+- Bug in :meth:`IntervalIndex.get_indexer_non_unique` returning boolean mask instead of array of integers for a non unique and non monotonic index (:issue:`44084`)
+- Bug in :meth:`IntervalIndex.get_indexer_non_unique` not handling targets of ``dtype`` 'object' with NaNs correctly (:issue:`44482`)
 -

 Missing
7 changes: 6 additions & 1 deletion pandas/_libs/index.pyx
@@ -338,7 +338,12 @@ cdef class IndexEngine:
         missing = np.empty(n_t, dtype=np.intp)

         # map each starget to its position in the index
-        if stargets and len(stargets) < 5 and self.is_monotonic_increasing:
+        if (
+            stargets and
+            len(stargets) < 5 and
+            not any([checknull(t) for t in stargets]) and
+            self.is_monotonic_increasing
+        ):
             # if there are few enough stargets and the index is monotonically
             # increasing, then use binary search for each starget
             remaining_stargets = set()
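
The added `checknull` condition keeps null targets out of the binary-search branch. Below is a rough pure-Python sketch of that kind of guard; it is not the Cython implementation. The names `is_null`, `can_use_binary_search`, and `positions` are hypothetical, the index is assumed to be sorted, unique, and null-free, and the dictionary fallback is only a stand-in for the general lookup path.

```python
import bisect
import math


def is_null(value):
    # Hypothetical stand-in for pandas' internal checknull():
    # here only None and float NaN count as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def can_use_binary_search(targets, index_is_monotonic):
    # Same shape as the guard in the hunk above: a small, non-empty,
    # null-free set of targets on a monotonically increasing index.
    return (
        bool(targets)
        and len(targets) < 5
        and not any(is_null(t) for t in targets)
        and index_is_monotonic
    )


def positions(sorted_values, targets):
    # Assumption: sorted_values is ascending, unique, and contains no nulls.
    if can_use_binary_search(targets, index_is_monotonic=True):
        out = []
        for t in targets:
            i = bisect.bisect_left(sorted_values, t)
            hit = i < len(sorted_values) and sorted_values[i] == t
            out.append(i if hit else -1)
        return out
    # Fallback path: hash lookup; a null target never matches here
    # because the index is assumed to be null-free.
    lookup = {v: i for i, v in enumerate(sorted_values)}
    return [-1 if is_null(t) else lookup.get(t, -1) for t in targets]


binary_path = positions([1.0, 2.0], [2.0, 3.0])
fallback_path = positions([1.0, 2.0], [1, None])
```

In plain Python, ordering comparisons against `None` raise `TypeError` and comparisons against NaN are always false, which is why a sketch like this checks for nulls before bisecting.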
9 changes: 9 additions & 0 deletions pandas/tests/indexes/test_indexing.py
@@ -332,3 +332,12 @@ def test_get_indexer_non_unique_multiple_nans(idx, target, expected):
     axis = Index(idx)
     actual = axis.get_indexer_for(target)
     tm.assert_numpy_array_equal(actual, expected)
+
+
+def test_get_indexer_non_unique_nans_in_object_dtype_target(nulls_fixture):
[Review comment, Member] could also test NP_NAT_OBJECTS here
+    idx = Index([1.0, 2.0])
+    target = Index([1, nulls_fixture], dtype="object")
+
+    result_idx, result_missing = idx.get_indexer_non_unique(target)
+    tm.assert_numpy_array_equal(result_idx, np.array([0, -1], dtype=np.intp))
+    tm.assert_numpy_array_equal(result_missing, np.array([1], dtype=np.intp))
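
The scenario in the new test can be tried outside the test suite through the public API. This is a minimal sketch that assumes a pandas release containing this fix and uses `np.nan` in place of the `nulls_fixture` parameter; the test above expects an indexer of `[0, -1]` and a missing array of `[1]`.

```python
import numpy as np
import pandas as pd

# Float index, object-dtype target holding one match and one NaN.
idx = pd.Index([1.0, 2.0])
target = pd.Index([1, np.nan], dtype="object")

# get_indexer_non_unique returns (indexer, missing):
# indexer holds positions in idx (-1 where a target is absent),
# missing holds positions in target that were not found.
indexer, missing = idx.get_indexer_non_unique(target)

print(indexer, missing)
```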