BUG: fix get_indexer_non_unique() with 'object' targets with NaNs (#4… #44483

@johannes-mueller johannes-mueller commented Nov 16, 2021

…4482)

numpy.searchsorted() does not handle NaNs in 'object' arrays as
expected (numpy/numpy#15499), so NaNs cannot be located by binary
search. We therefore use binary search only for targets without NaNs.

This also moves the whatsnew entry for #44404 to the correct place.
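
The reasoning above can be illustrated without pandas or NumPy. The following is only a minimal sketch using the standard-library `bisect` module as a stand-in for `numpy.searchsorted()`; the helpers `is_nan` and `positions_of` are hypothetical names made up for this illustration and are not the code changed in this PR. It assumes the non-NaN values are sorted ascending with any NaNs placed at the end.

```python
import bisect
import math


def is_nan(value):
    # NaN is the only float that does not compare equal to itself.
    return isinstance(value, float) and value != value


def positions_of(sorted_values, target):
    """Return every position of `target` (hypothetical helper, for illustration).

    Assumes non-NaN entries are sorted ascending and NaNs sit at the end.
    """
    if is_nan(target):
        # Every ordering comparison with NaN is False, so binary search
        # cannot find it; fall back to a linear scan for NaN targets.
        return [i for i, v in enumerate(sorted_values) if is_nan(v)]
    # Binary search is only applied to the NaN-free prefix.
    n_valid = sum(1 for v in sorted_values if not is_nan(v))
    lo = bisect.bisect_left(sorted_values, target, 0, n_valid)
    hi = bisect.bisect_right(sorted_values, target, 0, n_valid)
    return list(range(lo, hi))


values = [1.0, 2.0, 2.0, math.nan, math.nan]

# A naive binary search for NaN lands on position 0, which holds 1.0.
print(bisect.bisect_left(values, math.nan))  # 0
print(positions_of(values, 2.0))             # [1, 2]
print(positions_of(values, math.nan))        # [3, 4]
```

The design choice mirrors the description: keep the fast path for ordinary targets and pay the linear cost only for NaN targets, where ordering comparisons carry no information.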

@johannes-mueller johannes-mueller force-pushed the bugfix/get-indexer-non-unique-object-nans-44482 branch from 75ff6b0 to f4d3b1a Compare November 16, 2021 13:31
@johannes-mueller johannes-mueller force-pushed the bugfix/get-indexer-non-unique-object-nans-44482 branch from f4d3b1a to 1c73fba Compare November 17, 2021 08:47

@jbrockmendel jbrockmendel left a comment

LGTM

@johannes-mueller johannes-mueller force-pushed the bugfix/get-indexer-non-unique-object-nans-44482 branch from 1c73fba to 7e32f0c Compare November 26, 2021 07:42
@johannes-mueller
Contributor Author

Rebased & resolved conflict

@jreback jreback added the Bug, Indexing (Related to indexing on series/frames, not to indexes themselves), and Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) labels Nov 26, 2021
@jreback jreback added this to the 1.4 milestone Nov 26, 2021
@jreback jreback merged commit a9a37d6 into pandas-dev:master Nov 26, 2021
@jreback
Contributor

jreback commented Nov 26, 2021

thanks @johannes-mueller very nice, keep em coming!

@@ -332,3 +332,12 @@ def test_get_indexer_non_unique_multiple_nans(idx, target, expected):
    axis = Index(idx)
    actual = axis.get_indexer_for(target)
    tm.assert_numpy_array_equal(actual, expected)


def test_get_indexer_non_unique_nans_in_object_dtype_target(nulls_fixture):
Member

could also test NP_NAT_OBJECTS here

Successfully merging this pull request may close these issues.

BUG: get_indexer_non_unique() does not handle targets of dtype='object' with NaNs correctly
3 participants