BUG: .get_indexer_non_unique() must return an array of ints (#44084) #44404

Merged

@johannes-mueller (Contributor) commented Nov 12, 2021

GH#44084 boils down to the following.

According to the docs, `.get_indexer_non_unique()` is supposed to return
"integers from 0 to n - 1 indicating that the index at these positions matches
the corresponding target values". However, for an index that is non-unique and
non-monotonic, it returns a boolean mask. That is because it uses `.get_loc()`,
which returns a boolean mask for non-unique, non-monotonic indexes.

This patch catches that case: if the index is neither unique nor monotonic,
it converts the boolean mask from `.get_loc()` into the corresponding array
of integers.
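
To make the description above concrete, here is a minimal sketch of the mask-versus-positions difference. It uses a plain `pd.Index` of strings rather than the index type touched by the patch, and `np.flatnonzero` is only a stand-in for whatever conversion the patch actually performs; both choices are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Non-unique and non-monotonic: "b" appears at positions 1 and 3.
index = pd.Index(list("abcb"))

# For such an index, .get_loc() returns a boolean mask over the index.
mask = index.get_loc("b")

# Convert the mask into integer positions (assumed conversion, not
# necessarily the one used in the patch).
positions = np.flatnonzero(mask)

print(mask)
print(positions)
```

The integer form is what the quoted docstring promises: positions into the index, not a mask of the same length as the index.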

…ev#44084)

@phofl (Member) left a comment

Please add whatsnew

@@ -373,6 +373,16 @@ def test_get_indexer_with_nans(self):
        expected = np.array([0, 1], dtype=np.intp)
        tm.assert_numpy_array_equal(result, expected)

    def test_get_index_non_unique_non_monotonic(self):
        # GH#44084
        index = IntervalIndex.from_tuples(
Member

Could you add the MultiIndex cases

On Windows `np.array([1, 3])` is obviously int32 and thus the comparison to the
int64 array fails due to dtype mismatch.
Sometimes the world out there is a bit more complicated than what you have on
your cozy desktop :)
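
The dtype issue in this commit message can be sketched as follows. This is an illustrative example, not the test from the PR: the mask is made up, and `np.flatnonzero` stands in for the indexer computation. The point is that the expected array pins `dtype=np.intp`, as the surrounding test code does, instead of relying on NumPy's platform-dependent default integer type.

```python
import numpy as np

# Hypothetical mask; np.flatnonzero returns positions with dtype intp.
mask = np.array([False, True, False, True])
result = np.flatnonzero(mask)

# Pinning the dtype keeps the comparison independent of the platform's
# default integer type.
expected = np.array([1, 3], dtype=np.intp)

# A strict comparison checks dtype as well as values.
same_dtype = result.dtype == expected.dtype
same_values = np.array_equal(result, expected)
print(same_dtype, same_values)
```

A helper that also compares dtypes, such as the `tm.assert_numpy_array_equal` call in the diff context above, would fail on a platform where the unpinned `np.array([1, 3])` does not default to the same integer width as `np.intp`.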
@phofl added the Indexing, Interval, and MultiIndex labels Nov 12, 2021
@jreback added this to the 1.4 milestone Nov 14, 2021
@jreback merged commit 2bbd4d6 into pandas-dev:master Nov 14, 2021
@jreback (Contributor) commented Nov 14, 2021

thanks @johannes-mueller

Do we have sufficient testing on this for other index types aside from those explicitly tested here? If not, would take a PR for that.

Successfully merging this pull request may close these issues.

BUG: get_indexer_for() does not always return list of indeces
3 participants