BUG: .get_indexer_non_unique() must return an array of ints (#44084) #44404
GH#44084 boils down to the following. According to the docs, `.get_indexer_non_unique()` is supposed to return "integers from 0 to n - 1 indicating that the index at these positions matches the corresponding target values". However, for an index that is non-unique and non-monotonic, it returns a boolean mask. That is because it uses `.get_loc()`, which returns a boolean mask for non-unique, non-monotonic indexes. This patch catches that case and, if the index is neither unique nor monotonic, converts the boolean mask from `.get_loc()` into the corresponding array of integers.
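To make the documented contract concrete, here is a naive sketch in plain NumPy. It is neither the patch nor pandas' implementation, and `indexer_non_unique_sketch` is a hypothetical helper. It looks up each target with a boolean mask, which is the shape `.get_loc()` returns for a non-unique, non-monotonic index, and converts that mask to integer positions with `np.flatnonzero` before concatenating the results.

```python
import numpy as np


def indexer_non_unique_sketch(index_values, target_values):
    """Hypothetical, naive illustration of the get_indexer_non_unique contract.

    Returns (indexer, missing): `indexer` holds, for each target value in turn,
    every integer position in `index_values` that matches it (or -1 if none);
    `missing` holds the positions in `target_values` that had no match.
    """
    index_values = np.asarray(index_values)
    pieces = []
    missing = []
    for i, value in enumerate(target_values):
        # Boolean mask over the index, the same shape get_loc() returns
        # for a non-unique, non-monotonic index.
        mask = index_values == value
        # Convert the mask into integer positions (dtype np.intp).
        positions = np.flatnonzero(mask)
        if positions.size == 0:
            pieces.append(np.array([-1], dtype=np.intp))
            missing.append(i)
        else:
            pieces.append(positions)
    if pieces:
        indexer = np.concatenate(pieces).astype(np.intp)
    else:
        indexer = np.array([], dtype=np.intp)
    return indexer, np.array(missing, dtype=np.intp)


indexer, missing = indexer_non_unique_sketch([1, 2, 1, 3], [1, 4, 3])
print(indexer, missing)  # [ 0  2 -1  3] [1]
```

For comparison, the pandas docstring for `Index.get_loc` shows `pd.Index(list('abcb')).get_loc('b')` returning `array([False,  True, False,  True])`; the integer positions the contract asks for in that case are `[1, 3]`.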
Please add a whatsnew entry.
@@ -373,6 +373,16 @@ def test_get_indexer_with_nans(self):
        expected = np.array([0, 1], dtype=np.intp)
        tm.assert_numpy_array_equal(result, expected)

    def test_get_index_non_unique_non_monotonic(self):
        # GH#44084
        index = IntervalIndex.from_tuples(
Could you add the MultiIndex cases?
On Windows, `np.array([1, 3])` is int32 by default, so the comparison to the int64 array fails due to a dtype mismatch.
Sometimes the world out there is a bit more complicated than what you have on your cozy desktop :)
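The usual way to keep such expected arrays platform independent is to pass `dtype=np.intp`, as the surrounding test code already does. A small NumPy-only sketch of the difference follows. The int32 default applies to Windows builds of NumPy before 2.0; NumPy 2.0 made the default integer int64 there as well.

```python
import numpy as np

# Without an explicit dtype, NumPy picks its default integer type, which is
# platform dependent: int32 on Windows with NumPy < 2.0, int64 on 64-bit Linux/macOS.
implicit = np.array([1, 3])

# np.intp is the pointer-sized integer NumPy uses for indexing, so building
# the expected array with it matches indexer results on every platform.
expected = np.array([1, 3], dtype=np.intp)

# Positions derived from a boolean mask already come back as np.intp.
positions = np.flatnonzero(np.array([False, True, False, True]))

print(implicit.dtype, expected.dtype, positions.dtype)
```

Because `tm.assert_numpy_array_equal` checks dtypes as well as values, only the explicit `np.intp` version compares equal to `positions` on every platform.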
Thanks @johannes-mueller. Do we have sufficient testing on this for other index types aside from those explicitly tested here? If not, would take a PR for that.