-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: MultiIndex.get_indexer not matching nan values #49442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Nice. Does this close all the issues linked in #48877? Are any of those tests worth adding? |
It should, and yes. I'd like to merge the tests of your pr after this is through if you are ok with that |
Co-authored-by: Matthew Roeschke <[email protected]>
pandas/_libs/index.pyx
Outdated
# with positive integers (-1 for NaN becomes 1). This enables us to | ||
# differentiate between values that are missing in other and matching | ||
# NaNs. We will set values that are not found to 0 later: | ||
codes = (np.array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be split onto multiple lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -36,6 +36,8 @@ from pandas._libs.missing cimport ( | |||
is_matching_na, | |||
) | |||
|
|||
multiindex_nulls_shift = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment about what this is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM cc @jbrockmendel merge when ready
Thanks @phofl |
* BUG: MultiIndex.get_indexer not matching nan values * Fix tests * Use variable for shifts * Fix mypy * Add gh ref * Update pandas/_libs/index.pyx Co-authored-by: Matthew Roeschke <[email protected]> * Adress review Co-authored-by: Matthew Roeschke <[email protected]>
* BUG: MultiIndex.get_indexer not matching nan values * Fix tests * Use variable for shifts * Fix mypy * Add gh ref * Update pandas/_libs/index.pyx Co-authored-by: Matthew Roeschke <[email protected]> * Adress review Co-authored-by: Matthew Roeschke <[email protected]>
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.This was bugging me for quite some time.
The previous solution did not provide a way to differentiate between matching missing values (e.g. NaN and NaN) and values from other that were missing in self. With shifting NaNs to 1 and values from other that are not in self to 0, we can calculate the correct result.