-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: pd.MultiIndex.get_loc(np.nan) #28919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5da4a8d
to
9cf08c1
Compare
assert idx.get_loc((np.nan, 1)) == expected | ||
|
||
|
||
def test_get_indexer_with_missing_value(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testing get_loc and get_indexer are good, but we dont expect users to see those directly. is there any user-facing behavior that this changes that should be tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you respond to this, is there any user facing change aside from .get_loc itself, IOW does user facing indexing change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback
.get_slice_bound
, __contains__
,slice_indexer
, slice_locs
affect from this change.
I add test case too.
14bba33
to
a17ef45
Compare
assert idx.get_loc((np.nan, 1)) == expected | ||
|
||
|
||
def test_get_indexer_with_missing_value(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you respond to this, is there any user facing change aside from .get_loc itself, IOW does user facing indexing change?
@jbrockmendel @jreback If,
case1 : "get_slice_bound"
case2 : "slice_indexer"
case3: "slice_locs"
case4:
problem is when this PR is merged, if |
can you show what you mean here? |
not raise exception. But
So If this PR is merged, change |
can you merge master and will look again |
ok this PR looks good. can you add in support for .isin() in this PR? I think it would make sense to merge these simultaneously (e.g. you can also build on this one if that is easier) |
@jreback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. some comments, ping on green.
@toobaz @jbrockmendel @TomAugspurger if you'd care to have a look
else: | ||
return np.lib.arraysetops.in1d(level_codes, sought_labels) | ||
return np.zeros(len(levs), dtype=np.bool_) | ||
return levs.isin(values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the edits from L3408 down to here look like they are just nice cleanups independent of the rest of this PR. is that accurate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel
For NA values, someone fixes #30677, then more accurate. "Index.isin" has a bug nonetheless in terms of checking NA value can be possible, This is more accurate
[ | ||
([("b", np.nan)], np.array([False, False, True]), None,), | ||
([np.nan, "a"], np.array([True, True, False]), 0), | ||
(["d", np.nan], np.array([False, True, True]), 1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the issue specific to np.nan, or are there other NA values worth testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel
xref #30677. Yes. it is specific to np.nan. In "Index", np.nan , np.NaT, None are discernible not denoted by NA value. So if MultiIndex mixs with np.nan, np.NaT, None all together, result of ".isin" are different from what we know.
thanks @proost very nice! |
MultiIndex.get_loc could not find nan with values including missing
values as a input.
Background: In
MultiIndex
, missing value is denoted by -1 in codes and doesn't exist inself.levels
So, could not find NA value in
self.levels
.Before PR xref #28783
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff