-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: pd.MultiIndex.get_loc(np.nan) #28783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@toobaz care to take a look? |
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -229,6 +229,7 @@ Indexing | |||
- Bug in reindexing a :meth:`PeriodIndex` with another type of index that contained a `Period` (:issue:`28323`) (:issue:`28337`) | |||
- Fix assignment of column via `.loc` with numpy non-ns datetime type (:issue:`27395`) | |||
- Bug in :meth:`Float64Index.astype` where ``np.inf`` was not handled properly when casting to an integer dtype (:issue:`28475`) | |||
- Bug in :meth:`MultiIndex.get_loc` can't find ``np.nan`` when input is NA value (:issue:`19132`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug in ..... with a null value as input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have a user visible effect of this? e.g. in actually indexing a Series/DataFrame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback Well.... .loc can find nan with a null value as input. Also .get_loc can find values which include nan. But only when input is a null value, can't find indexer. So i think working .get_loc with a null value correctly also makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, but .get_loc
is not a very visible user visible function, what I mean is docs .loc[]
on something in particular now work when it did not before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I change whatsnew. I try to pass on what i intend more clearly. Okay I understand. I agree that get_loc
is not a very visible user visible function. But eventually, get_loc
is used for index object not for Series/DataFrame. What i concern is explanation with indexing a Sereis/DataFrame might tarnish its intention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't have time to test with level > 0, sorry if my comments are a bit vague on this point. I think the comment on the nested method holds, though.
# The label is present in self.levels[level] but unused: | ||
raise KeyError(key) | ||
return slice(i, j) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really need this nested method? Unless I'm mistaken, you can just put the code = level_index.get_loc(key)
in a conditional statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toobaz there is code = level_index.get_loc(key)
after else statement. This code existed before. I just nest code and put it upper place. Because nesting it shows more clearly what code is for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is
code = level_index.get_loc(key)
after else statement. This code existed before.
Yes, that's what I was referring to: this PR could have just replaced that with
if not is_list_like(key) and isna(key):
code = -1
else:
code = level_index.get_loc(key)
right?
Because nesting it shows more clearly what code is for.
In my (limited) experience, nested functions make code less readable, and harder to debug. If we want to isolate this code, I would prefer with a separate (better documented) function. But maybe it's a matter of taste, @jreback what's your take on this?
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -229,6 +229,7 @@ Indexing | |||
- Bug in reindexing a :meth:`PeriodIndex` with another type of index that contained a `Period` (:issue:`28323`) (:issue:`28337`) | |||
- Fix assignment of column via `.loc` with numpy non-ns datetime type (:issue:`27395`) | |||
- Bug in :meth:`Float64Index.astype` where ``np.inf`` was not handled properly when casting to an integer dtype (:issue:`28475`) | |||
- Bug in :meth:`MultiIndex.get_loc` can't find ``np.nan`` with a null value as input (:issue:`19132`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would mention "partial indexing" and "first level" somewhere - assuming this is really specific to first level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toobaz When 'get_loc' argument is just one, 'get_loc' in 'MultiIndex' searches at first level before i send PR. I think there was an implicit agreement, so i didn't put in whatsnew.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have expected a fix to #19132 to also solve the case mi.get_loc((1.0, np.nan))
. This PR doesn't, fair enough, but I would make it slightly more explicit. Just "can't find np.nan
in first level" is probably enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toobaz. I missunderstood issue. I thought just want to fix with a one nan. Okay. It's my fault. I want to fix completely. I try to figure out and fix all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, as you prefer. I'm not saying the aim of this PR was not good. Then sure, if you are able to provide a more general solution, that's great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toobaz Thank you for your review and pointing out sharply
0ff4f88
to
1db7dd4
Compare
MultiIndex.get_loc can't find nan with a null value as input
MultiIndex.get_loc can find nan when input is NA value(e.g. nan, None)
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff