Skip to content

BUG: pd.MultiIndex.get_loc(np.nan) #28783

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

proost
Copy link
Contributor

@proost proost commented Oct 4, 2019

MultiIndex.get_loc can find nan when input is NA value(e.g. nan, None)

@pep8speaks
Copy link

pep8speaks commented Oct 4, 2019

Hello @proost! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-10-10 03:33:36 UTC

@proost proost changed the title BUG: pd.MultiIndex.get_loc(np.nan) (#19132) BUG: pd.MultiIndex.get_loc(np.nan) Oct 4, 2019
@WillAyd WillAyd added Indexing Related to indexing on series/frames, not to indexes themselves Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate MultiIndex labels Oct 4, 2019
@WillAyd
Copy link
Member

WillAyd commented Oct 4, 2019

@toobaz care to take a look?

@@ -229,6 +229,7 @@ Indexing
- Bug in reindexing a :meth:`PeriodIndex` with another type of index that contained a `Period` (:issue:`28323`) (:issue:`28337`)
- Fix assignment of column via `.loc` with numpy non-ns datetime type (:issue:`27395`)
- Bug in :meth:`Float64Index.astype` where ``np.inf`` was not handled properly when casting to an integer dtype (:issue:`28475`)
- Bug in :meth:`MultiIndex.get_loc` can't find ``np.nan`` when input is NA value (:issue:`19132`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug in ..... with a null value as input

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a user visible effect of this? e.g. in actually indexing a Series/DataFrame.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Well.... .loc can find nan with a null value as input. Also .get_loc can find values which include nan. But only when input is a null value, can't find indexer. So i think working .get_loc with a null value correctly also makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, but .get_loc is not a very visible user visible function, what I mean is docs .loc[] on something in particular now work when it did not before?

Copy link
Contributor Author

@proost proost Oct 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I change whatsnew. I try to pass on what i intend more clearly. Okay I understand. I agree that get_loc is not a very visible user visible function. But eventually, get_loc is used for index object not for Series/DataFrame. What i concern is explanation with indexing a Sereis/DataFrame might tarnish its intention.

Copy link
Member

@toobaz toobaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have time to test with level > 0, sorry if my comments are a bit vague on this point. I think the comment on the nested method holds, though.

# The label is present in self.levels[level] but unused:
raise KeyError(key)
return slice(i, j)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need this nested method? Unless I'm mistaken, you can just put the code = level_index.get_loc(key) in a conditional statement.

Copy link
Contributor Author

@proost proost Oct 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toobaz there is code = level_index.get_loc(key) after else statement. This code existed before. I just nest code and put it upper place. Because nesting it shows more clearly what code is for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is code = level_index.get_loc(key) after else statement. This code existed before.

Yes, that's what I was referring to: this PR could have just replaced that with

if not is_list_like(key) and isna(key):
    code = -1
else:
    code = level_index.get_loc(key)

right?

Because nesting it shows more clearly what code is for.

In my (limited) experience, nested functions make code less readable, and harder to debug. If we want to isolate this code, I would prefer with a separate (better documented) function. But maybe it's a matter of taste, @jreback what's your take on this?

@@ -229,6 +229,7 @@ Indexing
- Bug in reindexing a :meth:`PeriodIndex` with another type of index that contained a `Period` (:issue:`28323`) (:issue:`28337`)
- Fix assignment of column via `.loc` with numpy non-ns datetime type (:issue:`27395`)
- Bug in :meth:`Float64Index.astype` where ``np.inf`` was not handled properly when casting to an integer dtype (:issue:`28475`)
- Bug in :meth:`MultiIndex.get_loc` can't find ``np.nan`` with a null value as input (:issue:`19132`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would mention "partial indexing" and "first level" somewhere - assuming this is really specific to first level.

Copy link
Contributor Author

@proost proost Oct 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toobaz When 'get_loc' argument is just one, 'get_loc' in 'MultiIndex' searches at first level before i send PR. I think there was an implicit agreement, so i didn't put in whatsnew.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected a fix to #19132 to also solve the case mi.get_loc((1.0, np.nan)). This PR doesn't, fair enough, but I would make it slightly more explicit. Just "can't find np.nan in first level" is probably enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toobaz. I missunderstood issue. I thought just want to fix with a one nan. Okay. It's my fault. I want to fix completely. I try to figure out and fix all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, as you prefer. I'm not saying the aim of this PR was not good. Then sure, if you are able to provide a more general solution, that's great.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toobaz Thank you for your review and pointing out sharply

@proost proost force-pushed the get_loc-nan branch 2 times, most recently from 0ff4f88 to 1db7dd4 Compare October 9, 2019 14:43
MultiIndex.get_loc can't find nan with a null
value as input
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate MultiIndex
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pd.MultiIndex.get_loc(np.nan)
5 participants