MultiIndex.get_level_values() replaces NA by another value #5074
Comments
Not a bad idea (this actually makes take deal with this correctly). You can add a method; it needs some more tests for edge cases, e.g. an empty index, multiple NaNs (datetime with NaT), and -1 at the end or the beginning (you already have the middle case). |
Better fix is to create a mask based on value == -1 and then fill with nan |
@goyodiaz do you have Travis set up? Interested to see if your solution passes. I didn't realize you were saying to just do it temporarily. I still think it'll create issues if you want to shorten levels later on, but I'm less convinced than in my previous comment :)
@jreback does append create a copy of the underlying memory? I guess a copy is happening in either case, but a bool mask is certainly smaller.
Slight tweak:
    mask = labels == -1
    values = unique_values.take(labels)
    if mask.any():
        values = values.astype(float)
        values[mask] = np.nan
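The mask-based tweak above can be tried outside pandas with plain NumPy arrays. The following is a minimal sketch, not the code that went into pandas: the function name `take_with_na` and the handling of empty inputs are assumptions added for illustration, and it only covers numeric levels (a datetime level would need NaT rather than NaN).

```python
import numpy as np


def take_with_na(unique_values, labels):
    """Look up level values by label, mapping the -1 label to NaN.

    Hypothetical helper for illustration; not a pandas API.
    """
    unique_values = np.asarray(unique_values)
    labels = np.asarray(labels, dtype=np.intp)

    # Edge case: no labels at all, so return an empty array of the same dtype.
    if labels.size == 0:
        return unique_values[:0].copy()

    mask = labels == -1

    # Edge case: every entry is missing and there are no stored values.
    if unique_values.size == 0:
        return np.full(labels.shape, np.nan)

    # -1 wraps around to the last stored value here; the mask fixes that below.
    values = unique_values.take(labels)
    if mask.any():
        values = values.astype(float)  # astype copies, so the level is untouched
        values[mask] = np.nan
    return values


middle = take_with_na([1, 2], [0, -1, 1])      # NA in the middle
edges = take_with_na([1, 2], [-1, 0, 1, -1])   # NA at the beginning and end
no_na = take_with_na([1, 2], [1, 0])           # no NA: dtype stays integer
```

The cases at the bottom mirror the edge cases requested earlier in the thread: -1 in the middle, at the beginning, and at the end.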
yes append copies |
@jtratner yes, the Travis builds passed. Can you think of any possible side effect that should be tested? I did not fully understand your concerns. |
Go ahead and open a pull request; this will submit it as a patch to the devs. |
Will do it in a while. BTW there is nothing to fix with DatetimeIndex:
But I guess this could change when numpy gets proper integer NaN support, if ever. |
@goyodiaz - it's totally fine, I appreciate you figuring out what was going on. That said, figuring out that the issue was that it was taking the wrong |
Behavior with NaT actually could be a bug, not sure. |
@jratner I think you are right, it's a bug in |
I guess I should link the PR: #5090 |
BUG: MultiIndex.get_level_values() replaces NA by another value (#5074)
Test case:
The expected output is
Float64Index([1.0, nan, 2.0], dtype=object)
This happens because NA values are not stored in the MultiIndex levels and the corresponding label is set to -1. Then, when labels are used as indexes into the values in get_level_values(), that -1 points to the last (non-null) value.
I tried to fix this by appending an NA to the values if -1 is in the labels.
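The mechanism described above can be reproduced with NumPy alone. This is a sketch under assumed data: `level_values` and `labels` are hypothetical stand-ins for one level of a MultiIndex and its labels, not the arrays from the original test case.

```python
import numpy as np

# Hypothetical stand-ins for a single MultiIndex level.
level_values = np.array([1, 2])   # unique non-NA values stored in the level
labels = np.array([0, -1, 1])     # -1 marks the missing (NA) entry

# Plain positional lookup: -1 is treated as "last element", so the NA
# silently becomes the last stored value.
buggy = level_values.take(labels)

# Appending NaN to the values makes position -1 refer to the NA slot.
padded = np.append(level_values.astype(float), np.nan)
fixed = padded.take(labels)
```

Here `buggy` holds 1, 2, 2 while `fixed` holds 1.0, NaN, 2.0. Note that the padded array is float, which is why the proper NA value has to depend on the index type.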
https://github.com/goyodiaz/pandas/commit/f028513ad96a
It needs to be improved in order to return the proper NA value (NaN, None, maybe NaT?) depending on the index type. Does this approach make sense?