BUG: fix MultiIndex.remove_unused_levels() when index contains NaNs #18426
pandas/tests/indexes/test_multi.py
Outdated
labels=[[0, 2, -1, 1, 1], [0, 1, 2, 3, 2]])

result = df.remove_unused_levels()
tm.assert_index_equal(result, df)
This needs to drop 'unused' when level = ['a', 'd', 'b', 'unused'].
It does... do you mean "you should check it does"?
@toobaz I am asking why the original df and the dropped result are the same.
In the level = ['a', 'd', 'b', 'unused'] case, 'unused' is unused and should be dropped.
> @toobaz I am asking why the original df and the dropped result are the same.
They are equivalent precisely because the dropped level is not used. "The resulting MultiIndex will have the same outward appearance, meaning the same .values and ordering. It will also be .equals() to the original."
> 'unused' is unused and should be dropped.
This is exactly what happens.
@toobaz Okay, you are right. But you have to check whether 'unused' is really dropped.
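The point raised in this thread can be sketched as follows: an equality check passes whether or not the level was dropped, so a second assertion on the level contents is what confirms the drop. This is only an illustration, assuming a recent pandas in which the `labels` argument is named `codes`; the variable names are made up for the example.

```python
import pandas as pd

# Assumption: recent pandas, where `labels` has been renamed to `codes`.
mi = pd.MultiIndex(
    levels=[["a", "d", "b", "unused"], ["w", "x", "y", "z"]],
    codes=[[0, 2, -1, 1, 1], [0, 1, 2, 3, 2]],  # -1 marks a missing value
)

result = mi.remove_unused_levels()

# Outward appearance is unchanged, so equality alone proves little.
same_values = result.equals(mi)

# The level contents are what should differ before and after.
unused_before = "unused" in mi.levels[0]
unused_after = "unused" in result.levels[0]

print(same_values, unused_before, unused_after)
```

A test along these lines would assert both that `result` equals the original index and that `'unused'` is no longer among `result.levels[0]`.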
pandas/core/indexes/multi.py
Outdated
@@ -1366,6 +1366,11 @@ def remove_unused_levels(self):

        changed = False
        for lev, lab in zip(self.levels, self.labels):
            null_mask = lab == -1
            if null_mask.any():
                lab = lab[~null_mask]
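This does not work well: the -1 label is dropped from the first level's labels, leaving 4 labels there against 5 in the second level.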
In [2]: import pandas as pd
In [3]: df = pd.MultiIndex(levels=[['a', 'd', 'b'], ['w', 'x', 'y', 'z']], labels=[[0, 2, -1, 1, 1], [0, 0, 0, 0, 0]])
In [4]: df
Out[4]:
MultiIndex(levels=[[u'a', u'd', u'b'], [u'w', u'x', u'y', u'z']],
           labels=[[0, 2, -1, 1, 1], [0, 0, 0, 0, 0]])
In [5]: df.remove_unused_levels()
Out[5]:
MultiIndex(levels=[[u'a', u'd', u'b'], [u'w']],
           labels=[[0, 2, 1, 1], [0, 0, 0, 0, 0]])
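One way to read the problem in this output: filtering the -1 entries out of a label array shortens it, so the levels no longer line up row by row. A minimal pure-Python sketch of a remapping that keeps -1 in place is shown below. It is not the pandas implementation, and `remove_unused_level` is a hypothetical helper that handles a single level given as plain lists.

```python
def remove_unused_level(level, codes):
    """Drop unused entries from one level, keeping -1 (missing) codes in place.

    Hypothetical helper for illustration only; works on plain Python lists.
    """
    # Codes that are actually used, ignoring the -1 missing-value marker.
    used = sorted({c for c in codes if c != -1})
    # Map each old code to its new position; -1 always maps to itself.
    mapping = {old: new for new, old in enumerate(used)}
    mapping[-1] = -1
    new_level = [level[c] for c in used]
    new_codes = [mapping[c] for c in codes]  # same length as the input
    return new_level, new_codes


first = remove_unused_level(["a", "d", "b"], [0, 2, -1, 1, 1])
second = remove_unused_level(["w", "x", "y", "z"], [0, 0, 0, 0, 0])
print(first)
print(second)
```

Because the codes are remapped rather than filtered, both levels keep five codes and the missing entry stays at the same position.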
81544d0 to 6d1fcc1
Codecov Report
@@            Coverage Diff             @@
##           master   #18426      +/-   ##
==========================================
- Coverage   91.35%   91.33%   -0.02%
==========================================
  Files         163      163
  Lines       49714    49722       +8
==========================================
- Hits        45415    45414       -1
- Misses       4299     4308       +9
Not following you, please explain (if this still applies).
Codecov Report
@@            Coverage Diff             @@
##           master   #18426      +/-   ##
==========================================
- Coverage   91.35%   91.33%   -0.02%
==========================================
  Files         163      163
  Lines       49714    49717       +3
==========================================
- Hits        45415    45409       -6
- Misses       4299     4308       +9
6d1fcc1 to 1ce45d0
thanks @toobaz
git diff upstream/master -u -- "*.py" | flake8 --diff