BUG: fix MultiIndex.remove_unused_levels() when index contains NaNs #18426

Merged: jreback merged 1 commit into pandas-dev:master on Nov 22, 2017

Conversation

Member

@toobaz toobaz commented Nov 22, 2017

labels=[[0, 2, -1, 1, 1], [0, 1, 2, 3, 2]])

result = df.remove_unused_levels()
tm.assert_index_equal(result, df)
Contributor

@Licht-T Licht-T Nov 22, 2017

This needs to drop 'unused' when level = ['a', 'd', 'b', 'unused'].

Member Author

It does... do you mean "you should check it does"?

Contributor

@toobaz I am asking why the original df and the dropped result are the same.

Contributor

In the level = ['a', 'd', 'b', 'unused'] case, 'unused' is unused and should be dropped.

Member Author

> @toobaz I am asking why the original df and the dropped result are the same.

They are equivalent precisely because the dropped level is not used. "The resulting MultiIndex will have the same outward appearance, meaning the same .values and ordering. It will also be .equals() to the original."

Member Author

> 'unused' is unused and should be dropped.

This is exactly what happens.

Contributor

@toobaz Okay, you are right.
But the test also has to check that 'unused' is really dropped.
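
A rough sketch of the stricter check being asked for here. It assumes a current pandas, where the `labels` argument used in this thread has been renamed `codes`; the variable name `mi` and the extra `'unused'` value are illustrative and not taken from the PR's test file.

```python
import pandas as pd

# Same data as the test under review, plus one level value that no code
# points at. `codes` is the current spelling of `labels` (assumption: a
# pandas version released after this thread).
mi = pd.MultiIndex(
    levels=[["a", "d", "b", "unused"], ["w", "x", "y", "z"]],
    codes=[[0, 2, -1, 1, 1], [0, 1, 2, 3, 2]],
)

result = mi.remove_unused_levels()

# The outward appearance is unchanged, so equality alone cannot tell
# whether anything was dropped.
assert result.equals(mi)

# The stricter check: the value is present before and gone afterwards.
assert "unused" in mi.levels[0]
assert "unused" not in result.levels[0]

# The missing entry (code -1) survives, in the same position.
assert list(result.codes[0]).count(-1) == 1
assert result.get_level_values(0).isna().tolist() == [
    False, False, True, False, False,
]
```

The sketch deliberately does not assert the order of the remaining level values, since the method only promises the same `.values` and ordering of the index itself.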

@@ -1366,6 +1366,11 @@ def remove_unused_levels(self):

        changed = False
        for lev, lab in zip(self.levels, self.labels):
            null_mask = lab == -1
            if null_mask.any():
                lab = lab[~null_mask]
Contributor

@Licht-T Licht-T Nov 22, 2017

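This does not work well: filtering out the -1 entries shortens the labels of that level, so the two levels end up with labels of different lengths (4 vs. 5 below).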

In [2]: import pandas as pd
In [3]: df = pd.MultiIndex(levels=[['a', 'd', 'b'], ['w', 'x', 'y', 'z']], labels=[[0, 2, -1, 1, 1], [0, 0, 0, 0, 0]])
In [4]: df
Out[4]:
MultiIndex(levels=[[u'a', u'd', u'b'], [u'w', u'x', u'y', u'z']],
           labels=[[0, 2, -1, 1, 1], [0, 0, 0, 0, 0]])
In [5]: df.remove_unused_levels()
Out[5]:
MultiIndex(levels=[[u'a', u'd', u'b'], [u'w']],
           labels=[[0, 2, 1, 1], [0, 0, 0, 0, 0]])
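
One way to avoid the length mismatch shown above is to remap the labels rather than filter them, leaving `-1` in place as a sentinel. The following is a minimal pure-Python sketch under that assumption; `remove_unused` is a hypothetical helper operating on plain lists, not the pandas implementation and not the code that was merged.

```python
def remove_unused(levels, labels):
    """Drop level values no label refers to, keeping -1 (missing) as-is.

    Hypothetical helper for illustration only; works on one level at a time.
    """
    # Labels actually in use, in sorted order, ignoring the -1 sentinel.
    used = sorted({lab for lab in labels if lab != -1})
    # Map each old label to its new position; -1 maps to itself.
    remap = {old: new for new, old in enumerate(used)}
    remap[-1] = -1
    new_levels = [levels[lab] for lab in used]
    new_labels = [remap[lab] for lab in labels]
    return new_levels, new_labels


# First level from the session above: nothing is unused, -1 is preserved,
# and the labels keep their original length of 5.
lev0, lab0 = remove_unused(["a", "d", "b"], [0, 2, -1, 1, 1])
assert lev0 == ["a", "d", "b"]
assert lab0 == [0, 2, -1, 1, 1]

# Second level: only 'w' is used, so the other values are dropped.
lev1, lab1 = remove_unused(["w", "x", "y", "z"], [0, 0, 0, 0, 0])
assert lev1 == ["w"]
assert lab1 == [0, 0, 0, 0, 0]

# Both label lists still have the same length.
assert len(lab0) == len(lab1)
```

Because every input label produces exactly one output label, the per-level label arrays cannot drift apart in length, which is the invariant the filtered version breaks.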

Contributor

@Licht-T Licht-T Nov 22, 2017

codecov bot commented Nov 22, 2017

Codecov Report

Merging #18426 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18426      +/-   ##
==========================================
- Coverage   91.35%   91.33%   -0.02%     
==========================================
  Files         163      163              
  Lines       49714    49722       +8     
==========================================
- Hits        45415    45414       -1     
- Misses       4299     4308       +9

| Flag | Coverage Δ |
|---|---|
| #multiple | 89.13% <100%> (ø) ⬆️ |
| #single | 39.63% <0%> (-0.07%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/indexes/multi.py | 96.42% <100%> (+0.02%) ⬆️ |
| pandas/io/gbq.py | 25% <0%> (-58.34%) ⬇️ |
| pandas/core/frame.py | 97.8% <0%> (-0.1%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d421a09...6d1fcc1.

Member Author

toobaz commented Nov 22, 2017

> Please refer to this.

Not following you, please explain (if this still applies).

codecov bot commented Nov 22, 2017

Codecov Report

Merging #18426 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18426      +/-   ##
==========================================
- Coverage   91.35%   91.33%   -0.02%     
==========================================
  Files         163      163              
  Lines       49714    49717       +3     
==========================================
- Hits        45415    45409       -6     
- Misses       4299     4308       +9

| Flag | Coverage Δ |
|---|---|
| #multiple | 89.13% <100%> (ø) ⬆️ |
| #single | 39.63% <0%> (-0.07%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/indexes/multi.py | 96.41% <100%> (ø) ⬆️ |
| pandas/io/gbq.py | 25% <0%> (-58.34%) ⬇️ |
| pandas/core/frame.py | 97.8% <0%> (-0.1%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d421a09...1ce45d0.

jreback added this to the 0.22.0 milestone Nov 22, 2017
jreback added the Bug and Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) labels Nov 22, 2017
jreback merged commit 717c4a2 into pandas-dev:master Nov 22, 2017
Contributor

jreback commented Nov 22, 2017

thanks @toobaz

Labels
Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
Development

Successfully merging this pull request may close these issues.

MultiIndex.remove_unused_levels() fills nans
3 participants