Skip to content

BUG: incorrect set_labels in MI #19058

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 6, 2018
Merged

BUG: incorrect set_labels in MI #19058

merged 6 commits into from
Jan 6, 2018

Conversation

Mofef
Copy link
Contributor

@Mofef Mofef commented Jan 3, 2018

This fixes #19057

@codecov
Copy link

codecov bot commented Jan 3, 2018

Codecov Report

Merging #19058 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19058      +/-   ##
==========================================
+ Coverage   91.53%   91.53%   +<.01%     
==========================================
  Files         148      148              
  Lines       48688    48689       +1     
==========================================
+ Hits        44566    44567       +1     
  Misses       4122     4122
Flag Coverage Δ
#multiple 89.9% <100%> (ø) ⬆️
#single 41.62% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/multi.py 96.22% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 821028f...48fb843. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks ok, but pls add your original test (both with and w/o inplace) as well as a note in the whatsnew

@jreback jreback changed the title Fix #19057 BUG: incorrect set_labels in MI Jan 4, 2018
@jreback jreback added Bug Dtype Conversions Unexpected or buggy dtype conversions MultiIndex labels Jan 4, 2018
@jreback
Copy link
Contributor

jreback commented Jan 4, 2018

cc @toobaz

any thoughts here

@toobaz
Copy link
Member

toobaz commented Jan 4, 2018

Looks good to me. As for tests, I guess it's just a matter of understanding what went wrong in this one, and fix it.

@Mofef
Copy link
Contributor Author

Mofef commented Jan 4, 2018

you probably mean

assert_matching(self.index.labels, labels)
set_levels works correctly iirc.
The issue with the test is that self.index has the same magnitude of categories in each level. (i.e. the length is each smaller than int8_max). So the misalignment does not have an effect. Also there is no misalignment if the labels get replaced in the first n levels.
I would like to take care of the tests later.

@toobaz
Copy link
Member

toobaz commented Jan 4, 2018

you probably mean

yep, sorry, actually this assertion:

assert_matching(self.index.labels, labels)

But I agree with your explanation.

@jreback
Copy link
Contributor

jreback commented Jan 4, 2018

@Mofef tests need to be in the same issue as the PR fixing.

@Mofef
Copy link
Contributor Author

Mofef commented Jan 4, 2018

@jreback Sorry I don't understand, you mean i should also add tests to this PR?

In the meanwhile i looked into test_multi.py. My initial idea was to just increase the size of the minor level with something like

    def setup_method(self, method):
        major_axis = Index(['foo', 'bar', 'baz', 'qux'])
        minor_axis = Index(range(2, 128))

        major_labels = np.random.choice(range(4), 128)
        minor_labels = np.array(range(128))
...

However, that doesn't seem to be feasible since a lot of test methods rely on the current "reference index". So I get ~50 presumably false negative tests.
Another option would be to create a bigger MultiIndex down in test_set_labels. On the other hand it's probably a good idea to check each operation against such an index with levels of different dtype.
What do you think?

@jreback
Copy link
Contributor

jreback commented Jan 4, 2018

@Mofef yes you need a test, you can use the one that you put in the issue. you need to assert that this works (eg. construct the expected MI).

@Mofef
Copy link
Contributor Author

Mofef commented Jan 5, 2018

Done. I had to define another test method, with its own assert_matching function, since inside test_set_labels the expected labels get casted to int8. Imho this should happen outside of a function called "assert_matching"

I also stumbled upon #19092

Edit: I forgot about the whatsnew...


ind = pd.MultiIndex.from_tuples([(0, i) for i in range(130)])
new_labels = range(129, -1, -1)
ind_reference = pd.MultiIndex.from_tuples(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this expected


# [w/o mutation]
ind2 = ind.set_labels(labels=new_labels, level=1)
assert_equal(ind2, ind_reference)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call these result, then you can simply do

assert result.equals(expected)

for the inplace case you need to copy before

@@ -327,6 +327,28 @@ def assert_matching(actual, expected):
assert_matching(ind2.labels, new_labels)
assert_matching(self.index.labels, labels)

def test_set_label_distinctly_sized_levels(self):
# label changing for levels of different magnitude of categories
def assert_equal(actual, expected):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see below this is not necessary

@Mofef
Copy link
Contributor Author

Mofef commented Jan 5, 2018

Cool, thanks. PTAL

@jreback jreback added this to the 0.23.0 milestone Jan 5, 2018
@jreback
Copy link
Contributor

jreback commented Jan 5, 2018

looks good. ping on green.

@Mofef
Copy link
Contributor Author

Mofef commented Jan 5, 2018

@jreback https://travis-ci.org/pandas-dev/pandas/jobs/325457802#L448

648.46s$ git fetch origin +refs/pull/19058/merge:
fatal: unable to access 'https://github.com/pandas-dev/pandas.git/': Failed to connect to github.com port 443: Connection timed out

@TomAugspurger
Copy link
Contributor

Sorry about the extra commits @Mofef. I fixed a merge conflict in the release notes and the CI services apparently had an issue with that. They're running again now.

@Mofef
Copy link
Contributor Author

Mofef commented Jan 5, 2018

Oh ok no problem. thank you :)

@jreback jreback merged commit e6ea00c into pandas-dev:master Jan 6, 2018
@jreback
Copy link
Contributor

jreback commented Jan 6, 2018

thanks @Mofef

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions MultiIndex
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug in set_labels of MultiIndex
4 participants