Skip to content

Issue 28518 multiindex interesection #28735

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

nrebena
Copy link
Contributor

@nrebena nrebena commented Oct 1, 2019

MultiIndex lexsort_depth now return sortorder if sortorder is not set to None

At first the error come from the intersection of the two indexes, that resulted in an unsorted index with an invalid lexsort_depth value. Then when looking at the lexsort_depth function I found the deeper error, where the returned lexsort_depth value were the number of level instead of the self.sortorder value.

After fixing that I implemented a check in the MultiIndex._verify_integrity function when a sortorder is given to raise if it no coherent with the lexsort_depth.

I added tests for the issue, for the lexsort_depth function, and fo raising error in the _verify_integrity function.

* test_lexsort_depth verify that lexsort_depth return the correct depth
when sortorder is passed to the MultiIndex constructor
* test_raise_invalid_sortorder test that the MultiIndex constructor
raise when passing an incorrect sortorder
* test_merge_multiindex_columns test the original issue
This fix issue pandas-dev#28518, where the label of the merge index where invalid
due to inconsistent lexsort_depth property of the intersection of the
indexes
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to give this another pass but some quick comments in the meantime

return self.nlevels
else:
return 0
return int(self.sortorder)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for int(...) here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the line from the constructor, line 278, but if it is already cast in the constructor (maybe to handle passing False for sortorder), it may no be necessary here. I look if it passes the test without it.

for n in numbers:
index_tuples.append([l, n])

index = pd.MultiIndex.from_tuples(index_tuples, names=["outer", "inner"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you not just use from_product here instead to replace the two loops above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes cleary.

r_suf = "_y"
expected_labels = sum(([l + l_suf, l + r_suf] for l in letters), [])
merged_frame = frame_x.merge(frame_y, on="id", suffixes=((l_suf, r_suf))).columns
for label in expected_labels:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of asserting within a loop would be easier if you built a result and expected frame and used tm.assert_frame_equal to compare. You will find examples of that in other tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did'nt think about this way. It does make more sense.

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Oct 2, 2019
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. can you add a whatsnew note in 1.0, in the Reshaping section (also a note about validing sortorder in the MultiIndex bug fix section).

side note, do you see an actual usecase for sortorder in the constructor? I think we should deprecate this. (can you create an issue)

@jreback jreback added this to the 1.0 milestone Oct 2, 2019
@nrebena nrebena force-pushed the issue_28518_multiindex_interesection branch from e89ac18 to 716820e Compare October 2, 2019 14:21
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small request, otherwise ping on green.


def _lexsort_depth(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right was wanting a
def _lexsort_depth(self) -> int: here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sorry i did'nt get the "hint".
Joke aside, is it related to the caching of the result that the function needs typing?

@nrebena
Copy link
Contributor Author

nrebena commented Oct 2, 2019

@jreback Changes are made and checks are green. 🙂

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, minor doc comment, ping on green.

@nrebena
Copy link
Contributor Author

nrebena commented Oct 3, 2019

@jreback Green and ready for takeoff

@jreback jreback merged commit a8538f2 into pandas-dev:master Oct 5, 2019
@jreback
Copy link
Contributor

jreback commented Oct 5, 2019

thanks @nrebena very nice!

@nrebena nrebena deleted the issue_28518_multiindex_interesection branch October 6, 2019 09:06
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
bongolegend pushed a commit to bongolegend/pandas that referenced this pull request Jan 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

merge fails to add suffixes on multiindex columns
3 participants