-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Issue 28518 multiindex interesection #28735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 28518 multiindex interesection #28735
Conversation
* test_lexsort_depth verify that lexsort_depth return the correct depth when sortorder is passed to the MultiIndex constructor * test_raise_invalid_sortorder test that the MultiIndex constructor raise when passing an incorrect sortorder * test_merge_multiindex_columns test the original issue
This fix issue pandas-dev#28518, where the label of the merge index where invalid due to inconsistent lexsort_depth property of the intersection of the indexes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to give this another pass but some quick comments in the meantime
pandas/core/indexes/multi.py
Outdated
return self.nlevels | ||
else: | ||
return 0 | ||
return int(self.sortorder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for int(...)
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took the line from the constructor, line 278, but if it is already cast in the constructor (maybe to handle passing False for sortorder), it may no be necessary here. I look if it passes the test without it.
for n in numbers: | ||
index_tuples.append([l, n]) | ||
|
||
index = pd.MultiIndex.from_tuples(index_tuples, names=["outer", "inner"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you not just use from_product
here instead to replace the two loops above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes cleary.
r_suf = "_y" | ||
expected_labels = sum(([l + l_suf, l + r_suf] for l in letters), []) | ||
merged_frame = frame_x.merge(frame_y, on="id", suffixes=((l_suf, r_suf))).columns | ||
for label in expected_labels: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of asserting within a loop would be easier if you built a result
and expected
frame and used tm.assert_frame_equal
to compare. You will find examples of that in other tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did'nt think about this way. It does make more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. can you add a whatsnew note in 1.0, in the Reshaping section (also a note about validing sortorder in the MultiIndex bug fix section).
side note, do you see an actual usecase for sortorder in the constructor? I think we should deprecate this. (can you create an issue)
e89ac18
to
716820e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small request, otherwise ping on green.
pandas/core/indexes/multi.py
Outdated
|
||
def _lexsort_depth(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right was wanting a
def _lexsort_depth(self) -> int:
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok sorry i did'nt get the "hint".
Joke aside, is it related to the caching of the result that the function needs typing?
@jreback Changes are made and checks are green. 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, minor doc comment, ping on green.
@jreback Green and ready for takeoff |
thanks @nrebena very nice! |
MultiIndex lexsort_depth now return sortorder if sortorder is not set to None
At first the error come from the intersection of the two indexes, that resulted in an unsorted index with an invalid lexsort_depth value. Then when looking at the lexsort_depth function I found the deeper error, where the returned lexsort_depth value were the number of level instead of the self.sortorder value.
After fixing that I implemented a check in the MultiIndex._verify_integrity function when a sortorder is given to raise if it no coherent with the lexsort_depth.
I added tests for the issue, for the lexsort_depth function, and fo raising error in the _verify_integrity function.
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff