Preserve Alignment Between Index and Values for Non-Monotonic Stack #20980
Hello @WillAyd! Thanks for updating the PR.
Comment last updated on May 14, 2018 at 21:28 Hours UTC
@@ -653,7 +653,13 @@ def _convert_level_number(level_num, columns):
# time to ravel the values
new_data = {}
level_vals = this.columns.levels[-1]
We have a sort_monotonic function on MultiIndex to do this.
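As a rough illustration of what sorting a level monotonically involves, here is a minimal pure-Python sketch. It assumes a level is modeled as a list of unique values plus a list of integer codes (mirroring the `levels`/`labels` arguments, not pandas internals); `sort_level_monotonic` and `decode` are hypothetical helpers, not pandas API.

```python
def sort_level_monotonic(level, codes):
    """Return (sorted_level, remapped_codes) describing the same labels.

    Hypothetical helper: sorts the unique level values and rewrites each
    code so that sorted_level[new_code] == level[old_code].
    """
    sorted_level = sorted(level)
    # Position of each value within the sorted level.
    new_pos = {value: i for i, value in enumerate(sorted_level)}
    # Translate every old code to the code of the same value in the sorted level.
    remapped = [new_pos[level[c]] for c in codes]
    return sorted_level, remapped


def decode(level, codes):
    """Materialize the labels that a (level, codes) pair represents."""
    return [level[c] for c in codes]


level = ['c', 'b', 'a', 'd']
codes = [0, 1, 2, 3, 0, 1, 2, 3]
new_level, new_codes = sort_level_monotonic(level, codes)
print(new_level)  # ['a', 'b', 'c', 'd']
print(new_codes)  # [2, 1, 0, 3, 2, 1, 0, 3]
```

Sorting the level leaves the decoded labels unchanged but makes the codes non-monotonic: applied to the inner level of the second MultiIndex in the reply below, it yields the inner level and codes of the first one, which is the layout that produced incorrect results.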
Good to know, though I must have explained the issue incorrectly. Within the call to
MultiIndex(levels=[['A', 'B'], ['a', 'b', 'c', 'd']],
           labels=[[0, 0, 0, 0, 1, 1, 1, 1], [2, 1, 0, 3, 2, 1, 0, 3]],
           names=['dim2', 'foo'])
I built another DataFrame manually which was equivalent (at least according to
MultiIndex(levels=[['A', 'B'], ['c', 'b', 'a', 'd']],
           labels=[[0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 0, 1, 2, 3]],
           names=['dim2', 'foo'])
The former yielded incorrect results at the end of the stack operation but the latter was fine, even though they were coming from two frames that look exactly the same. I believe the problem is that when iterating over the groups,
Is the fact that the values of the DataFrame do not align with the labels of the column index by design?
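The two MultiIndex layouts quoted above can be compared with a small pure-Python model, assuming each index is represented by per-level value lists plus integer codes (an illustrative stand-in, not pandas' implementation). `decode` and `codes_lexsorted` are hypothetical helpers. Both layouts decode to the same column tuples, but only the second has code tuples in lexicographic order, i.e. inner codes that follow level order within each outer group.

```python
def decode(levels, codes):
    """Rebuild the column tuples described by per-level values and codes."""
    per_level = [[lvl[c] for c in lvl_codes] for lvl, lvl_codes in zip(levels, codes)]
    return list(zip(*per_level))


def codes_lexsorted(codes):
    """True if the (outer_code, inner_code, ...) tuples are already sorted."""
    code_tuples = list(zip(*codes))
    return code_tuples == sorted(code_tuples)


first_levels = [['A', 'B'], ['a', 'b', 'c', 'd']]
first_codes = [[0, 0, 0, 0, 1, 1, 1, 1], [2, 1, 0, 3, 2, 1, 0, 3]]
second_levels = [['A', 'B'], ['c', 'b', 'a', 'd']]
second_codes = [[0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 0, 1, 2, 3]]

print(decode(first_levels, first_codes) == decode(second_levels, second_codes))  # True
print(codes_lexsorted(first_codes))   # False
print(codes_lexsorted(second_codes))  # True
```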
Solved via #21043.
git diff upstream/master -u -- "*.py" | flake8 --diff
Not overly familiar with this code so submitting for review as there's probably a better way of going about it. The root cause of the referenced issue IIUC is that the index labels of the caller are non-monotonic.
stack essentially takes the values and labels from the level being pushed down into the rows, with an implicit assumption that both are monotonic; when the caller's labels are non-monotonic, the index and values get misaligned. This breaks at least one other test, so it is not ready to merge, but I'm looking for feedback on:
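To make the misalignment concrete, here is a simplified pure-Python model of stacking the inner column level for a single row. `naive_stack` pairs the inner level's values, in level order, with each outer group's values in column order (the implicit monotonic assumption described above); `aligned_stack` instead uses each column's own codes. Both helpers, and the levels/codes model itself, are hypothetical illustrations rather than the pandas stack code. With the two layouts from the earlier comment, only the first one (sorted level, non-monotonic codes) comes out misaligned under the naive pairing.

```python
def naive_stack(levels, codes, row):
    """Pair level values (level order) with column values (column order).

    Simplified model of the implicit monotonic assumption, not pandas code.
    """
    outer_level, inner_level = levels
    outer_codes, _ = codes
    result = {}
    for o_code, outer in enumerate(outer_level):
        # Values belonging to this outer group, in the order the columns appear.
        group_vals = [v for c, v in zip(outer_codes, row) if c == o_code]
        # Positional pairing: correct only if inner codes follow level order.
        for inner, value in zip(inner_level, group_vals):
            result[(outer, inner)] = value
    return result


def aligned_stack(levels, codes, row):
    """Pair each value with the labels its own codes point to."""
    outer_level, inner_level = levels
    outer_codes, inner_codes = codes
    return {
        (outer_level[o], inner_level[i]): value
        for o, i, value in zip(outer_codes, inner_codes, row)
    }


first_levels = [['A', 'B'], ['a', 'b', 'c', 'd']]
first_codes = [[0, 0, 0, 0, 1, 1, 1, 1], [2, 1, 0, 3, 2, 1, 0, 3]]
second_levels = [['A', 'B'], ['c', 'b', 'a', 'd']]
second_codes = [[0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 0, 1, 2, 3]]
row = [1, 2, 3, 4, 5, 6, 7, 8]  # one value per column, same column order in both

print(naive_stack(first_levels, first_codes, row)[('A', 'a')])    # 1 (misaligned)
print(aligned_stack(first_levels, first_codes, row)[('A', 'a')])  # 3
```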