Preserve Alignment Between Index and Values for Non-Monotonic Stack #20980

Closed
wants to merge 2 commits into from

Conversation

WillAyd
Member

@WillAyd WillAyd commented May 8, 2018

I'm not overly familiar with this code, so I'm submitting this for review; there is probably a better way of going about it. IIUC, the root cause of the referenced issue is that the index labels of the caller are non-monotonic. stack essentially takes the values and labels from the level being pushed down into the rows, implicitly assuming that both are monotonic, so the index and values get misaligned.

This breaks at least one other test, so it is not ready to merge, but I'm looking for feedback on:

  • If there's a better way to align the values with the labels in this function AND/OR
  • If we make any guarantees about the order of the labels for the level(s) being moved in this function

@pep8speaks

pep8speaks commented May 8, 2018

Hello @WillAyd! Thanks for updating the PR.

Line 427:80: E501 line too long (81 > 79 characters)

Comment last updated on May 14, 2018 at 21:28 Hours UTC

@WillAyd WillAyd changed the title Stack order Preserve Alignment Between Index and Values for Non-Monotonic Stack May 8, 2018
@@ -653,7 +653,13 @@ def _convert_level_number(level_num, columns):
# time to ravel the values
new_data = {}
level_vals = this.columns.levels[-1]
Contributor

We have a _sort_levels_monotonic method on MultiIndex to do this.

@WillAyd
Member Author

WillAyd commented May 8, 2018

Good to know, though I must have explained the issue incorrectly. Within the call to _stack_multi_columns on master, here's how the column index of the failing example looks:

MultiIndex(levels=[['A', 'B'], ['a', 'b', 'c', 'd']],
           labels=[[0, 0, 0, 0, 1, 1, 1, 1], [2, 1, 0, 3, 2, 1, 0, 3]],
           names=['dim2', 'foo'])

I manually built another DataFrame that was equivalent (at least according to tm.assert_frame_equal) and had a column index that looked as follows:

MultiIndex(levels=[['A', 'B'], ['c', 'b', 'a', 'd']],
           labels=[[0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 0, 1, 2, 3]],
           names=['dim2', 'foo'])

The former yielded incorrect results at the end of the stack operation, but the latter was fine, even though they came from two frames that look exactly the same.

I believe the problem is that, when iterating over the groups, _stack_multi_columns takes a slice of the frame's values. With the latter index, taking a slice of 4 values at a time keeps the values aligned with the second-level labels because those labels are sequential. That does not hold for the former index, hence the issue.

Is the fact that the values of the DataFrame do not align with the labels of the column index by design?
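
The misalignment described in this comment can be reproduced without pandas. The following is only a minimal sketch, not the actual _stack_multi_columns code: it assumes one row of values stored in column order, and it compares pairing that slice positionally with the sorted level values against pairing it through the integer labels. The helper names pair_by_position and pair_by_labels and the data values are hypothetical.

```python
# Second-level values and integer labels taken from the failing column index
# (shown for outer group 'A'; group 'B' repeats the same pattern).
level_vals = ['a', 'b', 'c', 'd']
labels = [2, 1, 0, 3]          # columns appear in the order c, b, a, d

# One row of made-up data stored in column order: 10 belongs to 'c', 11 to 'b', ...
row_slice = [10, 11, 12, 13]


def pair_by_position(level_vals, values):
    """Hypothetical helper: assume the i-th value belongs to the i-th level value."""
    return dict(zip(level_vals, values))


def pair_by_labels(level_vals, labels, values):
    """Hypothetical helper: look up each column's level value through its label."""
    return {level_vals[lab]: val for lab, val in zip(labels, values)}


misaligned = pair_by_position(level_vals, row_slice)
aligned = pair_by_labels(level_vals, labels, row_slice)

print(misaligned)  # {'a': 10, 'b': 11, 'c': 12, 'd': 13}
print(aligned)     # {'c': 10, 'b': 11, 'a': 12, 'd': 13}
```

With levels ['c', 'b', 'a', 'd'] and labels [0, 1, 2, 3], both helpers return the same mapping, which matches the observation that the manually built frame stacked correctly.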

@jreback
Contributor

jreback commented May 9, 2018

e.g.

In [2]: mi = pd.MultiIndex(levels=[['A', 'B'], ['c', 'b', 'a', 'd']],
   ...:            labels=[[0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 0, 1, 2, 3]],
   ...:            names=['dim2', 'foo'])
   ...:            

In [3]: mi._sort_levels_monotonic()
Out[3]: 
MultiIndex(levels=[['A', 'B'], ['a', 'b', 'c', 'd']],
           labels=[[0, 0, 0, 0, 1, 1, 1, 1], [2, 1, 0, 3, 2, 1, 0, 3]],
           names=['dim2', 'foo'])
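
For readers on a newer pandas, the session above might be reproduced roughly as follows. This is a sketch under two assumptions: the labels argument was renamed to codes in pandas 0.24, and _sort_levels_monotonic is a private method whose behaviour may change between versions. The variable names are illustrative only.

```python
import pandas as pd

# Same index as in the session above, using the newer `codes` keyword.
mi = pd.MultiIndex(
    levels=[['A', 'B'], ['c', 'b', 'a', 'd']],
    codes=[[0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 0, 1, 2, 3]],
    names=['dim2', 'foo'],
)

# Private API: sorts each level and remaps the codes so the tuples are unchanged.
sorted_mi = mi._sort_levels_monotonic()

same_tuples = list(mi) == list(sorted_mi)   # the logical index is identical
inner_levels = list(sorted_mi.levels[1])    # second level is now sorted
inner_codes = sorted_mi.codes[1].tolist()   # codes are remapped to match

print(same_tuples)
print(inner_levels)
print(inner_codes)
```

The two indexes describe the same columns in the same order; only the internal levels and codes differ, which is why code that slices values positionally against the levels can go wrong.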

@WillAyd WillAyd closed this May 14, 2018
@WillAyd WillAyd deleted the stack-order branch May 14, 2018 21:11
@WillAyd WillAyd restored the stack-order branch May 14, 2018 21:28
@WillAyd WillAyd reopened this May 14, 2018
@WillAyd
Member Author

WillAyd commented Jun 2, 2018

solved via #21043

@WillAyd WillAyd closed this Jun 2, 2018
@WillAyd WillAyd deleted the stack-order branch December 25, 2018 06:13
Development

Successfully merging this pull request may close these issues.

Data is mismatched with labels after stack with MultiIndex columns
3 participants