Skip to content

BUG: DataFrame.unstack() does not properly sort list of levels #9514

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
seth-p opened this issue Feb 18, 2015 · 7 comments
Open

BUG: DataFrame.unstack() does not properly sort list of levels #9514

seth-p opened this issue Feb 18, 2015 · 7 comments
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@seth-p
Copy link
Contributor

seth-p commented Feb 18, 2015

In 0.15.2 (and I believe this remains the case), the docstring for DataFrame.unstack() states The level involved will automatically get sorted.. This is not necessarily the case when level is a list of levels.

In [40]: df = pd.DataFrame(np.arange(8).reshape((4, 2)),
                           index=pd.MultiIndex.from_tuples([(100, 'A', 'y'), (100, 'A', 'x'),
                                                            (100, 'B', 'x'), (200, 'B', 'y')],
                                                           names=['Nums', 'Upper','Lower']))

In [41]: df
Out[41]:
                  0  1
Nums Upper Lower
100  A     y      0  1
           x      2  3
     B     x      4  5
200  B     y      6  7

In [42]: df.unstack([1, 2])
Out[42]:
        0               1
Upper   A       B       A       B
Lower   y   x   x   y   y   x   x   y
Nums
100     0   2   4 NaN   1   3   5 NaN
200   NaN NaN NaN   6 NaN NaN NaN   7

Note that the pivoted tuples are ordered as [(A, y), (A, x), (B, x), (B, y)], which is not sorted.
I would expect the result to be the same as the following:

In [43]: df.T.stack([1, 2]).T
Out[43]:
        0               1
Upper   A       B       A       B
Lower   x   y   x   y   x   y   x   y
Nums
100     2   0   4 NaN   3   1   5 NaN
200   NaN NaN NaN   6 NaN NaN NaN   7

In fact, there seems to be a problem even when level is a list containing just a single level. Compare the following:

In [47]: df.unstack(2)
Out[47]:
             0       1
Lower        x   y   x   y
Nums Upper
100  A       2   0   3   1
     B       4 NaN   5 NaN
200  B     NaN   6 NaN   7

In [48]: df.unstack([2])
Out[48]:
             0       1
Lower        y   x   y   x
Nums Upper
100  A       0   2   1   3
     B     NaN   4 NaN   5
200  B       6 NaN   7 NaN

In [50]: df.T.stack(2).T
Out[50]:
             0       1
Lower        x   y   x   y
Nums Upper
100  A       2   0   3   1
     B       4 NaN   5 NaN
200  B     NaN   6 NaN   7

In [51]: df.T.stack([2]).T
Out[51]:
             0       1
Lower        x   y   x   y
Nums Upper
100  A       2   0   3   1
     B       4 NaN   5 NaN
200  B     NaN   6 NaN   7
@seth-p
Copy link
Contributor Author

seth-p commented Feb 18, 2015

cc: @behzadnouri

@seth-p seth-p changed the title BUG: DataFrame.unstack() does not properly sort multiple pivoted levels BUG: DataFrame.unstack() does not properly sort list of levels Feb 19, 2015
@seth-p
Copy link
Contributor Author

seth-p commented Feb 19, 2015

As an aside, DataFrame.stack() also claims The level involved will automatically get sorted.. In 0.15.2, it seems to do so for a MultiIndex, but not for a simple Index. All rather inconsistent.

In [61]: df1 = pd.DataFrame(np.arange(6).reshape(2, 3), columns=['B', 'A', 'C'])

In [62]: df1
Out[62]:
   B  A  C
0  0  1  2
1  3  4  5

In [63]: df1.stack()  # the stacked level is *NOT* sorted
Out[63]:
0  B    0
   A    1
   C    2
1  B    3
   A    4
   C    5
dtype: int32

In [68]: df2 = pd.DataFrame(np.arange(6).reshape(2, 3), columns=pd.MultiIndex.from_tuples([('B', 'b'), ('A', 'a'), ('C', 'c')], names=['Upper', 'Lower']))

In [69]: df2
Out[69]:
Upper  B  A  C
Lower  b  a  c
0      0  1  2
1      3  4  5

In [70]: df2.stack(level=1)  # the stacked level 'Lower' and the remaining level 'Upper' are sorted
Out[70]:
Upper     A   B   C
  Lower
0 a       1 NaN NaN
  b     NaN   0 NaN
  c     NaN NaN   2
1 a       4 NaN NaN
  b     NaN   3 NaN
  c     NaN NaN   5

In [71]: df2.stack(level=0)  # the stacked level 'Upper' and the remaining level 'Lower' are sorted
Out[71]:
Lower     a   b   c
  Upper
0 A       1 NaN NaN
  B     NaN   0 NaN
  C     NaN NaN   2
1 A       4 NaN NaN
  B     NaN   3 NaN
  C     NaN NaN   5

In [73]: df2.stack([0, 1])  # the stacked levels are sorted
Out[73]:
   Upper  Lower
0  A      a        1
   B      b        0
   C      c        2
1  A      a        4
   B      b        3
   C      c        5
dtype: float64

@shoyer
Copy link
Member

shoyer commented Feb 23, 2015

Agreed, this looks inconsistent to me.

@seth-p
Copy link
Contributor Author

seth-p commented May 9, 2015

Any thoughts on what the behavior should be? At the very least, I think:

  1. Whatever the behavior is, it should be consistent for df.unstack(level) and df.T.stack(level).T.
  2. In the case of df.unstack(level), we need to specify whether each of the following is to be sorted:
    2.a. The original df.columns, into which the specified level of df.index will be unstacked. It should not depend on what type of Index df.columns is.
    2.b. The specified level(s) in df.index that are being unstacked into df.columns. It should not depend on how many levels are being unstacked; or in the case all levels are being unstacked, on what type of Index df.index is.
    2.c. The unspecified levels in df.index (if any) that remain as df.unstack(level).index.

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 9, 2015
@jreback jreback added this to the Next Major Release milestone May 9, 2015
@max-sixty
Copy link
Contributor

I just hit this, +1

@simonjayhawkins
Copy link
Member

NOTE: the latest docs no longer state 'The level involved will automatically get sorted..' https://pandas.pydata.org/pandas-docs/version/1.1.0/reference/api/pandas.DataFrame.unstack.html

I would expect the result to be the same as the following:

In [43]: df.T.stack([1, 2]).T
Out[43]:
        0               1
Upper   A       B       A       B
Lower   x   y   x   y   x   y   x   y
Nums
100     2   0   4 NaN   3   1   5 NaN
200   NaN NaN NaN   6 NaN NaN NaN   7

the ordering of the index levels should maybe be preserved (effectively unstacking each row in turn) in which case the expected output would be

         0    1    0    1    0    1    0    1
Upper    A    A    A    A    B    B    B    B
Lower    y    y    x    x    x    x    y    y
Nums                                         
100    0.0  1.0  2.0  3.0  4.0  5.0  NaN  NaN
200    NaN  NaN  NaN  NaN  NaN  NaN  6.0  7.0

see also #15105

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@mdruiter
Copy link
Contributor

The documentation currently says:

sort : bool, default True
    Sort the level(s) in the resulting MultiIndex columns.

I guess this is just ambiguous enough to mean that the index's levels will be sorted in some way, but index.is_monotonic* is not guaranteed...
Should this issue be closed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

No branches or pull requests

7 participants