BUG: DataFrame.unstack() does not properly sort list of levels #9514

seth-p · 2015-02-18T22:06:57Z

In 0.15.2 (and I believe this remains the case), the docstring for DataFrame.unstack() states The level involved will automatically get sorted.. This is not necessarily the case when level is a list of levels.

In [40]: df = pd.DataFrame(np.arange(8).reshape((4, 2)),
                           index=pd.MultiIndex.from_tuples([(100, 'A', 'y'), (100, 'A', 'x'),
                                                            (100, 'B', 'x'), (200, 'B', 'y')],
                                                           names=['Nums', 'Upper','Lower']))

In [41]: df
Out[41]:
                  0  1
Nums Upper Lower
100  A     y      0  1
           x      2  3
     B     x      4  5
200  B     y      6  7

In [42]: df.unstack([1, 2])
Out[42]:
        0               1
Upper   A       B       A       B
Lower   y   x   x   y   y   x   x   y
Nums
100     0   2   4 NaN   1   3   5 NaN
200   NaN NaN NaN   6 NaN NaN NaN   7

Note that the pivoted tuples are ordered as [(A, y), (A, x), (B, x), (B, y)], which is not sorted.
I would expect the result to be the same as the following:

In [43]: df.T.stack([1, 2]).T
Out[43]:
        0               1
Upper   A       B       A       B
Lower   x   y   x   y   x   y   x   y
Nums
100     2   0   4 NaN   3   1   5 NaN
200   NaN NaN NaN   6 NaN NaN NaN   7

In fact, there seems to be a problem even when level is a list containing just a single level. Compare the following:

In [47]: df.unstack(2)
Out[47]:
             0       1
Lower        x   y   x   y
Nums Upper
100  A       2   0   3   1
     B       4 NaN   5 NaN
200  B     NaN   6 NaN   7

In [48]: df.unstack([2])
Out[48]:
             0       1
Lower        y   x   y   x
Nums Upper
100  A       0   2   1   3
     B     NaN   4 NaN   5
200  B       6 NaN   7 NaN

In [50]: df.T.stack(2).T
Out[50]:
             0       1
Lower        x   y   x   y
Nums Upper
100  A       2   0   3   1
     B       4 NaN   5 NaN
200  B     NaN   6 NaN   7

In [51]: df.T.stack([2]).T
Out[51]:
             0       1
Lower        x   y   x   y
Nums Upper
100  A       2   0   3   1
     B       4 NaN   5 NaN
200  B     NaN   6 NaN   7

The text was updated successfully, but these errors were encountered:

seth-p · 2015-02-18T22:09:09Z

cc: @behzadnouri

seth-p · 2015-02-19T02:03:54Z

As an aside, DataFrame.stack() also claims The level involved will automatically get sorted.. In 0.15.2, it seems to do so for a MultiIndex, but not for a simple Index. All rather inconsistent.

In [61]: df1 = pd.DataFrame(np.arange(6).reshape(2, 3), columns=['B', 'A', 'C'])

In [62]: df1
Out[62]:
   B  A  C
0  0  1  2
1  3  4  5

In [63]: df1.stack()  # the stacked level is *NOT* sorted
Out[63]:
0  B    0
   A    1
   C    2
1  B    3
   A    4
   C    5
dtype: int32

In [68]: df2 = pd.DataFrame(np.arange(6).reshape(2, 3), columns=pd.MultiIndex.from_tuples([('B', 'b'), ('A', 'a'), ('C', 'c')], names=['Upper', 'Lower']))

In [69]: df2
Out[69]:
Upper  B  A  C
Lower  b  a  c
0      0  1  2
1      3  4  5

In [70]: df2.stack(level=1)  # the stacked level 'Lower' and the remaining level 'Upper' are sorted
Out[70]:
Upper     A   B   C
  Lower
0 a       1 NaN NaN
  b     NaN   0 NaN
  c     NaN NaN   2
1 a       4 NaN NaN
  b     NaN   3 NaN
  c     NaN NaN   5

In [71]: df2.stack(level=0)  # the stacked level 'Upper' and the remaining level 'Lower' are sorted
Out[71]:
Lower     a   b   c
  Upper
0 A       1 NaN NaN
  B     NaN   0 NaN
  C     NaN NaN   2
1 A       4 NaN NaN
  B     NaN   3 NaN
  C     NaN NaN   5

In [73]: df2.stack([0, 1])  # the stacked levels are sorted
Out[73]:
   Upper  Lower
0  A      a        1
   B      b        0
   C      c        2
1  A      a        4
   B      b        3
   C      c        5
dtype: float64

shoyer · 2015-02-23T01:21:02Z

Agreed, this looks inconsistent to me.

seth-p · 2015-05-09T18:07:23Z

Any thoughts on what the behavior should be? At the very least, I think:

Whatever the behavior is, it should be consistent for df.unstack(level) and df.T.stack(level).T.
In the case of df.unstack(level), we need to specify whether each of the following is to be sorted:
2.a. The original df.columns, into which the specified level of df.index will be unstacked. It should not depend on what type of Index df.columns is.
2.b. The specified level(s) in df.index that are being unstacked into df.columns. It should not depend on how many levels are being unstacked; or in the case all levels are being unstacked, on what type of Index df.index is.
2.c. The unspecified levels in df.index (if any) that remain as df.unstack(level).index.

max-sixty · 2015-12-07T19:17:53Z

I just hit this, +1

simonjayhawkins · 2020-07-19T18:56:43Z

NOTE: the latest docs no longer state 'The level involved will automatically get sorted..' https://pandas.pydata.org/pandas-docs/version/1.1.0/reference/api/pandas.DataFrame.unstack.html

I would expect the result to be the same as the following:

In [43]: df.T.stack([1, 2]).T
Out[43]:
        0               1
Upper   A       B       A       B
Lower   x   y   x   y   x   y   x   y
Nums
100     2   0   4 NaN   3   1   5 NaN
200   NaN NaN NaN   6 NaN NaN NaN   7

the ordering of the index levels should maybe be preserved (effectively unstacking each row in turn) in which case the expected output would be

         0    1    0    1    0    1    0    1
Upper    A    A    A    A    B    B    B    B
Lower    y    y    x    x    x    x    y    y
Nums                                         
100    0.0  1.0  2.0  3.0  4.0  5.0  NaN  NaN
200    NaN  NaN  NaN  NaN  NaN  NaN  6.0  7.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: DataFrame.unstack() does not properly sort list of levels #9514

BUG: DataFrame.unstack() does not properly sort list of levels #9514

seth-p commented Feb 18, 2015

seth-p commented Feb 18, 2015

seth-p commented Feb 19, 2015

shoyer commented Feb 23, 2015

seth-p commented May 9, 2015

max-sixty commented Dec 7, 2015

simonjayhawkins commented Jul 19, 2020

mdruiter commented Oct 11, 2023

BUG: DataFrame.unstack() does not properly sort list of levels #9514

BUG: DataFrame.unstack() does not properly sort list of levels #9514

Comments

seth-p commented Feb 18, 2015

seth-p commented Feb 18, 2015

seth-p commented Feb 19, 2015

shoyer commented Feb 23, 2015

seth-p commented May 9, 2015

max-sixty commented Dec 7, 2015

simonjayhawkins commented Jul 19, 2020

mdruiter commented Oct 11, 2023