DataFrame.stack() with flat columns won't sort #18356

toobaz · 2017-11-18T14:15:45Z

Code Sample, a copy-pastable example if possible

In [2]: df = pd.DataFrame(1, index=range(3), columns=[1, 3, 2])

In [3]: df.stack()
Out[3]: 
0  1    1
   3    1
   2    1
1  1    1
   3    1
   2    1
2  1    1
   3    1
   2    1
dtype: int64

Problem description

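The docs state "The level involved will automatically get sorted.", and this is indeed what happens if `df.columns` is a MultiIndex. With flat columns, as in the example above, the stacked level keeps the original column order instead.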

Related: #18310 (the opposite problem, when sorting shouldn't happen).

Expected Output

In [3]: df.stack()
Out[3]: 
0  1    1
   2    1
   3    1
1  1    1
   2    1
   3    1
2  1    1
   2    1
   3    1
dtype: int64
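
For reference, the expected ordering can be obtained by sorting explicitly. This is only a minimal sketch of a workaround using `sort_index`, not a statement of how `stack()` itself should be fixed; the variable names are illustrative.

```python
import pandas as pd

# Same frame as in the report: flat, unsorted integer columns.
df = pd.DataFrame(1, index=range(3), columns=[1, 3, 2])

# Option A: stack first, then sort the resulting MultiIndex.
sorted_after = df.stack().sort_index()

# Option B: sort the columns first, then stack.
sorted_before = df.sort_index(axis=1).stack()

print(sorted_after)
```

Both options give an inner level ordered 1, 2, 3 for every outer label, which matches the expected output above.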

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: cfad581
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.22.0.dev0+151.gcfad581e9
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1

jreback · 2017-11-19T16:37:29Z

In [1]: df = pd.DataFrame(1, index=range(3), columns=[1, 3, 2])

In [2]: df.stack()
Out[2]: 
0  1    1
   3    1
   2    1
1  1    1
   3    1
   2    1
2  1    1
   3    1
   2    1
dtype: int64

In [3]: df.stack().index.is_lexsorted()
Out[3]: True

In [5]: df.stack().index.is_monotonic
Out[5]: False

The docs should read 'lex-sorted', not 'sorted'.
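
The distinction can be sketched as follows: the integer codes of the stacked index are in lexicographic order even though the label values are not. The helper `codes_are_lexsorted` is hypothetical and written only for this illustration; it assumes a pandas version that exposes `MultiIndex.codes`, and it uses `is_monotonic_increasing` because `is_lexsorted()` and `is_monotonic` are, as far as I know, no longer available in recent pandas releases.

```python
import pandas as pd


def codes_are_lexsorted(index):
    """Hypothetical helper: True if the rows of integer codes are in sorted order."""
    rows = list(zip(*(level_codes.tolist() for level_codes in index.codes)))
    return rows == sorted(rows)


df = pd.DataFrame(1, index=range(3), columns=[1, 3, 2])
stacked = df.stack()

# The codes are ordered, because the inner level keeps the column order.
print(codes_are_lexsorted(stacked.index))

# The label values themselves (1, 3, 2 repeated) are not monotonic.
print(stacked.index.is_monotonic_increasing)

# Sorting by value makes the index monotonic.
print(stacked.sort_index().index.is_monotonic_increasing)
```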

toobaz · 2017-11-19T16:43:01Z

> The docs should read lex-sorted.

Still, sorting takes place when there is a MultiIndex:

In [2]: df = pd.DataFrame(1, index=range(3), columns=pd.MultiIndex.from_product([[1, 3, 2]]*2))

In [3]: df.stack()
Out[3]: 
     1  2  3
0 1  1  1  1
  2  1  1  1
  3  1  1  1
1 1  1  1  1
  2  1  1  1
  3  1  1  1
2 1  1  1  1
  2  1  1  1
  3  1  1  1

toobaz · 2017-11-19T16:45:43Z

(I recognize that "lexsorted" would be correct, since in the new index the values were themselves sorted... still, the behavior seems inconsistent to me.)

jreback added the Docs, MultiIndex, and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Nov 19, 2017

jreback added this to the Next Major Release milestone Nov 19, 2017

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
