sort_index(..., ascending=False, ...) overrides sort_remaining behavior with multiindex #24247

alkasm · 2018-12-12T10:26:58Z

Script to reproduce

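import pandas as pd
import numpy as np

# Two-level MultiIndex: index_1 is descending, and index_2 is descending within each index_1 group.
df = pd.DataFrame(np.random.rand(6))
df['index_1'] = [3, 3, 2, 2, 1, 1]
df['index_2'] = [2, 1, 2, 1, 2, 1]
df = df.set_index(['index_1', 'index_2'])

print(df)
# Ascending sort on index_1 only: index_2 keeps its original order (as expected).
print(df.sort_index(level='index_1', sort_remaining=False))
# Descending sort on index_1 only: index_2 is unexpectedly re-sorted.
print(df.sort_index(level='index_1', ascending=False, sort_remaining=False))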

IPython example output

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame(np.random.rand(6))

In [4]: df['index_1'] = [3, 3, 2, 2, 1, 1]

In [5]: df['index_2'] = [2, 1, 2, 1, 2, 1]

In [6]: df = df.set_index(['index_1', 'index_2'])

In [7]: df
Out[7]:
                        0
index_1 index_2
3       2        0.558019
        1        0.096064
2       2        0.353176
        1        0.153776
1       2        0.812181
        1        0.313342

In [8]: df.sort_index(level='index_1', sort_remaining=False)
Out[8]:
                        0
index_1 index_2
1       2        0.812181
        1        0.313342
2       2        0.353176
        1        0.153776
3       2        0.558019
        1        0.096064

In [9]: df.sort_index(level='index_1', ascending=False, sort_remaining=False)
Out[9]:
                        0
index_1 index_2
3       1        0.096064
        2        0.558019
2       1        0.153776
        2        0.353176
1       1        0.313342
        2        0.812181

Problem description

Documentation for sort_index(): https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_index.html

The sort_index() method sorts a DataFrame by its index values. With a MultiIndex, it sorts by each level in order. To sort by only a subset of the levels and leave the others in their current order, you can pass the level keyword argument (a single level or a list of levels) together with sort_remaining=False. This seems to work correctly with ascending=True (the default), but with ascending=False and sort_remaining=False, pandas still sorts the remaining levels (and in ascending order, at that).

Note that this happens regardless of which supported format you use to specify the level, namely:

df.sort_index(level=0, ascending=False, sort_remaining=False)
df.sort_index(level=[0], ascending=False, sort_remaining=False)
df.sort_index(level='index_1', ascending=False, sort_remaining=False)
df.sort_index(level=['index_1'], ascending=False, sort_remaining=False)

There have been a few issues/PRs on the sort_index() method in the past year, so I'm not sure if one of those PRs broke this ability or whether this has been around longer or is unrelated.
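To check whether a given pandas version shows the behavior described above, a small diagnostic along these lines might help. This is only a sketch: `inner_order_preserved` is a hypothetical helper written for illustration (it is not part of pandas), and the `value` column is a stand-in for the random data in the script.

```python
import pandas as pd


def inner_order_preserved(before, after, level):
    """Return True if rows sharing a `level` label keep their relative order.

    Hypothetical helper for illustration; not part of pandas.
    """
    def groups(frame):
        # Map each label of `level` to the index tuples under it, in row order.
        labels = frame.index.get_level_values(level)
        out = {}
        for label, row in zip(labels, frame.index):
            out.setdefault(label, []).append(row)
        return out

    return groups(before) == groups(after)


index = pd.MultiIndex.from_arrays(
    [[3, 3, 2, 2, 1, 1], [2, 1, 2, 1, 2, 1]],
    names=["index_1", "index_2"],
)
df = pd.DataFrame({"value": range(6)}, index=index)

result = df.sort_index(level="index_1", ascending=False, sort_remaining=False)
# False reproduces this report; True means index_2 was left in its original order.
print(inner_order_preserved(df, result, "index_1"))
```

The printed value depends on the installed pandas version, so no expected output is given here.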

Expected Output

In [9]: df.sort_index(level='index_1', ascending=False, sort_remaining=False)
Out[9]:
                        0
index_1 index_2
3       2        0.558019
        1        0.096064
2       2        0.353176
        1        0.153776
1       2        0.812181
        1        0.313342

Note that the expected outcome could be achieved in this example with

df.reset_index('index_2').sort_index(ascending=False).reset_index().set_index(['index_1', 'index_2'])

or similar setting/resetting of indexes. Well, or you could just do nothing since the dataframe is already in that order, but you get the point.
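Another possible workaround that avoids sort_index() altogether is to reorder the rows with a stable argsort on the values of a single level. The sketch below assumes a helper named `sort_by_level_only`, which is hypothetical and not part of pandas. It also assumes the level has no missing labels, since `pd.factorize` assigns those the code -1.

```python
import numpy as np
import pandas as pd


def sort_by_level_only(df, level, ascending=True):
    """Reorder rows by one index level, keeping ties in their original order.

    Hypothetical helper for illustration; not part of pandas.
    """
    values = df.index.get_level_values(level)
    # With sort=True, each code is the rank of its label among the sorted
    # unique labels, so negating the codes also works for non-numeric labels.
    codes, _ = pd.factorize(values, sort=True)
    keys = codes if ascending else -codes
    # A stable argsort leaves rows with equal keys in their existing order.
    order = np.argsort(keys, kind="stable")
    return df.iloc[order]


df = pd.DataFrame(
    {"value": [10, 20, 30, 40, 50, 60]},
    index=pd.MultiIndex.from_arrays(
        [[3, 3, 2, 2, 1, 1], [2, 1, 2, 1, 2, 1]],
        names=["index_1", "index_2"],
    ),
)

descending = sort_by_level_only(df, "index_1", ascending=False)
ascending = sort_by_level_only(df, "index_1", ascending=True)
print(descending)
print(ascending)
```

In both directions index_2 stays in its original 2, 1 order within each index_1 group, which is the behavior expected from sort_remaining=False.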

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 18.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 36.4.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.7.4
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

mroeschke · 2023-04-28T23:04:47Z

This looks fixed on main. Could use a test

NoyHanan · 2023-05-04T12:14:10Z

take

mroeschke added Bug MultiIndex labels Jan 13, 2019

mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug MultiIndex labels Apr 28, 2023

github-actions bot assigned NoyHanan May 4, 2023

NoyHanan mentioned this issue May 4, 2023

Added test for sort_index parameter multiindex 'sort_remaining' = False #53076

Merged

phofl closed this as completed in #53076 May 6, 2023
