sort_index(..., ascending=False, ...) overrides sort_remaining behavior with multiindex #24247

Closed
alkasm opened this issue Dec 12, 2018 · 2 comments · Fixed by #53076
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions

alkasm commented Dec 12, 2018

Script to reproduce

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(6))
df['index_1'] = [3, 3, 2, 2, 1, 1]
df['index_2'] = [2, 1, 2, 1, 2, 1]
df = df.set_index(['index_1', 'index_2'])

print(df)
print(df.sort_index(level='index_1', sort_remaining=False))
print(df.sort_index(level='index_1', ascending=False, sort_remaining=False))

IPython example output

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame(np.random.rand(6))

In [4]: df['index_1'] = [3, 3, 2, 2, 1, 1]

In [5]: df['index_2'] = [2, 1, 2, 1, 2, 1]

In [6]: df = df.set_index(['index_1', 'index_2'])

In [7]: df
Out[7]:
                        0
index_1 index_2
3       2        0.558019
        1        0.096064
2       2        0.353176
        1        0.153776
1       2        0.812181
        1        0.313342

In [8]: df.sort_index(level='index_1', sort_remaining=False)
Out[8]:
                        0
index_1 index_2
1       2        0.812181
        1        0.313342
2       2        0.353176
        1        0.153776
3       2        0.558019
        1        0.096064

In [9]: df.sort_index(level='index_1', ascending=False, sort_remaining=False)
Out[9]:
                        0
index_1 index_2
3       1        0.096064
        2        0.558019
2       1        0.153776
        2        0.353176
1       1        0.313342
        2        0.812181

Problem description

Documentation for sort_index(): https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_index.html

The sort_index() method sorts a DataFrame by its index values. With a MultiIndex, it sorts by each level in order. To sort by only a subset of the levels and leave the others in their current order, pass the level keyword argument (a single level or a list of levels) and set sort_remaining=False so the other levels are ignored. The sort_remaining keyword seems to work correctly with ascending=True (the default), but with ascending=False and sort_remaining=False, pandas still sorts the other levels (and in ascending order, too).

Note that this happens regardless of which supported format you use to specify the level, namely:

df.sort_index(level=0, ascending=False, sort_remaining=False)
df.sort_index(level=[0], ascending=False, sort_remaining=False)
df.sort_index(level='index_1', ascending=False, sort_remaining=False)
df.sort_index(level=['index_1'], ascending=False, sort_remaining=False)

There have been a few issues/PRs on the sort_index() method in the past year, so I'm not sure if one of those PRs broke this ability or whether this has been around longer or is unrelated.
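
One possible reading of the output above, which nothing in this issue confirms as the actual cause: the result in Out[9] is exactly what you get by taking a stable ascending sort on index_1 and reversing it, because reversing also flips the order of tied rows. The sketch below uses plain NumPy on the same level values and makes no claim about pandas internals.

```python
import numpy as np

# Level values copied from the reproduction script.
index_1 = np.array([3, 3, 2, 2, 1, 1])
index_2 = np.array([2, 1, 2, 1, 2, 1])

# Stable ascending sort on index_1, then reversed to get descending order.
# Reversing flips the relative order of rows that tie on index_1.
ascending_order = np.argsort(index_1, kind="stable")
reversed_order = ascending_order[::-1]

# Stable descending sort: negate the key so ties keep their original order.
stable_descending = np.argsort(-index_1, kind="stable")

print(index_2[reversed_order].tolist())     # [1, 2, 1, 2, 1, 2], as in Out[9]
print(index_2[stable_descending].tolist())  # [2, 1, 2, 1, 2, 1], the expected output
```

If this reading is right, the inner level only looks like it was sorted in ascending order because the input happened to have index_2 in descending order within each group.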

Expected Output

In [9]: df.sort_index(level='index_1', ascending=False, sort_remaining=False)
Out[9]:
                        0
index_1 index_2
3       2        0.558019
        1        0.096064
2       2        0.353176
        1        0.153776
1       2        0.812181
        1        0.313342

Note that the expected outcome could be achieved in this example with

df.reset_index('index_2').sort_index(ascending=False).reset_index().set_index(['index_1', 'index_2'])

or similar setting/resetting of indexes. Well, or you could just do nothing since the dataframe is already in that order, but you get the point.
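
Another way to get the expected order without setting and resetting indexes is to compute a stable ordering from the chosen level alone and reindex positionally. This is a minimal sketch: sort_level_only is a hypothetical helper name (not a pandas API), the "value" column is made up so the output is deterministic, and it assumes the level contains no missing values.

```python
import numpy as np
import pandas as pd


def sort_level_only(df, level, ascending=True):
    """Sort rows by one index level, leaving ties in their current order.

    Hypothetical helper for illustration; assumes no missing values in `level`.
    """
    keys = df.index.get_level_values(level)
    # Integer rank of each key among the sorted unique keys.
    codes, _ = pd.factorize(keys, sort=True)
    if not ascending:
        # Negating the ranks gives descending order while the stable sort
        # still preserves the original order of tied rows.
        codes = -codes
    order = np.argsort(codes, kind="stable")
    return df.iloc[order]


index = pd.MultiIndex.from_arrays(
    [[3, 3, 2, 2, 1, 1], [2, 1, 2, 1, 2, 1]],
    names=["index_1", "index_2"],
)
df = pd.DataFrame({"value": range(6)}, index=index)

print(sort_level_only(df, "index_1", ascending=False))
print(sort_level_only(df, "index_1", ascending=True))
```

With ascending=False the frame comes back unchanged, since it is already ordered by index_1 descending; with ascending=True the groups are reordered but index_2 stays as 2, 1 within each group.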

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 18.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 36.4.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.7.4
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@mroeschke
Member

This looks fixed on main. Could use a test

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug MultiIndex labels Apr 28, 2023
@NoyHanan
Contributor

NoyHanan commented May 4, 2023

take

3 participants