sort_index with sort_remaining=False does not work with int level #21052

kdebrab · 2018-05-15T13:30:50Z

Code Sample, a copy-pastable example if possible

import pandas as pd
mi = pd.MultiIndex.from_tuples([[2,2],[2,1],[1,2],[1,1]], names=list('AB'))
a = pd.Series([1,2,3,4], index=mi)
a.sort_index(level=0, sort_remaining=False)

returns:

A  B
1  1    4
   2    3
2  1    2
   2    1
dtype: int64

Problem description

Level B (level 1) is sorted in addition to level A (level 0), even though sort_remaining is set to False.

Expected Output

By comparison, the correct result is obtained when specifying the level by its name:

a.sort_index(level='A', sort_remaining=False)

with

A  B
1  2    3
   1    4
2  2    1
   1    2
dtype: int64

Exactly the same result should be expected when using level=0 instead of level='A'.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 32
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 36.4.0
Cython: None
numpy: 1.14.1
scipy: 1.0.0
pyarrow: None
xarray: 0.10.0
IPython: 5.3.0
sphinx: 1.6.3
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: 2.5.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.2.3
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

WillAyd · 2018-05-15T15:25:53Z

Thanks for the bug. This is probably related to #21043, though the work done there does not appear to solve it.

Interestingly enough, the difference in sorting is ONLY observed at the first level. If your items were, say, at the second level, there would be no difference.

In [2]: mi = pd.MultiIndex.from_tuples([[0,2,2],[0,2,1],[0,1,2],[0,1,1]], names=list('CAB'))
In [3]: a = pd.Series([1,2,3,4], index=mi)

In [5]: a.sort_index(level=1, sort_remaining=False)
Out[5]: 
C  A  B
0  1  2    3
      1    4
   2  2    1
      1    2
dtype: int64

In [6]: a.sort_index(level='A', sort_remaining=False)
Out[6]: 
C  A  B
0  1  2    3
      1    4
   2  2    1
      1    2
dtype: int64

kdebrab · 2018-05-15T16:51:07Z

Thus, the problem occurs only when level=0. Hmm, that sounds familiar. Indeed, a simple a.sort_index?? already reveals the culprit:

if level:
   [..]

In other words, level=0 is handled in the same way as level=None, which means that all levels are sorted...

The solution seems simple:

if level is not None:
   [..]
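To make the truthiness pitfall concrete, here is a minimal pure-Python sketch. The two helper functions are hypothetical and are not the pandas source; they only model which levels end up being used as sort keys under each form of the check, assuming that a falsy level falls through to the "sort all levels" branch as described above.

```python
def levels_sorted_buggy(level, n_levels):
    """Hypothetical model of the `if level:` check."""
    if level:  # 0 is falsy, so level=0 is treated like level=None
        return [level]
    return list(range(n_levels))  # fall through: every level is sorted


def levels_sorted_fixed(level, n_levels):
    """Hypothetical model of the `if level is not None:` check."""
    if level is not None:  # 0 is a valid level number and is kept
        return [level]
    return list(range(n_levels))


print(levels_sorted_buggy(0, 2))     # [0, 1]
print(levels_sorted_fixed(0, 2))     # [0]
print(levels_sorted_buggy(1, 2))     # [1]
print(levels_sorted_fixed(None, 2))  # [0, 1]
```

This also matches the observation that only the first level is affected: any other integer level is truthy, so both checks agree for it.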

WillAyd · 2018-05-15T16:53:20Z

Perhaps. That's the same issue I found in the referenced PR #21043, though trying it on that branch didn't resolve your issue. There may be a similar issue somewhere else; investigation and PRs welcome!

kdebrab · 2018-05-15T17:05:40Z

I should have checked your link. I think you solved it for DataFrames, but not yet for Series...

kdebrab · 2018-05-16T07:57:16Z

Thanks for the quick response and fix @WillAyd !
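For anyone who wants to confirm the behaviour on their own install, here is a small check built from the example in the report. It is a sketch that assumes a pandas release containing the fix (this issue's milestone is 0.23.1); on 0.22.0, the version reported above, the comparison is expected to fail.

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples([(2, 2), (2, 1), (1, 2), (1, 1)], names=list("AB"))
a = pd.Series([1, 2, 3, 4], index=mi)

# Sort on the first level only, once by position and once by name.
by_position = a.sort_index(level=0, sort_remaining=False)
by_name = a.sort_index(level="A", sort_remaining=False)

# With the fix, both calls should leave level B in its original order.
print(by_position.equals(by_name))
print(by_position)
```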

WillAyd mentioned this issue May 15, 2018: Fix Inconsistent MultiIndex Sorting #21043 (merged)

jreback added the Bug, Indexing, and Reshaping labels May 17, 2018

jreback added this to the 0.24.0 milestone May 17, 2018

jreback closed this as completed in #21043 May 19, 2018

jorisvandenbossche changed the milestone from 0.24.0 to 0.23.1 Jun 5, 2018
