DataFrame.sort_index by Level Name Incorrect After Unstack/Swaplevel #20994

Closed
WillAyd opened this issue May 9, 2018 · 0 comments · Fixed by #21043
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Reshaping Concat, Merge/Join, Stack/Unstack, Explode
@WillAyd
Member

WillAyd commented May 9, 2018

This is a very obscure issue, but I think it is responsible for things getting out of whack in #20945.

In [1]: mi = pd.MultiIndex.from_product([[0], ['d', 'c']], names=['bar', 'baz'])
In [2]: df = pd.DataFrame([[0, 2], [1, 3]], index=mi, columns=['B', 'A'])
In [3]: df.columns.name = 'foo'

In [4]: df
Out[4]: 
foo      B  A
bar baz      
0   d    0  2
    c    1  3

In [5]: df.unstack().swaplevel(axis=1)
Out[5]: 
baz  c  d  c  d
foo  B  B  A  A
bar            
0    1  0  3  2

In [6]: df.unstack().swaplevel(axis=1).sort_index(axis=1, level=0)
Out[6]: 
baz  c     d   
foo  A  B  A  B  # Here subsequent levels get sorted
bar            
0    3  1  2  0

In [7]: df.unstack().swaplevel(axis=1).sort_index(axis=1, level='baz')
Out[7]: 
baz  c     d   
foo  B  A  B  A  # Here subsequent levels aren't getting sorted
bar            
0    1  3  0  2
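
For anyone who wants to check this outside an interactive session, the steps above can be condensed into a standalone script. This is only a sketch and not part of the original report: the variable names `swapped`, `by_position` and `by_name` are made up for illustration, and it assumes a pandas build that already includes the fix from #21043, where both spellings should agree. On the affected build (0.23.0rc2) the two results would be expected to differ, as shown in steps 6 and 7.

```python
import pandas as pd

# Same frame as steps 1-3 of the session above.
mi = pd.MultiIndex.from_product([[0], ["d", "c"]], names=["bar", "baz"])
df = pd.DataFrame([[0, 2], [1, 3]], index=mi, columns=["B", "A"])
df.columns.name = "foo"

# Step 5: unstack 'baz' into the columns, then swap the two column levels.
swapped = df.unstack().swaplevel(axis=1)

# Steps 6 and 7: sort by level position and by level name.
by_position = swapped.sort_index(axis=1, level=0)
by_name = swapped.sort_index(axis=1, level="baz")

print(by_position.columns.tolist())
print(by_name.columns.tolist())
print(by_position.equals(by_name))  # expected True only on a fixed build
```

Comparing the whole frames with `equals` rather than eyeballing the printed headers makes it harder to miss a difference in the second level.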

If the DataFrame in step 5 above were constructed directly, the sorting would be the same regardless of whether you passed the level's position or its name:

In [1]: mi = pd.MultiIndex.from_tuples([('c', 'B'), ('d', 'B'), ('c', 'A'), ('d', 'A')], names=['baz', 'foo'])
In [2]: df = pd.DataFrame([[1, 0, 3, 2]], columns=mi, index=pd.Index([0], name='bar'))
In [3]: df  # Same as step 5 in above example
Out[3]: 
baz  c  d  c  d
foo  B  B  A  A
bar            
0    1  0  3  2

In [4]: df.sort_index(axis=1, level=0)
Out[4]: 
baz  c     d   
foo  A  B  A  B
bar            
0    3  1  2  0

In [5]: df.sort_index(axis=1, level='baz')
Out[5]: 
baz  c     d   
foo  A  B  A  B  # Sort is the same as item above, regardless of using label or not
bar            
0    3  1  2  0

Note that this only happens when unstack and swaplevel are used together. My original thought was that swaplevel would be solely responsible, but I could not reproduce the issue using it alone, so I'm assuming unstack is mutating some kind of state of the MultiIndex?
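
One way to probe that suspicion is to compare the internal `levels` of the two column indexes, which can differ even when the visible labels are identical. The snippet below is a hedged sketch rather than a diagnosis: the names `swapped`, `direct` and `same_labels` are hypothetical, and what it prints for the unstacked index may vary between pandas versions.

```python
import pandas as pd

# Column index produced by unstack + swaplevel (step 5 of the first example).
mi = pd.MultiIndex.from_product([[0], ["d", "c"]], names=["bar", "baz"])
df = pd.DataFrame([[0, 2], [1, 3]], index=mi, columns=["B", "A"])
df.columns.name = "foo"
swapped = df.unstack().swaplevel(axis=1).columns

# Column index built directly from tuples (second example).
direct = pd.MultiIndex.from_tuples(
    [("c", "B"), ("d", "B"), ("c", "A"), ("d", "A")], names=["baz", "foo"]
)

# The visible labels are the same in both cases...
same_labels = swapped.tolist() == direct.tolist()
print(same_labels)

# ...but the underlying level values may be stored in a different order.
for name, cols in [("swapped", swapped), ("direct", direct)]:
    print(name, [list(level) for level in cols.levels])
```

If the `foo` level of the unstacked index shows up in its original, unsorted order while the directly constructed one is sorted, that would be consistent with the difference in sorting behaviour described above, though it does not by itself show where the name-based path goes wrong.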

INSTALLED VERSIONS

commit: eff1faf
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0rc2+27.geff1faf27
pytest: 3.4.1
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.27.3
numpy: 1.14.1
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.7.0
patsy: None
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.5
feather: None
matplotlib: 2.1.2
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jreback jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 17, 2018
@jreback jreback added this to the 0.24.0 milestone May 17, 2018
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.24.0, 0.23.1 Jun 5, 2018