Skip to content

Subsetting dataframe w/ Multiindex does not preserve queried order #16038

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
DSLituiev opened this issue Apr 18, 2017 · 1 comment
Closed

Subsetting dataframe w/ Multiindex does not preserve queried order #16038

DSLituiev opened this issue Apr 18, 2017 · 1 comment
Labels
Duplicate Report Duplicate issue or pull request

Comments

@DSLituiev
Copy link

Problem description

I would like to subset the data frame with a indexer. When using MultiIndex, the results are not following requested order.

Code Sample, a copy-pastable example if possible

This works as expected when index is a simple Index: the returned entries are ordered as requested:

df = pd.DataFrame(
    {"i1":[1,1,1,1,2,4,4,2,3,3,3,3],
     "i2":[1,3,2,2,1,1,2,2,1,1,3,2],
     "d1":['a','b','c','d','e','f','g','h','i','j','k','l']}
)
df.set_index(['d1',], inplace=True)
df.loc[["a","c","b"],:]

returns

#     i1  i2
# d1        
# a    1   1
# c    1   2 <-- 'c' precedes 'b' as requested in query
# b    1   3

However, when querying a data frame with a MultiIndex the behaviour is inconsistent:

df = pd.DataFrame(
    {"i1":[1,1,1,1,2,4,4,2,3,3,3,3],
     "i2":[1,3,2,2,1,1,2,2,1,1,3,2],
     "d1":['a','b','c','d','e','f','g','h','i','j','k','l']}
)
df.set_index(['d1',"i1"], inplace=True)
df.loc[["a","c","b"],:]

returns

#        i2
# d1 i1    
# a  1    1
# b  1    3 
# c  1    2  <-- 'c' follows 'b' unlike in query

Expected Output

#        i2
# d1 i1    
# a  1    1
# c  1    2
# b  1    3 

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.5.1.final.0 python-bits: 64 OS: Darwin OS-release: 14.5.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 34.4.1
Cython: 0.24
numpy: 1.12.1
scipy: 0.19.0
statsmodels: 0.6.1
xarray: None
IPython: 5.3.0
sphinx: 1.4.5
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
httplib2: 0.9.2
apiclient: 1.5.1
sqlalchemy: 1.0.14
pymysql: None
psycopg2: None
jinja2: 2.9.6
boto: 2.42.0
pandas_datareader: None

@jorisvandenbossche
Copy link
Member

@DSLituiev Thanks for the report
Yes, this is a known issue at the moment, closing this as a duplicate of #10710.
Pull requests welcome!

@jorisvandenbossche jorisvandenbossche added the Duplicate Report Duplicate issue or pull request label Apr 18, 2017
@jorisvandenbossche jorisvandenbossche added this to the No action milestone Apr 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Duplicate Report Duplicate issue or pull request
Projects
None yet
Development

No branches or pull requests

2 participants