Skip to content

Partial indexing does not respect key order/repetitions #18345

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
toobaz opened this issue Nov 17, 2017 · 2 comments
Closed

Partial indexing does not respect key order/repetitions #18345

toobaz opened this issue Nov 17, 2017 · 2 comments

Comments

@toobaz
Copy link
Member

toobaz commented Nov 17, 2017

Code Sample, a copy-pastable example if possible

In [2]: keys = ['c', 'a', 'b', 'c']

In [3]: tm = pd.DataFrame(index=pd.MultiIndex.from_tuples([('b', 1), ('b', 2), ('a', 1), ('c', 1)]),
   ...:                   columns=range(3))

In [4]: tm.loc[keys] # bad
Out[4]: 
       0    1    2
b 1  NaN  NaN  NaN
  2  NaN  NaN  NaN
a 1  NaN  NaN  NaN
c 1  NaN  NaN  NaN

In [5]: tm.transpose().loc[:, keys].transpose() # bad
Out[5]: 
       0    1    2
b 1  NaN  NaN  NaN
  2  NaN  NaN  NaN
a 1  NaN  NaN  NaN
c 1  NaN  NaN  NaN

In [6]: tm.loc[[(k, 1) for k in keys]] # complete indexing: good
Out[6]: 
       0    1    2
c 1  NaN  NaN  NaN
a 1  NaN  NaN  NaN
b 1  NaN  NaN  NaN
c 1  NaN  NaN  NaN

In [7]: tf = pd.DataFrame(index=['b', 'a', 'c'],
   ...:                  columns=range(3))

In [8]: tf.loc[keys] # flat index: good
Out[8]: 
     0    1    2
c  NaN  NaN  NaN
a  NaN  NaN  NaN
b  NaN  NaN  NaN
c  NaN  NaN  NaN

Problem description

Partial indexing of a MultiIndex with a list discards order/repetitions.

Expected Output

In [9]: pd.concat([tm.loc[[k]] for k in keys])
Out[9]: 
       0    1    2
c 1  NaN  NaN  NaN
a 1  NaN  NaN  NaN
b 1  NaN  NaN  NaN
  2  NaN  NaN  NaN
c 1  NaN  NaN  NaN

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.22.0.dev0+135.g9c799e2c4
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1

@gfyoung
Copy link
Member

gfyoung commented Nov 20, 2017

Duplicates handling has always been a troubling point for us. Feel free to investigate and patch!

@jorisvandenbossche
Copy link
Member

This is a duplicate of #10710 (although you have more examples here), and indeed a problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants