Partial indexing does not respect key order/repetitions #18345

toobaz · 2017-11-17T14:16:26Z

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: keys = ['c', 'a', 'b', 'c']

In [3]: tm = pd.DataFrame(index=pd.MultiIndex.from_tuples([('b', 1), ('b', 2), ('a', 1), ('c', 1)]),
   ...:                   columns=range(3))

In [4]: tm.loc[keys] # bad
Out[4]: 
       0    1    2
b 1  NaN  NaN  NaN
  2  NaN  NaN  NaN
a 1  NaN  NaN  NaN
c 1  NaN  NaN  NaN

In [5]: tm.transpose().loc[:, keys].transpose() # bad
Out[5]: 
       0    1    2
b 1  NaN  NaN  NaN
  2  NaN  NaN  NaN
a 1  NaN  NaN  NaN
c 1  NaN  NaN  NaN

In [6]: tm.loc[[(k, 1) for k in keys]] # complete indexing: good
Out[6]: 
       0    1    2
c 1  NaN  NaN  NaN
a 1  NaN  NaN  NaN
b 1  NaN  NaN  NaN
c 1  NaN  NaN  NaN

In [7]: tf = pd.DataFrame(index=['b', 'a', 'c'],
   ...:                  columns=range(3))

In [8]: tf.loc[keys] # flat index: good
Out[8]: 
     0    1    2
c  NaN  NaN  NaN
a  NaN  NaN  NaN
b  NaN  NaN  NaN
c  NaN  NaN  NaN

Problem description

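Partial indexing of a MultiIndex with a list of first-level keys discards the order and repetitions of the keys: rows come back in the index's own order, and each matching row appears only once. Complete indexing (full tuples) and indexing on a flat index both respect the order and repetitions of the requested keys.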

Expected Output

In [9]: pd.concat([tm.loc[[k]] for k in keys])
Out[9]: 
       0    1    2
c 1  NaN  NaN  NaN
a 1  NaN  NaN  NaN
b 1  NaN  NaN  NaN
  2  NaN  NaN  NaN
c 1  NaN  NaN  NaN
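
As a possible workaround that avoids building one frame per key, the same result can be obtained by collecting integer positions and indexing once with `.iloc`. This is only a sketch, not how pandas implements `.loc`: the helper name `loc_partial_ordered` is hypothetical, it only handles keys on the first level, and unlike `.loc` it silently skips keys that are not present.

```python
import numpy as np
import pandas as pd


def loc_partial_ordered(df, keys):
    """Select rows by first-level keys, keeping key order and repetitions.

    Hypothetical helper for illustration; not part of pandas.
    """
    first_level = df.index.get_level_values(0)
    # For each requested key, find the integer positions of its matching rows.
    positions = [np.flatnonzero(first_level == k) for k in keys]
    if not positions:
        # np.concatenate rejects an empty list, so return an empty selection.
        return df.iloc[[]]
    # One positional lookup, in the order the keys were given.
    return df.iloc[np.concatenate(positions)]


keys = ['c', 'a', 'b', 'c']
tm = pd.DataFrame(
    index=pd.MultiIndex.from_tuples([('b', 1), ('b', 2), ('a', 1), ('c', 1)]),
    columns=range(3),
)
result = loc_partial_ordered(tm, keys)
print(list(result.index))
# [('c', 1), ('a', 1), ('b', 1), ('b', 2), ('c', 1)]
```

The row order matches the `pd.concat` output above: each key contributes all of its matching rows, in the order the keys were listed, and the repeated `'c'` appears twice.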

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.22.0.dev0+135.g9c799e2c4
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1

gfyoung · 2017-11-20T18:38:49Z

Handling duplicates has always been a trouble spot for us. Feel free to investigate and patch!

jorisvandenbossche · 2017-11-20T19:39:33Z

This is a duplicate of #10710 (although you have more examples here), and indeed a problem.

gfyoung added MultiIndex Bug labels Nov 20, 2017

jorisvandenbossche closed this as completed Nov 20, 2017

jorisvandenbossche added this to the No action milestone Nov 20, 2017

toobaz mentioned this issue Jul 2, 2019

Code for list of partial indexers should be in MultiIndex.get_indexer #27181

Open
