
Incoherent behavior when ambiguously indexing MultiIndexed DataFrame with slice or list #16396


Open
toobaz opened this issue May 20, 2017 · 3 comments
Labels
Bug, Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex

Comments

@toobaz
Member

toobaz commented May 20, 2017

Sorry in advance if this has already been discussed or reported: I searched the archive, but didn't know exactly what to search for.

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: tt = pd.DataFrame([[1, 2, 'v1', 'v2'], [3, 4, 'v3','v4']],
   ...:                   columns=['idx1', 'idx2', 2, 6]).set_index(['idx1', 'idx2'])

In [3]: tt
Out[3]: 
            2   6
idx1 idx2        
1    2     v1  v2
3    4     v3  v4

In [4]: tt.loc[1,2]
Out[4]: 
2    v1
6    v2
Name: (1, 2), dtype: object

In [5]: tt.loc[:1,2]
Out[5]: 
idx1  idx2
1     2       v1
Name: 2, dtype: object

In [6]: tt.loc[:,2]
Out[6]: 
idx1  idx2
1     2       v1
3     4       v3
Name: 2, dtype: object

Problem description

.loc[l1, l2] called on a MultiIndexed DataFrame is ambiguous: l2 could refer to the second level of the index, or to the columns. With scalars, pandas apparently follows the first interpretation, which is fine. But then the same should happen when l1 or l2 is a slice.

I can understand that In [6] might "look different" from In [4], but In [4] and In [5] should really give the same result (and hence In [6] too).
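To make the two readings concrete, here is a minimal sketch (not a proposed fix) that spells each interpretation out explicitly, assuming a recent pandas version. The variable names are only illustrative. The row key is passed as a tuple and the column indexer is always given, so there is nothing left for .loc to guess.

```python
import pandas as pd

tt = pd.DataFrame(
    [[1, 2, "v1", "v2"], [3, 4, "v3", "v4"]],
    columns=["idx1", "idx2", 2, 6],
).set_index(["idx1", "idx2"])

# Reading 1: 2 is a label on the second index level.
# The row key is a tuple and the column indexer is written out as ":".
row_scalar = tt.loc[(1, 2), :]                    # the single row (1, 2), as a Series
row_sliced = tt.loc[(slice(None, 1), 2), :]       # rows with idx1 up to 1 and idx2 == 2
row_sliced_alt = tt.loc[pd.IndexSlice[:1, 2], :]  # same key, built with pd.IndexSlice

# Reading 2: 2 is a column label; the row indexer selects every row.
col_all = tt.loc[:, 2]

print(row_scalar)
print(row_sliced)
print(col_all)
```

With the key written this way, the scalar and the sliced lookups both select on the index levels, which is the consistency asked for above.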

Expected Output

Out [4] in all three cases (or Out [6] if we prefer to favour the second interpretation, which however would probably be more disruptive).

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.7.0-1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.utf8
LOCALE: it_IT.UTF-8

pandas: 0.20.1
pytest: 3.0.6
pip: 9.0.1
setuptools: None
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: 0.9.2
IPython: 5.1.0.dev
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.2
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: 3.7.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.2.1

@toobaz
Member Author

toobaz commented Dec 4, 2017

The problem is not limited to slices:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame(index=pd.MultiIndex.from_product([[1,2], [3, 4], [5, 6]]), columns=['a', 'b'])

In [3]: df.loc[1, 3] # good
Out[3]: 
     a    b
5  NaN  NaN
6  NaN  NaN

In [4]: df.loc[1, [3,4]]
---------------------------------------------------------------------------
[...]
KeyError: 'None of [[3, 4]] are in the [columns]'

In [5]: df.loc[[1,2], [3]]
---------------------------------------------------------------------------
[...]
KeyError: 'None of [[3]] are in the [columns]'

In [6]: df.loc[[1,2], 3]
---------------------------------------------------------------------------
[...]
TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [3] of <class 'int'>

Retitling accordingly
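For comparison, the list-based lookups above can be written with an explicit row key and an explicit column indexer. This is only a sketch of a workaround, assuming a recent pandas version; it does not change what the ambiguous spellings do. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    index=pd.MultiIndex.from_product([[1, 2], [3, 4], [5, 6]]),
    columns=["a", "b"],
)
idx = pd.IndexSlice  # helper that builds the row key tuple

# Each key below addresses index levels only; ":" selects all columns.
first_scalar_second_list = df.loc[idx[1, [3, 4]], :]   # level 0 == 1, level 1 in [3, 4]
both_lists = df.loc[idx[[1, 2], [3]], :]               # level 0 in [1, 2], level 1 in [3]
first_list_second_scalar = df.loc[idx[[1, 2], 3], :]   # level 0 in [1, 2], level 1 == 3

print(first_scalar_second_list)
print(both_lists)
print(first_list_second_scalar)
```

Each lookup selects four of the eight rows, and none of them consults the columns for the labels 3 or 4.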

toobaz changed the title from "Incoherent behavior when ambiguously indexing MultiIndexed DataFrame with slice" to "Incoherent behavior when ambiguously indexing MultiIndexed DataFrame with slice or list" on Dec 4, 2017
@jreback
Contributor

jreback commented Feb 12, 2018

When only scalars are passed, this is ambiguous because of __getitem__ coercion, but slices are not coerced the way scalars are, so that case is unambiguous. To be unambiguous, a fully specified row, column indexer (even if empty) must be passed. This is already indicated in the docs: http://pandas-docs.github.io/pandas-docs-travis/advanced.html#using-slicers.
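As background for the __getitem__ point, the sketch below shows only what Python itself hands to __getitem__ for each spelling. ShowKey is a hypothetical probe class, not part of pandas, and how .loc then interprets each key is pandas-internal logic that this example does not reproduce.

```python
class ShowKey:
    """Hypothetical probe: returns whatever key Python passes to __getitem__."""

    def __getitem__(self, key):
        return key


probe = ShowKey()

scalar_key = probe[1, 2]         # two scalars arrive as the single tuple (1, 2)
slice_key = probe[:1, 2]         # a tuple holding a slice and a scalar
list_key = probe[[1, 2], 3]      # a tuple holding a list and a scalar
explicit_key = probe[(1, 2), :]  # a nested tuple as row key plus a column slice

for key in (scalar_key, slice_key, list_key, explicit_key):
    print(repr(key))
```

In every case the indexer receives one tuple, so for a plain pair of scalars it cannot tell from the key alone whether (1, 2) names one MultiIndex row or a row label plus a column label. Only the last spelling separates the row key from the column indexer.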

jreback added the Indexing, API Design, and MultiIndex labels on Feb 12, 2018
@jorisvandenbossche
Member

when only scalars are passed this is ambiguous because of __getitem__ coercion, but slices are not coerced to like scalars, so it is unambiguous.

Can you explain this in a bit more detail?

mroeschke added the Bug label and removed the API Design label on Jun 12, 2021