Wrong "Too many indexers" error message when indexing a Series with MultiIndex #14885

toobaz · 2016-12-15T10:02:04Z

Code Sample, a copy-pastable example if possible

In [2]: s = pd.Series(range(4), index=pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']]))

In [3]: s.loc['a', 'e']
---------------------------------------------------------------------------
IndexingError                             Traceback (most recent call last)
<ipython-input-3-042ed7ab0463> in <module>()
----> 1 s.loc['a', 'e']

/home/pietro/nobackup/repo/pandas/pandas/core/indexing.py in __getitem__(self, key)
   1308 
   1309         if type(key) is tuple:
-> 1310             return self._getitem_tuple(key)
   1311         else:
   1312             return self._getitem_axis(key, axis=0)

/home/pietro/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_tuple(self, tup)
    799 
    800         # no multi-index, so validate all of the indexers
--> 801         self._has_valid_tuple(tup)
    802 
    803         # ugly hack for GH #836

/home/pietro/nobackup/repo/pandas/pandas/core/indexing.py in _has_valid_tuple(self, key)
    148         for i, k in enumerate(key):
    149             if i >= self.obj.ndim:
--> 150                 raise IndexingError('Too many indexers')
    151             if not self._has_valid_type(k, i):
    152                 raise ValueError("Location based indexing can only have [%s] "

IndexingError: Too many indexers

Problem description

The raised error, and the message, seem wrong to me.

Expected Output

KeyError: ('a', 'e')

This is what I get if I take a DataFrame with the same index and do

In [7]: df.loc[('a', 'e'), :]

By the way, s.loc['a', 'c'] (valid key) works just fine, so this is clearly a problem of a missing key, and the docs say ".loc will raise a KeyError when the items are not found."

... and by the way, I would expect the following to raise an IndexingError: Too many indexers:

In [12]: s.loc[('a', 'e'), :]
Out[12]: 
a  c    0
   d    1
dtype: int64

... instead the tuple is interpreted as a list of labels rather than as a key, and hence it works "fine". Is this behavior desired? (looks a bit inconsistent to me, but I see that it is generalized, i.e. DataFrames also work this way) If it is, then it is worth fixing the docs where they mention "A list or array of labels, e.g. ['a', 'b', 'c']." to also mention tuples.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.7.0-1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.utf8
LOCALE: it_IT.UTF-8

pandas: 0.19.0+196.g5f889a2.dirty
nose: 1.3.7
pip: 8.1.2
setuptools: 28.0.0
Cython: 0.23.4
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.8.0.dev0+f80669e
xarray: None
IPython: 5.1.0.dev
sphinx: 1.4.8
patsy: 0.3.0-dev
dateutil: 2.5.3
pytz: 2015.7
blosc: None
bottleneck: 1.2.0dev
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.3
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: None
bs4: 4.5.1
html5lib: 0.999
httplib2: 0.9.1
apiclient: 1.5.2
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: 0.2.1


jreback · 2016-12-15T23:46:55Z

so ['a', 'b'] looks exactly like [('a', 'b')] when passed to __getitem__, so using ['a', 'b'] is actually a convenient way of indexing a multi-index on a Series. It does work generally, but care must be taken.

So I'll mark this as a bug / error reporting issue for the 2 cases you have identified. PRs welcome.

bluenote10 · 2017-02-21T15:18:53Z

For the record, as a work-around to get the expected KeyError you can use s.loc(axis=0)['a', 'e'].
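A minimal sketch comparing the plain lookup with this work-around on the example series from the report. The `outcome` helper is hypothetical and exists only for this illustration. It returns either the looked-up value or the name of the exception class, so the script runs to completion whichever exception a given pandas version raises. No expected exception names are shown because they depend on the installed version.

```python
import pandas as pd

# Same example series as in the report.
s = pd.Series(range(4), index=pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']]))


def outcome(lookup):
    """Hypothetical helper: run a lookup, return its result or the exception class name."""
    try:
        return lookup()
    except Exception as exc:
        return type(exc).__name__


# Missing key, plain .loc: the exception type here is what this issue is about.
print(outcome(lambda: s.loc['a', 'e']))

# Missing key, work-around with an explicit axis.
print(outcome(lambda: s.loc(axis=0)['a', 'e']))

# Independent check that the key really is absent from the index.
print(('a', 'e') in s.index)
```

Checking membership with `in s.index` avoids relying on the exception type at all, which may be the safer option in code that has to run on several pandas versions.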

jorisvandenbossche · 2017-02-21T22:08:25Z

so ['a', 'b'] looks exactly like [('a', 'b')] when passed to __getitem__

It doesn't only look the same, it is actually exactly the same for __getitem__.
As a consequence, you also get the same (wrong) error message:

In [4]: s.loc[('a', 'b')]
...
IndexingError: Too many indexers
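The equivalence can be checked without pandas, since Python itself packs comma-separated subscripts into a tuple before calling `__getitem__`. A minimal sketch, where `KeyRecorder` is a hypothetical class written only for this illustration:

```python
class KeyRecorder:
    """Hypothetical helper: hands back whatever key __getitem__ receives."""

    def __getitem__(self, key):
        return key


rec = KeyRecorder()

bare = rec['a', 'b']         # comma-separated subscripts
explicit = rec[('a', 'b')]   # explicit tuple
nested = rec[('a', 'b'), :]  # tuple plus a slice

print(bare == explicit)      # True: both arrive as the tuple ('a', 'b')
print(type(bare).__name__)   # tuple
print(nested)                # (('a', 'b'), slice(None, None, None))
```

Only the third form lets the indexer tell "one key made of two labels" apart from "two separate indexers", because the inner tuple survives as a single element.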

Some other similar cases where the multi-indexing gets it wrong (using the same example series):

In [33]: s.loc['a', 'e', :]
...
UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (3), lexsort depth (2)'

In this case I would have expected the 'too many indexers' error.

Also when slicing the first levels, you get this error message:

In [21]: s.loc[:, 'e']
...
IndexingError: Too many indexers

which can be very misleading in a real-world use case with a large dataframe and points you in the wrong direction for trying to fix your code (encountered this last week).
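When the message is misleading like this, one way to find the real cause is to check each label against its own level. A minimal sketch, assuming scalar labels or slices only and a key no longer than the number of levels. `missing_labels` is a hypothetical helper, not a pandas function:

```python
import pandas as pd

# Same example series as in the report.
s = pd.Series(range(4), index=pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']]))


def missing_labels(index, key):
    """Hypothetical helper: list (level, label) pairs from `key` that are absent from `index`.

    Slices are skipped. Assumes scalar labels and len(key) <= index.nlevels.
    """
    missing = []
    for level, label in enumerate(key):
        if isinstance(label, slice):
            continue
        if label not in index.get_level_values(level):
            missing.append((level, label))
    return missing


print(missing_labels(s.index, (slice(None), 'e')))  # [(1, 'e')]
print(missing_labels(s.index, ('a', 'c')))          # []
```

Note that an empty result only means each label exists in its level. The combination can still be absent from the index, so this is a diagnostic aid rather than a full validity check.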

jorisvandenbossche · 2017-02-21T22:16:38Z

Regarding the other example you give:

... and by the way, I would expect the following to raise an IndexingError: Too many indexers:

In [12]: s.loc[('a', 'e'), :]
Out[12]: 
a  c    0
   d    1
dtype: int64

... instead the tuple is interpreted as a list of labels rather than as a key, and hence it works "fine". Is this behavior desired? (looks a bit inconsistent to me, but I see that it is generalized, i.e. DataFrames also work this way) If it is, then it is worth fixing the docs where they mention "A list or array of labels, e.g. ['a', 'b', 'c']." to also mention tuples.

I don't think this should ever raise an "IndexingError: Too many indexers", as you actually have a length-2 indexer (so it suits the multi-index). But you are correct that the interpretation of the tuple here as a list may be a bit inconsistent.
In other cases (e.g. indexing the columns of a dataframe), we certainly do not interpret a tuple as a list.

jreback added Error Reporting Incorrect or improved errors from pandas MultiIndex labels Dec 15, 2016

jreback added Difficulty Intermediate labels Dec 15, 2016

jreback added this to the Next Major Release milestone Dec 15, 2016

toobaz mentioned this issue Apr 21, 2017

.loc with list of incomplete labels misbehaves or raises #16083

Open

ludaavics mentioned this issue Dec 28, 2018

Wrong "Too many indexers" when value in series with MultiIndex is None #24474

Closed

gfyoung added the good first issue label Dec 30, 2018

ryanreh99 mentioned this issue Apr 20, 2019

BUG: Raise KeyError when indexing a Series with MultiIndex #26155

Merged

4 tasks

jreback modified the milestones: Contributions Welcome, 0.25.0 Apr 21, 2019

jreback closed this as completed in #26155 Apr 26, 2019
