KeyError when indexing the MultiIndex of a DataFrame with ":" as first field #13597

toobaz · 2016-07-09T15:19:16Z

Code Sample, a copy-pastable example if possible

In [2]: df = pd.DataFrame(index=pd.MultiIndex.from_tuples([['a', 'b', 'c']]), columns=[1, 2])

In [3]: df.loc['a', 'b', 'c']
Out[3]: 
1    NaN
2    NaN
Name: (a, b, c), dtype: object

In [4]: df.loc['a', 'b', :]
Out[4]: 
     1    2
c  NaN  NaN

In [5]: df.loc[:, 'b', 'c']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/home/nobackup/repo/pandas/pandas/core/indexing.py in _has_valid_type(self, key, axis)
   1397                 if key not in ax:
-> 1398                     error()
   1399             except TypeError as e:

/home/nobackup/repo/pandas/pandas/core/indexing.py in error()
   1392                 raise KeyError("the label [%s] is not in the [%s]" %
-> 1393                                (key, self.obj._get_axis_name(axis)))
   1394 

KeyError: 'the label [b] is not in the [columns]'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-5-116a40d4994c> in <module>()
----> 1 df.loc[:, 'b', 'c']

/home/nobackup/repo/pandas/pandas/core/indexing.py in __getitem__(self, key)
   1295 
   1296         if type(key) is tuple:
-> 1297             return self._getitem_tuple(key)
   1298         else:
   1299             return self._getitem_axis(key, axis=0)

/home/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_tuple(self, tup)
    790 
    791         # no multi-index, so validate all of the indexers
--> 792         self._has_valid_tuple(tup)
    793 
    794         # ugly hack for GH #836

/home/nobackup/repo/pandas/pandas/core/indexing.py in _has_valid_tuple(self, key)
    140             if i >= self.obj.ndim:
    141                 raise IndexingError('Too many indexers')
--> 142             if not self._has_valid_type(k, i):
    143                 raise ValueError("Location based indexing can only have [%s] "
    144                                  "types" % self._valid_types)

/home/nobackup/repo/pandas/pandas/core/indexing.py in _has_valid_type(self, key, axis)
   1404                 raise
   1405             except:
-> 1406                 error()
   1407 
   1408         return True

/home/nobackup/repo/pandas/pandas/core/indexing.py in error()
   1391                                     "key")
   1392                 raise KeyError("the label [%s] is not in the [%s]" %
-> 1393                                (key, self.obj._get_axis_name(axis)))
   1394 
   1395             try:

KeyError: 'the label [b] is not in the [columns]'

Expected Output

Either something analogous to the previous command, or all 3 should raise an error I guess.

output of `pd.show_versions()`

In [6]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: a63bd12529ff309d957d714825b1753d0e02b7fa
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.5.0-2-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: it_IT.utf8
LOCALE: it_IT.UTF-8

pandas: 0.18.1+174.ga63bd12
nose: 1.3.7
pip: 1.5.6
setuptools: 18.4
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.16.0
statsmodels: 0.8.0.dev0+111ddc0
xarray: None
IPython: 5.0.0.dev
sphinx: 1.3.1
patsy: 0.3.0-dev
dateutil: 2.2
pytz: 2012c
blosc: None
bottleneck: 1.1.0dev
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: 1.1.2
xlsxwriter: 0.7.3
lxml: None
bs4: 4.4.0
html5lib: 0.999
httplib2: 0.9.1
apiclient: 1.5.0
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: 0.2.1


toobaz · 2016-07-09T15:19:44Z

(This reminds me of #12827; not sure it's really related, though.)

jreback · 2016-07-09T16:45:06Z

This is ambiguous indexing from a slicer and is not possible to detect. See the warning: http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers

You can do this:

In [8]: df.loc[(slice(None), 'b', 'c'), :]
Out[8]: 
         1    2
a b c  NaN  NaN

In [9]: idx = pd.IndexSlice

In [12]: df.loc[idx[:, 'b', 'c'], :]
Out[12]: 
         1    2
a b c  NaN  NaN
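
For anyone who wants to run the two workarounds above outside an IPython session, here is a minimal self-contained sketch. The variable names (`mi`, `rows_explicit`, `rows_idx`) are only illustrative, and the behaviour of the ambiguous `df.loc[:, 'b', 'c']` form itself may differ between pandas versions, so it is deliberately left out.

```python
import pandas as pd

# Same one-row frame as in the report: a 3-level MultiIndex on the rows.
mi = pd.MultiIndex.from_tuples([("a", "b", "c")])
df = pd.DataFrame(index=mi, columns=[1, 2])

# Spell out both axes: the tuple is the row key, the trailing ":" is the columns.
rows_explicit = df.loc[(slice(None), "b", "c"), :]

# pd.IndexSlice allows the ":" syntax inside the row key.
idx = pd.IndexSlice
rows_idx = df.loc[idx[:, "b", "c"], :]

print(rows_explicit)
print(rows_idx)
```

Both forms put the whole row key in a single tuple, so there is no doubt about which part of the key refers to rows and which to columns.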

jorisvandenbossche · 2016-07-11T12:22:33Z

@toobaz An additional reason this is confusing is that df.loc['a', 'b', 'c'] is actually interpreted as df.loc[('a', 'b', 'c'), :], and df.loc['a', 'b', :] as df.loc[('a', 'b'), :]. That is why those two do work.
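
The interpretation described above can be checked by writing the explicit forms directly. This is a rough sketch rather than a statement about pandas internals; the variable names are made up for illustration.

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples([("a", "b", "c")])
df = pd.DataFrame(index=mi, columns=[1, 2])

# Full row key: all three levels are given, so one row comes back as a Series.
full_key = df.loc[("a", "b", "c"), :]

# Partial row key: only the first two levels are given, so the result is a
# DataFrame indexed by the remaining level.
partial_key = df.loc[("a", "b"), :]

# The shorthand from the report, with no slice in it.
shorthand = df.loc["a", "b", "c"]

print(type(full_key).__name__, full_key.name)
print(partial_key)
print(shorthand.name)
```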

toobaz · 2016-07-11T13:20:05Z

@jorisvandenbossche @jreback Yes, I see.

I reported the issue because it seemed to me inconsistent with the behaviour of (for instance) lists: df.loc[['a'], 'b', 'c'] works as I would expect (i.e. it is interpreted as df.loc[(['a'], 'b', 'c'), :] rather than as df.loc[(['a'],), ('b', 'c')]), and I tend to think of a slicer as equivalent to a list containing all values (implicitly assuming that .loc doesn't interpret an indexer based on the content of the list).

But then I just noticed that df.loc[['a'], 'b'] gives the same error (unlike df.loc['a', 'b']). To me this is even more unexpected (I would expect that replacing a label with a list containing it just raises the dimensionality of the returned object), and it suggests that even with lists I was oversimplifying.

Maybe the consistent behaviour I have in mind would have unwanted side effects that I am missing. Still, the current behaviour is quite unexpected in several cases. But true, the documentation suggests one should not even try.
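
For reference, the list case can also be written unambiguously by putting the whole row key in one tuple and giving the column indexer explicitly. This is only a sketch of that explicit form; how the shorthand df.loc[['a'], 'b', 'c'] or df.loc[['a'], 'b'] behaves may depend on the pandas version, so it is not exercised here.

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples([("a", "b", "c")])
df = pd.DataFrame(index=mi, columns=[1, 2])

# A list for the first level, scalar labels for the other two, all inside one
# row-key tuple, followed by an explicit ":" for the columns.
rows_from_list = df.loc[(["a"], "b", "c"), :]

print(rows_from_list)
```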

jreback added Usage Question MultiIndex labels Jul 9, 2016

jreback added this to the No action milestone Jul 9, 2016

jreback closed this as completed Jul 9, 2016
