
KeyError when indexing the MultiIndex of a DataFrame with ":" as first field #13597

Closed
toobaz opened this issue Jul 9, 2016 · 4 comments

Comments

@toobaz
Member

toobaz commented Jul 9, 2016

Code Sample, a copy-pastable example if possible

In [2]: df = pd.DataFrame(index=pd.MultiIndex.from_tuples([['a', 'b', 'c']]), columns=[1, 2])

In [3]: df.loc['a', 'b', 'c']
Out[3]: 
1    NaN
2    NaN
Name: (a, b, c), dtype: object

In [4]: df.loc['a', 'b', :]
Out[4]: 
     1    2
c  NaN  NaN

In [5]: df.loc[:, 'b', 'c']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/home/nobackup/repo/pandas/pandas/core/indexing.py in _has_valid_type(self, key, axis)
   1397                 if key not in ax:
-> 1398                     error()
   1399             except TypeError as e:

/home/nobackup/repo/pandas/pandas/core/indexing.py in error()
   1392                 raise KeyError("the label [%s] is not in the [%s]" %
-> 1393                                (key, self.obj._get_axis_name(axis)))
   1394 

KeyError: 'the label [b] is not in the [columns]'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-5-116a40d4994c> in <module>()
----> 1 df.loc[:, 'b', 'c']

/home/nobackup/repo/pandas/pandas/core/indexing.py in __getitem__(self, key)
   1295 
   1296         if type(key) is tuple:
-> 1297             return self._getitem_tuple(key)
   1298         else:
   1299             return self._getitem_axis(key, axis=0)

/home/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_tuple(self, tup)
    790 
    791         # no multi-index, so validate all of the indexers
--> 792         self._has_valid_tuple(tup)
    793 
    794         # ugly hack for GH #836

/home/nobackup/repo/pandas/pandas/core/indexing.py in _has_valid_tuple(self, key)
    140             if i >= self.obj.ndim:
    141                 raise IndexingError('Too many indexers')
--> 142             if not self._has_valid_type(k, i):
    143                 raise ValueError("Location based indexing can only have [%s] "
    144                                  "types" % self._valid_types)

/home/nobackup/repo/pandas/pandas/core/indexing.py in _has_valid_type(self, key, axis)
   1404                 raise
   1405             except:
-> 1406                 error()
   1407 
   1408         return True

/home/nobackup/repo/pandas/pandas/core/indexing.py in error()
   1391                                     "key")
   1392                 raise KeyError("the label [%s] is not in the [%s]" %
-> 1393                                (key, self.obj._get_axis_name(axis)))
   1394 
   1395             try:

KeyError: 'the label [b] is not in the [columns]'

Expected Output

Either something analogous to the output of the previous command, or all three should raise an error, I guess.

output of pd.show_versions()

In [6]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: a63bd12529ff309d957d714825b1753d0e02b7fa
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.5.0-2-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: it_IT.utf8
LOCALE: it_IT.UTF-8

pandas: 0.18.1+174.ga63bd12
nose: 1.3.7
pip: 1.5.6
setuptools: 18.4
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.16.0
statsmodels: 0.8.0.dev0+111ddc0
xarray: None
IPython: 5.0.0.dev
sphinx: 1.3.1
patsy: 0.3.0-dev
dateutil: 2.2
pytz: 2012c
blosc: None
bottleneck: 1.1.0dev
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: 1.1.2
xlsxwriter: 0.7.3
lxml: None
bs4: 4.4.0
html5lib: 0.999
httplib2: 0.9.1
apiclient: 1.5.0
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: 0.2.1


@toobaz
Member Author

toobaz commented Jul 9, 2016

(This reminds me of #12827, though I'm not sure it's really related.)

jreback added this to the No action milestone Jul 9, 2016
@jreback
Contributor

jreback commented Jul 9, 2016

This is ambiguous indexing from a slicer and is not possible to detect. See the warning: http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers

You can do this:

In [8]: df.loc[(slice(None), 'b', 'c'), :]
Out[8]: 
         1    2
a b c  NaN  NaN

In [9]: idx = pd.IndexSlice

In [12]: df.loc[idx[:, 'b', 'c'], :]
Out[12]: 
         1    2
a b c  NaN  NaN
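
For reference, the two workarounds above can be put into one self-contained script. This is only a sketch that assumes a reasonably recent pandas release; the names `by_slice` and `by_index_slice` are illustrative and not part of the original session.

```python
import pandas as pd

# Rebuild the one-row frame from the original report.
index = pd.MultiIndex.from_tuples([("a", "b", "c")])
df = pd.DataFrame(index=index, columns=[1, 2])

# Explicit form: the row indexer is a single tuple and the column
# indexer is given separately, so nothing is left for .loc to guess.
by_slice = df.loc[(slice(None), "b", "c"), :]

# The same selection written with pd.IndexSlice for readability.
idx = pd.IndexSlice
by_index_slice = df.loc[idx[:, "b", "c"], :]

print(by_slice)
print(by_index_slice)
```

Both selections keep all three index levels, because the first level is sliced rather than fixed to a single label.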

jreback closed this as completed Jul 9, 2016
@jorisvandenbossche
Member

@toobaz An additional reason this is confusing is that df.loc['a', 'b', 'c'] is actually interpreted as df.loc[('a', 'b', 'c'), :], and df.loc['a', 'b', :] as df.loc[('a', 'b'), :]. That is why those two work.
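
A small sketch of the two explicit forms mentioned above, assuming a recent pandas release and the same one-row frame as in the report (the names `row` and `sub` are illustrative):

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([("a", "b", "c")])
df = pd.DataFrame(index=index, columns=[1, 2])

# Full key: every level is fixed, so the result is a Series
# whose name is the complete index tuple.
row = df.loc[("a", "b", "c"), :]

# Partial key: the first two levels are fixed and dropped,
# leaving a DataFrame indexed by the remaining level.
sub = df.loc[("a", "b"), :]

print(type(row).__name__, row.name)
print(type(sub).__name__, list(sub.index))
```

Writing the row key as one tuple and the column indexer separately avoids relying on how .loc splits a flat tuple between rows and columns.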

@toobaz
Member Author

toobaz commented Jul 11, 2016

@jorisvandenbossche @jreback Yes, I see.

I reported the issue because the behaviour seemed inconsistent with how (for instance) lists are handled: df.loc[['a'], 'b', 'c'] works as I would expect (i.e. it is interpreted as df.loc[(['a'], 'b', 'c'), :] rather than as df.loc[(['a'],), ('b', 'c')]), and I tend to think of a slicer as equivalent to a list containing all values (implicitly assuming that .loc doesn't interpret an indexer based on the content of the list).

But then I noticed that df.loc[['a'], 'b'] gives the same error (unlike df.loc['a', 'b']). To me this is even more unexpected (I would expect that replacing a label with a list containing it only raises the dimensionality of the returned object), and it suggests that even with lists I was oversimplifying.

Maybe the consistent behaviour I have in mind would have unwanted side effects that I am missing. Still, the current behaviour is quite unexpected in several cases. But true, the documentation suggests one should not even try.
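
When the row key is written out explicitly as a tuple, a list and a full slice on the first level do select the same rows, which matches the "slicer as a list of all values" intuition. A minimal sketch, assuming a recent pandas release and the one-row frame from the report (where 'a' is the only value in the first level; `with_list` and `with_slice` are illustrative names):

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([("a", "b", "c")])
df = pd.DataFrame(index=index, columns=[1, 2])

# A list on the first level, inside an explicit row tuple.
with_list = df.loc[(["a"], "b", "c"), :]

# A full slice on the first level, inside an explicit row tuple.
with_slice = df.loc[(slice(None), "b", "c"), :]

# In this frame both forms select the same single row.
print(with_list.index.equals(with_slice.index))
```

The explicit row tuple plus a separate column indexer is the form the documentation recommends, and it removes the ambiguity discussed in this thread.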
