.loc sometimes raises KeyError without an error message when called on an unsorted MultiIndex DataFrame #12660

adamdivak · 2016-03-17T14:17:07Z

Hello,

I know it is well documented that MultiIndex DataFrames need to be sorted to use slicing, and that is fine. Even if you forget this, in most cases (for example when using .loc with a slicer) pandas gives a helpful error message when you call it on an unsorted DataFrame, which makes it easy to spot the mistake and add the necessary sorting. However, when simply using .loc without a slicer, a bare KeyError is raised with no explanatory message, which makes it look like a legitimate missing-key error.

Code Sample, a copy-pastable example if possible

Create a test DataFrame

import numpy as np
import pandas as pd

iterables = [['a', 'b'], [2, 1]]
columns = pd.MultiIndex.from_product(iterables, names=['col1', 'col2'])
rows = pd.MultiIndex.from_product(iterables, names=['row1', 'row2'])
df = pd.DataFrame(np.random.randn(4, 4), index=rows, columns=columns)
print(df)

col1              a                   b          
col2              2         1         2         1
row1 row2                                        
a    2    -1.285010  0.183851 -1.180964  0.885343
     1     0.213501  0.479927  0.142614  0.064209
b    2     0.250557 -0.612791 -0.275680 -0.134086
     1    -0.853687 -2.397638  0.940984  1.133747

Try to call .loc without a slicer

df.loc['a', 'b']

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

<ipython-input-28-b77cac191687> in <module>()
----> 1 df.loc['a', 'b']


/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1223     def __getitem__(self, key):
   1224         if type(key) is tuple:
-> 1225             return self._getitem_tuple(key)
   1226         else:
   1227             return self._getitem_axis(key, axis=0)


/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
    736     def _getitem_tuple(self, tup):
    737         try:
--> 738             return self._getitem_lowerdim(tup)
    739         except IndexingError:
    740             pass


/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _getitem_lowerdim(self, tup)
    849         ax0 = self.obj._get_axis(0)
    850         if isinstance(ax0, MultiIndex):
--> 851             result = self._handle_lowerdim_multi_index_axis0(tup)
    852             if result is not None:
    853                 return result


/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _handle_lowerdim_multi_index_axis0(self, tup)
    831             ax0 = self.obj._get_axis(0)
    832             if not ax0.is_lexsorted_for_tuple(tup):
--> 833                 raise e1
    834 
    835         return None


/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _handle_lowerdim_multi_index_axis0(self, tup)
    820         try:
    821             # fast path for series or for tup devoid of slices
--> 822             return self._get_label(tup, axis=0)
    823         except TypeError:
    824             # slices are unhashable


/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
     84             raise IndexingError('no slices here, handle elsewhere')
     85 
---> 86         return self.obj._xs(label, axis=axis)
     87 
     88     def _get_loc(self, key, axis=0):


/opt/conda/lib/python3.4/site-packages/pandas/core/generic.py in xs(self, key, axis, level, copy, drop_level)
   1482         if isinstance(index, MultiIndex):
   1483             loc, new_index = self.index.get_loc_level(key,
-> 1484                                                       drop_level=drop_level)
   1485         else:
   1486             loc = self.index.get_loc(key)


/opt/conda/lib/python3.4/site-packages/pandas/core/index.py in get_loc_level(self, key, level, drop_level)
   5553                             key = tuple(self[indexer].tolist()[0])
   5554 
-> 5555                         return (self._engine.get_loc(_values_from_object(key)),
   5556                                 None)
   5557                     else:


pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)()


pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)()


pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)()


pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)()


KeyError: ('a', 'b')
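The traceback shows the original KeyError being re-raised (`raise e1`) once `ax0.is_lexsorted_for_tuple(tup)` returns False. To see why ('a', 'b') fails that check, here is a simplified pure-Python sketch of "lexsort depth": the number of leading levels along which the index tuples are already sorted. The helper `lexsort_depth` is hypothetical, not pandas API; pandas computes the depth on its internal integer codes, whereas this approximation works on raw labels and assumes the labels within each level are mutually comparable.

```python
from itertools import product


def lexsort_depth(tuples):
    """Hypothetical helper: largest k such that the k-element prefixes are sorted."""
    nlevels = len(tuples[0]) if tuples else 0
    depth = 0
    for k in range(1, nlevels + 1):
        prefixes = [t[:k] for t in tuples]
        if prefixes != sorted(prefixes):
            break  # once a prefix length is unsorted, every longer one is too
        depth = k
    return depth


# Same row labels as the report: level 0 is sorted, level 1 (2 before 1) is not.
rows = list(product(["a", "b"], [2, 1]))
depth = lexsort_depth(rows)
key = ("a", "b")
print(depth)              # 1
print(len(key) <= depth)  # False: a 2-element key exceeds a lexsort depth of 1
```

The numbers agree with the expected message quoted later in the report ("tuple len (2), lexsort depth (1)").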

Make the same call after sorting the row index:

df2 = df.sortlevel(0)  # pandas 0.17-era API; later deprecated in favor of df.sort_index(level=0)
print(df2.loc['a', 'b'])

col2         2         1
row2                    
1     0.142614  0.064209
2    -1.180964  0.885343
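For readers on newer pandas, where `sortlevel` no longer exists, a minimal sketch of the same workaround uses `sort_index(level=0)` and checks sortedness first with `Index.is_monotonic_increasing`. This assumes a recent pandas; by default `sort_index` also sorts the remaining levels, which is why `row2` comes out as 1, 2.

```python
import numpy as np
import pandas as pd

iterables = [["a", "b"], [2, 1]]
columns = pd.MultiIndex.from_product(iterables, names=["col1", "col2"])
rows = pd.MultiIndex.from_product(iterables, names=["row1", "row2"])
df = pd.DataFrame(np.random.randn(4, 4), index=rows, columns=columns)

# The row index is not sorted: ('a', 2) comes before ('a', 1).
print(df.index.is_monotonic_increasing)  # False

# Modern replacement for df.sortlevel(0): sort rows by level 0, then the rest.
df2 = df.sort_index(level=0)
print(df2.index.is_monotonic_increasing)  # True
print(df2.loc["a", "b"])
```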

Expected Output

The same helpful error message, whether or not an explicit slicer is used in the .loc query:

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (1)'
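Until such a message exists, user code can approximate it. Below is a minimal sketch of a wrapper that re-raises a KeyError with a sortedness hint when the row MultiIndex is not monotonic. The name `loc_with_sort_hint` is hypothetical, and the hint is only a heuristic: the key may simply be missing, so the message says the index is not sorted rather than claiming that is the cause.

```python
import numpy as np
import pandas as pd


def loc_with_sort_hint(df, key):
    """Hypothetical helper: df.loc[key], with a hint if the row MultiIndex is unsorted."""
    try:
        return df.loc[key]
    except KeyError as err:
        idx = df.index
        if isinstance(idx, pd.MultiIndex) and not idx.is_monotonic_increasing:
            raise KeyError(
                f"{key!r} not found; note that the MultiIndex is not sorted "
                "(consider df.sort_index() before label-based indexing)"
            ) from err
        raise  # sorted index: keep the original KeyError untouched


iterables = [["a", "b"], [2, 1]]
rows = pd.MultiIndex.from_product(iterables, names=["row1", "row2"])
columns = pd.MultiIndex.from_product(iterables, names=["col1", "col2"])
df = pd.DataFrame(np.random.randn(4, 4), index=rows, columns=columns)
```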

output of `pd.show_versions()`

pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-27-generic
machine: x86_64
processor: 
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8

pandas: 0.17.1
nose: None
pip: 8.0.2
setuptools: 20.1.1
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: None
IPython: 4.1.1
sphinx: None
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None

The text was updated successfully, but these errors were encountered:

jreback · 2016-03-17T14:22:51Z

Hmm, yeah, ideally this would raise a helpful message. Pull requests are welcome! It's actually not that deep in the code; just keep stepping through.

jreback · 2016-03-17T14:24:25Z

Could probably start this with #11897 to make a useful exception (it still inherits from KeyError).
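The point about inheriting from KeyError is that existing `except KeyError` handlers keep working while newer code can match the specific case. Later pandas versions expose `pandas.errors.UnsortedIndexError`, documented as a KeyError subclass; the thread does not say whether it came from #11897. A minimal sketch of the pattern, where `describe` is a hypothetical function for illustration:

```python
from pandas.errors import UnsortedIndexError

# A subclass of KeyError, so broad KeyError handlers still catch it.
assert issubclass(UnsortedIndexError, KeyError)


def describe(exc: KeyError) -> str:
    """Hypothetical: distinguish the specific error from a plain missing label."""
    if isinstance(exc, UnsortedIndexError):
        return "unsorted index: sort it first"
    return "label not found"


try:
    raise UnsortedIndexError("MultiIndex slicing requires the index to be lexsorted")
except KeyError as exc:  # the pre-existing, broad handler still applies
    print(describe(exc))  # unsorted index: sort it first
```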

adamdivak · 2016-03-17T14:50:46Z

#11897 would certainly solve this. If @Dr-Irv can make the change, all the better; if not, I'll try to find the time to work on it, though I can't promise an exact time.
Thanks for all your work as always!

jreback · 2016-03-17T14:52:10Z

Don't wait for the other change, as it's orthogonal.
Thanks!

adamdivak · 2016-03-21T13:59:21Z

Ok, I'm on it, will send a PR soon

toobaz · 2017-11-27T14:52:21Z

This seems to be fixed (the example works fine).

jreback · 2017-11-27T15:00:52Z

Can you add this as a validation test?
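A minimal pytest-style sketch of such a validation test, assuming a pandas version in which the example works as toobaz reports. The test name is hypothetical and the test actually merged in #27359 may differ; deterministic values replace `np.random.randn` so failures are reproducible.

```python
import numpy as np
import pandas as pd


def test_loc_partial_both_axes_unsorted_multiindex():
    # Same frame as in the report (gh-12660), with deterministic values.
    iterables = [["a", "b"], [2, 1]]
    columns = pd.MultiIndex.from_product(iterables, names=["col1", "col2"])
    rows = pd.MultiIndex.from_product(iterables, names=["row1", "row2"])
    df = pd.DataFrame(np.arange(16.0).reshape(4, 4), index=rows, columns=columns)

    # Expected: rows under row1 == "a" and columns under col1 == "b",
    # with the selected outer level dropped on both axes.
    expected = df.iloc[:2, 2:].droplevel("row1").droplevel("col1", axis=1)

    result = df.loc["a", "b"]
    pd.testing.assert_frame_equal(result, expected)
```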

jreback added the Indexing, Error Reporting, MultiIndex and Difficulty Intermediate labels Mar 17, 2016

jreback added this to the 0.18.1 milestone Mar 17, 2016

adamdivak changed the title from ".loc sometimes raising KeyError without an error message when called on an unsorted MultiIndex DataFrame" to ".loc sometimes raises KeyError without an error message when called on an unsorted MultiIndex DataFrame" Mar 17, 2016

adamdivak mentioned this issue Apr 3, 2016

BUG: loc raises inconsistent error on unsorted MultiIndex #12790 (Closed)

jreback modified the milestones: 0.18.1, 0.18.2 Apr 26, 2016

jorisvandenbossche modified the milestones: Next Major Release, 0.19.0 Aug 21, 2016

simonjayhawkins mentioned this issue Jul 12, 2019

TST: add test for multiindex partial indexing both axis #27359 (Merged)

simonjayhawkins added the Needs Tests label and removed the Difficulty Intermediate and Error Reporting labels Jul 12, 2019

jreback modified the milestones: Contributions Welcome, 0.25.0 Jul 12, 2019

jreback closed this as completed in #27359 Jul 12, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.loc sometimes raises KeyError without an error message when called on an unsorted MultiIndex DataFrame #12660

.loc sometimes raises KeyError without an error message when called on an unsorted MultiIndex DataFrame #12660

adamdivak commented Mar 17, 2016

jreback commented Mar 17, 2016

jreback commented Mar 17, 2016

adamdivak commented Mar 17, 2016

jreback commented Mar 17, 2016

adamdivak commented Mar 21, 2016

toobaz commented Nov 27, 2017

jreback commented Nov 27, 2017

.loc sometimes raises KeyError without an error message when called on an unsorted MultiIndex DataFrame #12660

.loc sometimes raises KeyError without an error message when called on an unsorted MultiIndex DataFrame #12660

Comments

adamdivak commented Mar 17, 2016

Code Sample, a copy-pastable example if possible

Expected Output

output of pd.show_versions()

jreback commented Mar 17, 2016

jreback commented Mar 17, 2016

adamdivak commented Mar 17, 2016

jreback commented Mar 17, 2016

adamdivak commented Mar 21, 2016

toobaz commented Nov 27, 2017

jreback commented Nov 27, 2017

output of `pd.show_versions()`