.loc sometimes raises KeyError without an error message when called on an unsorted MultiIndex DataFrame #12660

Closed
adamdivak opened this issue Mar 17, 2016 · 7 comments · Fixed by #27359
Labels
Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex Needs Tests Unit test(s) needed to prevent regressions

@adamdivak

Hello,

I know it is well documented that MultiIndex DataFrames need to be sorted to use slicing, and that is fine. Even if you forget this, in most cases (for example when using .loc with a slicer) pandas gives a helpful error message when you call it on an unsorted DataFrame, which makes it easy to spot the mistake and add the necessary sorting. However, when simply using .loc without a slicer, the same KeyError exception is raised without an explanatory message, so it looks like a legitimate key error.

Code Sample, a copy-pastable example if possible

Create a test DataFrame

import numpy as np
import pandas as pd

iterables = [['a', 'b'], [2, 1]]
columns = pd.MultiIndex.from_product(iterables, names=['col1', 'col2'])
rows = pd.MultiIndex.from_product(iterables, names=['row1', 'row2'])
df = pd.DataFrame(np.random.randn(4, 4), index=rows, columns=columns)
print(df)
col1              a                   b          
col2              2         1         2         1
row1 row2                                        
a    2    -1.285010  0.183851 -1.180964  0.885343
     1     0.213501  0.479927  0.142614  0.064209
b    2     0.250557 -0.612791 -0.275680 -0.134086
     1    -0.853687 -2.397638  0.940984  1.133747

Try to call .loc without a slicer

df.loc['a', 'b']
---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

<ipython-input-28-b77cac191687> in <module>()
----> 1 df.loc['a', 'b']


/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1223     def __getitem__(self, key):
   1224         if type(key) is tuple:
-> 1225             return self._getitem_tuple(key)
   1226         else:
   1227             return self._getitem_axis(key, axis=0)


/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
    736     def _getitem_tuple(self, tup):
    737         try:
--> 738             return self._getitem_lowerdim(tup)
    739         except IndexingError:
    740             pass


/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _getitem_lowerdim(self, tup)
    849         ax0 = self.obj._get_axis(0)
    850         if isinstance(ax0, MultiIndex):
--> 851             result = self._handle_lowerdim_multi_index_axis0(tup)
    852             if result is not None:
    853                 return result


/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _handle_lowerdim_multi_index_axis0(self, tup)
    831             ax0 = self.obj._get_axis(0)
    832             if not ax0.is_lexsorted_for_tuple(tup):
--> 833                 raise e1
    834 
    835         return None


/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _handle_lowerdim_multi_index_axis0(self, tup)
    820         try:
    821             # fast path for series or for tup devoid of slices
--> 822             return self._get_label(tup, axis=0)
    823         except TypeError:
    824             # slices are unhashable


/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
     84             raise IndexingError('no slices here, handle elsewhere')
     85 
---> 86         return self.obj._xs(label, axis=axis)
     87 
     88     def _get_loc(self, key, axis=0):


/opt/conda/lib/python3.4/site-packages/pandas/core/generic.py in xs(self, key, axis, level, copy, drop_level)
   1482         if isinstance(index, MultiIndex):
   1483             loc, new_index = self.index.get_loc_level(key,
-> 1484                                                       drop_level=drop_level)
   1485         else:
   1486             loc = self.index.get_loc(key)


/opt/conda/lib/python3.4/site-packages/pandas/core/index.py in get_loc_level(self, key, level, drop_level)
   5553                             key = tuple(self[indexer].tolist()[0])
   5554 
-> 5555                         return (self._engine.get_loc(_values_from_object(key)),
   5556                                 None)
   5557                     else:


pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)()


pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)()


pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)()


pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)()


KeyError: ('a', 'b')

Make the same call after sorting the index with sortlevel

df2 = df.sortlevel(0)
print(df2.loc['a', 'b'])
col2         2         1
row2                    
1     0.142614  0.064209
2    -1.180964  0.885343
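
On recent pandas releases DataFrame.sortlevel is no longer available, and sort_index is its replacement. The following is a minimal sketch of the same step, assuming a current pandas install. The frame is rebuilt so the snippet stands alone, and is_monotonic_increasing is used here as a stand-in for "fully lexsorted", which is an assumption of this sketch rather than something stated in the report.

```python
import numpy as np
import pandas as pd

iterables = [["a", "b"], [2, 1]]
columns = pd.MultiIndex.from_product(iterables, names=["col1", "col2"])
rows = pd.MultiIndex.from_product(iterables, names=["row1", "row2"])
df = pd.DataFrame(np.random.randn(4, 4), index=rows, columns=columns)

# The row index as built is not sorted: (a, 2) comes before (a, 1).
rows_sorted_before = df.index.is_monotonic_increasing

# sort_index(level=0) sorts the rows; remaining levels are included by default.
df2 = df.sort_index(level=0)
rows_sorted_after = df2.index.is_monotonic_increasing

# Same lookup as in the report, now on the row-sorted frame.
result = df2.loc["a", "b"]
print(rows_sorted_before, rows_sorted_after)
print(result)
```

Only the rows are sorted here, matching the original sortlevel(0) call; the columns keep their 2, 1 order.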

Expected Output

The same helpful error message, whether or not the .loc query uses an explicit slicer.

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (1)'
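
For comparison, the slicer case that already produces a descriptive message can be reproduced roughly as follows. This is a sketch, assuming a current pandas install; the helper name slicer_error_message is made up for illustration, and the exact wording of the message differs between pandas versions, so the snippet only prints whatever is raised.

```python
import numpy as np
import pandas as pd

iterables = [["a", "b"], [2, 1]]
columns = pd.MultiIndex.from_product(iterables, names=["col1", "col2"])
rows = pd.MultiIndex.from_product(iterables, names=["row1", "row2"])
df = pd.DataFrame(np.random.randn(4, 4), index=rows, columns=columns)


def slicer_error_message(frame):
    """Return the KeyError message from a sliced .loc lookup, or None."""
    try:
        # A real slice on the second row level requires a lexsorted index.
        frame.loc[pd.IndexSlice["a", 1:2], :]
    except KeyError as err:
        return str(err)
    return None


message = slicer_error_message(df)
print(message)
```

Calling the same helper on df.sort_index() should return None, since the sliced lookup succeeds once the rows are sorted.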

output of pd.show_versions()

pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-27-generic
machine: x86_64
processor: 
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8

pandas: 0.17.1
nose: None
pip: 8.0.2
setuptools: 20.1.1
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: None
IPython: 4.1.1
sphinx: None
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None
@jreback
Contributor

jreback commented Mar 17, 2016

Hmm, yeah, ideally this would raise with a helpful message. Pull requests are welcome! It's actually not that deep in the code; just keep stepping through.

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Error Reporting Incorrect or improved errors from pandas MultiIndex Difficulty Intermediate labels Mar 17, 2016
@jreback jreback added this to the 0.18.1 milestone Mar 17, 2016
@jreback
Contributor

jreback commented Mar 17, 2016

prob can start this with #11897 to make a useful exception (it still inherits from KeyError)
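
The idea of a dedicated exception that still inherits from KeyError can be sketched as below. The class UnsortedLookupError and the function checked_lookup are hypothetical names used only for illustration; they are not pandas APIs and not the change proposed in #11897.

```python
class UnsortedLookupError(KeyError):
    """Hypothetical exception; the name is illustrative, not a pandas API."""


def checked_lookup(mapping, key, is_sorted):
    """Toy lookup that reports an unsorted container explicitly."""
    if not is_sorted:
        raise UnsortedLookupError(
            f"lookup of {key!r} requires a sorted index; sort it first"
        )
    return mapping[key]


try:
    checked_lookup({("a", 1): 0.5}, ("a", "b"), is_sorted=False)
except KeyError as err:
    # Handlers written for KeyError keep working with the subclass.
    caught = err

print(type(caught).__name__, caught.args[0])
```

Because the subclass is still a KeyError, existing except KeyError blocks are unaffected, while callers who care can catch the more specific type.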

@adamdivak adamdivak changed the title .loc sometimes raising KeyError without an error message when called on an unsorted MultiIndex DataFrame .loc sometimes raises KeyError without an error message when called on an unsorted MultiIndex DataFrame Mar 17, 2016
@adamdivak
Author

#11897 would certainly solve this. If @Dr-Irv can make the change, all the better, if not then I'll try to find the time to work on it, though I can't promise an exact time.
Thanks for all your work as always!

@jreback
Contributor

jreback commented Mar 17, 2016

don't wait for the other change as it's orthogonal
thanks!

@adamdivak
Author

Ok, I'm on it, will send a PR soon

@jreback jreback modified the milestones: 0.18.1, 0.18.2 Apr 26, 2016
@jorisvandenbossche jorisvandenbossche modified the milestones: Next Major Release, 0.19.0 Aug 21, 2016
@toobaz
Member

toobaz commented Nov 27, 2017

This seems to be fixed (the example works fine).

@jreback
Contributor

jreback commented Nov 27, 2017

can you put this up as a validation test?
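
A validation test along these lines might look like the sketch below. It assumes that, on a pandas version where the example works, df.loc['a', 'b'] selects the 'a' rows and the 'b' columns and drops the matched levels. The test name is made up for illustration and this is not necessarily the test that was merged.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def test_loc_partial_both_axes_unsorted():
    # Frame from the report: unsorted MultiIndex on both rows and columns.
    iterables = [["a", "b"], [2, 1]]
    columns = pd.MultiIndex.from_product(iterables, names=["col1", "col2"])
    rows = pd.MultiIndex.from_product(iterables, names=["row1", "row2"])
    df = pd.DataFrame(np.random.randn(4, 4), index=rows, columns=columns)

    result = df.loc["a", "b"]

    # Positional equivalent: first two rows ('a'), last two columns ('b'),
    # with the matched outer levels dropped.
    expected = df.iloc[:2, 2:].droplevel("row1").droplevel("col1", axis=1)
    tm.assert_frame_equal(result, expected)


test_loc_partial_both_axes_unsorted()
```

Comparing against a positional slice avoids depending on the random values in the frame.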

@simonjayhawkins simonjayhawkins added Needs Tests Unit test(s) needed to prevent regressions and removed Difficulty Intermediate Error Reporting Incorrect or improved errors from pandas labels Jul 12, 2019
@jreback jreback modified the milestones: Contributions Welcome, 0.25.0 Jul 12, 2019