BUG: multi-level index may throw KeyError with a number instead of the specified keys #44252

Closed
3 tasks done
unarist opened this issue Oct 31, 2021 · 1 comment
Labels: Bug; Error Reporting (incorrect or improved errors from pandas); Indexing (related to indexing on series/frames, not to indexes themselves); MultiIndex

Comments

unarist commented Oct 31, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

>>> import pandas as pd
>>> df = pd.DataFrame({('A', 'a') : [1], ('A', 'b') : [2], ('B', 'c') : [3], ('B', 'd') : [4]})
>>> df
   A     B
   a  b  c  d
0  1  2  3  4
>>> df[('A', 'c')]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/pandas/core/frame.py", line 3457, in __getitem__
    return self._getitem_multilevel(key)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/frame.py", line 3508, in _getitem_multilevel
    loc = self.columns.get_loc(key)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/multi.py", line 2932, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 725, in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1832, in pandas._libs.hashtable.UInt64HashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1841, in pandas._libs.hashtable.UInt64HashTable.get_item
KeyError: 11

Issue Description

The example raises a KeyError carrying the mysterious number 11 instead of the requested key ('A', 'c').

It seems that a multi-level index raises:

  • KeyError <keys> if any of the keys is not present in its level (OK)
  • KeyError <number> if each key is present in its level, but the combination does not exist (??)
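
The two cases can be compared side by side with a small script. This is a minimal sketch built on the frame from the reproducible example; the helper name `lookup_error` is made up for illustration. On the version reported here (pandas 1.3.4) the second message is the number 11, while versions that include the fix should show the tuple instead, so no fixed output is given.

```python
import pandas as pd

# Same frame as in the reproducible example.
df = pd.DataFrame({("A", "a"): [1], ("A", "b"): [2], ("B", "c"): [3], ("B", "d"): [4]})


def lookup_error(frame, key):
    """Return the KeyError raised by frame[key], or None if the lookup succeeds.

    Hypothetical helper for this illustration, not part of pandas.
    """
    try:
        frame[key]
    except KeyError as exc:
        return exc
    return None


# Case 1: "C" does not exist in the first level at all.
bad_level = lookup_error(df, ("C", "c"))
# Case 2: "A" and "c" both exist in their levels, but never together.
bad_combo = lookup_error(df, ("A", "c"))

print("invalid level key:  ", repr(bad_level))
print("invalid combination:", repr(bad_combo))
```

Comparing the two printed messages shows whether the installed pandas version still leaks the internal integer.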

I think this happens because the final call to self._base.get_loc is not wrapped in a try/except, so the KeyError it raises for the packed integer propagates unchanged:

    def get_loc(self, object key):
        if is_definitely_invalid_key(key):
            raise TypeError(f"'{key}' is an invalid key")
        if not isinstance(key, tuple):
            raise KeyError(key)
        try:
            indices = [0 if checknull(v) else lev.get_loc(v) + 1
                       for lev, v in zip(self.levels, key)]
        except KeyError:
            raise KeyError(key)
        # Transform indices into single integer:
        lab_int = self._codes_to_ints(np.array(indices, dtype='uint64'))
        return self._base.get_loc(self, lab_int)
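
To make the suspected cause concrete, here is a toy pure-Python model of the lookup. It is a sketch under stated assumptions, not pandas code: the class `ToyMultiEngine` and its methods are hypothetical, and the shift-and-OR packing is a simplified guess at what `_codes_to_ints` does. It does reproduce the 11 from the traceback for ('A', 'c'). The second try/except in `get_loc` is the kind of handler that appears to be missing above.

```python
import math


class ToyMultiEngine:
    """Hypothetical, simplified stand-in for the MultiIndex engine (not pandas code)."""

    def __init__(self, levels, tuples):
        self.levels = [list(lev) for lev in levels]
        # Bits per level: room for len(level) codes plus 0, reserved for missing values.
        bits = [math.ceil(math.log2(len(lev) + 1)) for lev in self.levels]
        # Each level is shifted left by the total bit width of the levels to its right.
        self.offsets = [sum(bits[i + 1:]) for i in range(len(bits))]
        # Hash table from packed integer to position, keyed only by existing tuples.
        self.table = {self._pack(self._codes(t)): pos for pos, t in enumerate(tuples)}

    def _codes(self, key):
        # 1-based position of each label within its level.
        return [lev.index(v) + 1 for lev, v in zip(self.levels, key)]

    def _pack(self, codes):
        # Combine the per-level codes into a single integer.
        packed = 0
        for code, offset in zip(codes, self.offsets):
            packed |= code << offset
        return packed

    def get_loc(self, key):
        if not isinstance(key, tuple):
            raise KeyError(key)
        try:
            codes = self._codes(key)
        except ValueError:
            # list.index raises ValueError when a label is missing from its level.
            raise KeyError(key) from None
        packed = self._pack(codes)
        try:
            return self.table[packed]
        except KeyError:
            # Without this handler the caller would see KeyError(packed).
            raise KeyError(key) from None


engine = ToyMultiEngine(
    levels=[["A", "B"], ["a", "b", "c", "d"]],
    tuples=[("A", "a"), ("A", "b"), ("B", "c"), ("B", "d")],
)
packed_bad_combo = engine._pack(engine._codes(("A", "c")))
print(packed_bad_combo)
```

In this model 'A' has code 1 shifted left by 3 bits and 'c' has code 3, so the packed value is 8 | 3. That integer is the only thing the hash table ever sees, which is why it ends up in the error message unless the caller translates it back to the original key.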

Expected Behavior

It should throw KeyError: ('A', 'c') even if all keys are valid for their levels.

Installed Versions

INSTALLED VERSIONS

commit : 945c9ed
python : 3.8.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.47-linuxkit
Version : #1 SMP Sat Jul 3 21:51:47 UTC 2021
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.4
numpy : 1.21.3
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.4
setuptools : 57.5.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

unarist added the Bug and Needs Triage labels on Oct 31, 2021
mroeschke added the Error Reporting, Indexing, and MultiIndex labels and removed the Needs Triage label on Nov 5, 2021
@lukemanley
Member

This now raises KeyError: ('A', 'c') on main. It looks like it was fixed and tested in #52194.
