
Addressing multiindex raises TypeError if indices that are rightmost are not present #20951


Closed
MasterAir opened this issue May 4, 2018 · 7 comments · Fixed by #41697
Labels
good first issue Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex Needs Tests Unit test(s) needed to prevent regressions
Milestone

Comments

@MasterAir

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

A = [100, 100, 200, 200, 300, 300]
B = [10, 10, 20, 21, 31, 33]
C = np.random.randint(0, 99, 6)

test_df = pd.DataFrame({'A': A, 'B': B, 'C': C})
test_df = test_df.set_index(['A', 'B'])

print(test_df)
# (100, 10) is in the index, so this lookup succeeds
print(test_df.loc[(100,10)])

try:
    print(test_df.loc[(0,1)])
except KeyError:
    print('test_df.loc[(0,1)] raises a KeyError')

try:
    print(test_df.loc[(100,1)])
except KeyError as e:
    print(e)
    print('test_df.loc[(100,1)] raises a KeyError')
except TypeError as e:
    print(e)
    print('test_df.loc[(100,1)] raises a TypeError')

Problem description

If the value is not present in the index, I believe that a KeyError should be raised consistently, so that you can write code like:

try:
    df.loc[key]
except KeyError:
    # do something if the key isn't present
    ...

Expected Output

If df.loc[tuple] does not have a match in the MultiIndex, a KeyError should be raised.
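A minimal sketch of the caller-side code that this consistency enables. It assumes a DataFrame built like the one in the report but with fixed values in C instead of random ones, and loc_or_default is a hypothetical helper, not a pandas API. With the TypeError reported here, the (100, 1) lookup would escape the except KeyError branch. On a pandas version where .loc raises KeyError for that key, it returns the default.

```python
import pandas as pd


def loc_or_default(df, key, default=None):
    """Hypothetical helper: return df.loc[key], or `default` if the key is missing.

    Only KeyError is treated as "missing"; any other exception (such as the
    TypeError reported in this issue) propagates to the caller.
    """
    try:
        return df.loc[key]
    except KeyError:
        return default


# Index from the report; fixed C values are an assumption for reproducibility
df = pd.DataFrame(
    {"A": [100, 100, 200, 200, 300, 300],
     "B": [10, 10, 20, 21, 31, 33],
     "C": [1, 2, 3, 4, 5, 6]}
).set_index(["A", "B"])

print(loc_or_default(df, (200, 20)))  # the matching row
print(loc_or_default(df, (0, 1)))     # default: no row with A == 0
print(loc_or_default(df, (100, 1)))   # default, as long as .loc raises KeyError here
```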

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented May 4, 2018

cc @toobaz this closed by your recent MI warning?

@toobaz
Member

toobaz commented May 4, 2018

> cc @toobaz this closed by your recent MI warning?

Unfortunately not: my PR should only have affected list(-like)s of keys, not single keys. This is more closely related to #19110 and #17024 (and possibly others). Basically, since 100 is found in the index (partial indexing), 1 is looked up in the columns rather than in the second level of the index. That is fine (or at least, it is too late to change it), except that if 1 is not found in the columns, we should re-raise the original exception.
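To make the fallback concrete, here is a sketch using the report's index with fixed (assumed) values for C. A tuple whose second element is a column label is read as "row key on level A, then column", which is the partial-indexing behaviour described above. The results shown are what a recent pandas is expected to do; older releases such as 0.22 may behave differently.

```python
import pandas as pd

# Same index as the report; C uses fixed values (an assumption) instead of random ones
df = pd.DataFrame(
    {"A": [100, 100, 200, 200, 300, 300],
     "B": [10, 10, 20, 21, 31, 33],
     "C": [1, 2, 3, 4, 5, 6]}
).set_index(["A", "B"])

# (100, "C") is not a row key, so .loc falls back to: rows where A == 100, column "C"
partial = df.loc[(100, "C")]
print(partial)

# (100, 1): 100 matches level A, then 1 is looked up in the columns and is not found
try:
    df.loc[(100, 1)]
except KeyError as exc:
    print("KeyError:", exc)
```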

@gfyoung gfyoung added Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex labels May 8, 2018
@MasterAir
Author

I'm happy to have a go at fixing this behaviour, though I'm not sure how successful I'll be or when I'll have time. Is the expected behaviour I specified in the issue report correct?

Should missing keys always raise a KeyError?

@toobaz
Member

toobaz commented May 9, 2018

> I'm happy to have a go fixing this behaviour - not sure how successful I'll be or when I'll have time.

The basic idea (real code is more complicated) is to replace something like:

try:
    ...  # look for tuple in index
except Exception:
    try:
        ...  # look for first element in index, second element in columns
    except Exception as exc:
        raise exc  # surfaces the error from the fallback (columns) lookup

with something like:

try:
    ...  # look for tuple in index
except Exception as exc:
    try:
        ...  # look for first element in index, second element in columns
    except Exception:
        raise exc  # surfaces the original error from the index lookup
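A runnable sketch of the second pattern. The primary and fallback functions are hypothetical stand-ins for the index and column searches (illustrative names, not pandas internals). The fallback raises a TypeError for a non-string column label to mimic the error in the report, but the caller still sees the KeyError from the primary lookup.

```python
def lookup_with_fallback(primary, fallback, key):
    """Try primary(key); on failure try fallback(key).

    If both fail, re-raise the error from the *primary* lookup, so the
    caller sees the "key not found" error rather than the fallback's error.
    """
    try:
        return primary(key)
    except Exception as exc:
        try:
            return fallback(key)
        except Exception:
            raise exc


# Hypothetical stand-ins for the row index, its first level, and the columns
ROWS = {(100, 10): "row (100, 10)"}
LEVEL_A = {100}
COLUMNS = {"C": "column C"}


def primary(key):
    # Whole tuple looked up in the row index; KeyError if absent
    return ROWS[key]


def fallback(key):
    # First element must match level A; second element is treated as a column label
    first, second = key
    if first not in LEVEL_A:
        raise KeyError(first)
    if not isinstance(second, str):
        raise TypeError(f"cannot look up column label {second!r}")
    return COLUMNS[second]


print(lookup_with_fallback(primary, fallback, (100, 10)))   # found by the primary lookup
print(lookup_with_fallback(primary, fallback, (100, "C")))  # found by the fallback
try:
    lookup_with_fallback(primary, fallback, (100, 1))
except KeyError as err:
    # The primary KeyError is surfaced, not the fallback's TypeError
    print("KeyError:", err)
```

Because raise exc runs inside the inner except block, Python chains the fallback's TypeError as the KeyError's context. The traceback therefore still shows both failures.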

> Should missing keys always raise a KeyError?

pd.Series(index=list('abc')).loc[1] raises a TypeError, which (although I don't like it) is unrelated to the present issue. But yes, print(test_df.loc[(100,1)]) above should raise a KeyError.

@toobaz
Member

toobaz commented Jun 20, 2018

> The basic idea (real code is more complicated) is to replace something like

It's actually not so simple, because we still want to raise the current error when the index is not a MultiIndex (and maybe even when it is, but the key is not a plausible MultiIndex key/indexer).
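The extra condition can be sketched as a policy function that decides which of the two errors to surface. choose_error is hypothetical. The test for a "plausible MultiIndex key" (a tuple no longer than the number of index levels) is an assumption made for illustration, not the rule pandas actually applies.

```python
import pandas as pd


def choose_error(index, key, primary_exc, fallback_exc):
    """Hypothetical policy for which exception a failed .loc lookup surfaces.

    primary_exc: error from looking the whole key up in the index
    fallback_exc: error from the "first element in index, rest in columns" path
    """
    # Assumption: a plausible MultiIndex key is a tuple no longer than nlevels
    plausible_mi_key = (
        isinstance(index, pd.MultiIndex)
        and isinstance(key, tuple)
        and len(key) <= index.nlevels
    )
    # Keep the current (fallback) error unless the key looks like a MultiIndex key
    return primary_exc if plausible_mi_key else fallback_exc


mi = pd.MultiIndex.from_tuples([(100, 10), (200, 20)])
flat = pd.Index(["a", "b", "c"])

print(type(choose_error(mi, (100, 1), KeyError((100, 1)), TypeError("bad column"))).__name__)  # KeyError
print(type(choose_error(flat, 1, KeyError(1), TypeError("bad label"))).__name__)  # TypeError
```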

@phofl
Member

phofl commented Nov 10, 2020

This works now; the last statement raises:

Traceback (most recent call last):
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexes/base.py", line 3028, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2020.2/scratches/scratch_4.py", line 69, in <module>
    print(test_df.loc[(100,1)])
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 881, in __getitem__
    return self._getitem_tuple(key)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1052, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 823, in _getitem_lowerdim
    return getattr(section, self.name)[new_key]
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 881, in __getitem__
    return self._getitem_tuple(key)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1052, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 799, in _getitem_lowerdim
    section = self._getitem_axis(key, axis=i)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1116, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1065, in _get_label
    return self.obj.xs(label, axis=axis)
  File "/home/developer/PycharmProjects/pandas/pandas/core/generic.py", line 3652, in xs
    return self[key]
  File "/home/developer/PycharmProjects/pandas/pandas/core/frame.py", line 2968, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexes/base.py", line 3030, in get_loc
    raise KeyError(key) from err
KeyError: 1

Process finished with exit code 1

@phofl phofl added good first issue Needs Tests Unit test(s) needed to prevent regressions labels Nov 10, 2020
@lklamt

lklamt commented Feb 15, 2021

After some testing, I agree: the specific error above seems to be fixed in pandas 1.2.2. Can the issue be closed?

@mroeschke mroeschke mentioned this issue May 28, 2021
@mroeschke mroeschke added this to the 1.3 milestone May 28, 2021