Addressing multiindex raises TypeError if indices that are rightmost are not present #20951

MasterAir · 2018-05-04T12:22:25Z

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

A = [100, 100, 200, 200, 300, 300]
B = [10, 10, 20, 21, 31,33]
C = np.random.randint(0, 99, 6)

test_df = pd.DataFrame({'A': A, 'B': B, 'C': C})
test_df = test_df.set_index(['A', 'B'])

print(test_df)
try:
    print(test_df.loc[(100,10)])
except KeyError:
    print('test_df.loc[(100,10)] raises a KeyError')

try:
    print(test_df.loc[(0,1)])
except KeyError:
    print('test_df.loc[(0,1)] raises a KeyError')

try:
    print(test_df.loc[(100,1)])
except KeyError as e:
    print(e)
    print('test_df.loc[(100,1)] raises a KeyError')
except TypeError as e:
    print(e)
    print('test_df.loc[(100,1)] raises a TypeError')

Problem description

If the value is not present in the index, I believe that a KeyError should be raised consistently, so you can write code like:

try:
    df.loc[tuple]
except KeyError:
    # do something if the value isn't present
    pass

Expected Output

If df.loc[tuple] does not have a match in the multiindex, a KeyError should be raised.
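
While the exception type is inconsistent, one way to avoid depending on it is to test membership before indexing. This is only a minimal sketch: `rows_or_none` is a hypothetical helper, not a pandas function, and column C uses fixed values instead of the random ones above so the result is reproducible.

```python
import pandas as pd

A = [100, 100, 200, 200, 300, 300]
B = [10, 10, 20, 21, 31, 33]
C = [1, 2, 3, 4, 5, 6]  # fixed values (assumption) instead of np.random.randint

test_df = pd.DataFrame({'A': A, 'B': B, 'C': C}).set_index(['A', 'B'])


def rows_or_none(df, key):
    """Hypothetical helper: return the rows matching a full index tuple, or None."""
    # `key in df.index` checks the MultiIndex only, so the columns are never consulted.
    if key in df.index:
        # Wrapping the tuple in a list always selects rows and returns a DataFrame.
        return df.loc[[key]]
    return None


print(rows_or_none(test_df, (100, 10)))  # the two rows with A=100, B=10
print(rows_or_none(test_df, (100, 1)))   # None: 100 exists, but (100, 1) does not
print(rows_or_none(test_df, (0, 1)))     # None
```

The membership test does not replace the fix requested here; it only avoids relying on which exception `.loc` happens to raise.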

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


jreback · 2018-05-04T12:44:01Z

cc @toobaz this closed by your recent MI warning?

toobaz · 2018-05-04T15:07:27Z

> cc @toobaz this closed by your recent MI warning?

Unfortunately not - my PR should have only affected list(-like)s of keys, not single keys. This is rather related to #19110 and #17024 (and possibly more). Basically, since 100 is found in the index (partial indexing), 1 is looked up in the columns rather than in the second level of the index. Which is good (or at least, too late to break it) - except that if 1 is not found in the columns, we should re-raise the original exception.
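
The ambiguity described here can be seen with a column label in the second position. The sketch below reuses the frame from the report, with fixed values for C as an assumption. To Python, `df.loc[(200, 'C')]` and `df.loc[200, 'C']` are the same expression, so a two-element tuple can be read either as a full index key or as a (row, column) pair.

```python
import pandas as pd

test_df = pd.DataFrame(
    {
        'A': [100, 100, 200, 200, 300, 300],
        'B': [10, 10, 20, 21, 31, 33],
        'C': [1, 2, 3, 4, 5, 6],  # fixed values (assumption) for reproducibility
    }
).set_index(['A', 'B'])

# (200, 'C') is not a full index key, so 200 is matched against the first
# index level (partial indexing) and 'C' against the columns.
by_column = test_df.loc[(200, 'C')]
print(by_column)

# Wrapping the tuple in a list and naming the column separately removes the
# ambiguity: the tuple can now only be a row key.
by_row = test_df.loc[[(200, 20)], 'C']
print(by_row)
```

With `(100, 1)` the same fallback looks for a column named `1`, which is where the misleading exception in this report comes from.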

MasterAir · 2018-05-09T09:18:38Z

I'm happy to have a go fixing this behaviour - not sure how successful I'll be or when I'll have time. Is the expected behaviour that I've specified in the issue report correct?

Should missing keys always raise a KeyError?

toobaz · 2018-05-09T10:09:10Z

> I'm happy to have a go fixing this behaviour - not sure how successful I'll be or when I'll have time.

The basic idea (real code is more complicated) is to replace something like

try:
    # look for tuple in index
except:
    try:
        # look for first element in index, second element in columns
    except Exception as exc:
        raise exc

with something like:

try:
    # look for tuple in index
except Exception as exc:
    try:
        # look for first element in index, second element in columns
    except:
        raise exc
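
As a runnable illustration of the second pattern, here is a toy version that uses a plain dict in place of the real index. It is not pandas code: `lookup` and `rows` are made up for the example, and the fallback logic is an assumption that only mimics the order of the two lookups.

```python
def lookup(rows, key):
    """Toy stand-in for the lookup order sketched above; not pandas code.

    rows maps (first, second) index tuples to {column: value} dicts.
    """
    try:
        # look for the whole tuple in the index
        return rows[key]
    except Exception as exc:
        try:
            # look for first element in index, second element in columns
            first, column = key
            matches = {k: v for k, v in rows.items() if k[0] == first}
            if not matches:
                raise KeyError(first)
            return {k: v[column] for k, v in matches.items()}
        except Exception:
            # the fallback failed as well: report the original failure
            raise exc


rows = {(100, 10): {'C': 1}, (200, 20): {'C': 3}, (200, 21): {'C': 4}}

print(lookup(rows, (200, 21)))   # full key found in the index
print(lookup(rows, (200, 'C')))  # first element in index, second in columns

try:
    lookup(rows, (100, 1))
except KeyError as err:
    # the error names the whole tuple, not the column lookup that failed last
    print('KeyError:', err)
```

Because `raise exc` runs inside the outer `except` block, `exc` is still bound there, and Python attaches the inner failure as its `__context__`.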

> Should missing keys always raise a KeyError?

pd.Series(index=list('abc')).loc[1] raises TypeError, and (although I don't like it) that is unrelated to the present issue. But yes, print(test_df.loc[(100,1)]) above should result in a KeyError.

toobaz · 2018-06-20T16:58:39Z

> The basic idea (real code is more complicated) is to replace something like

It's actually not so simple, because we still want to raise the current error when the index is not a MultiIndex (and maybe even when it is, but the key is not a plausible MultiIndex key/indexer).

phofl · 2020-11-10T20:24:20Z

This works now. The last statement raises:

Traceback (most recent call last):
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexes/base.py", line 3028, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2020.2/scratches/scratch_4.py", line 69, in <module>
    print(test_df.loc[(100,1)])
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 881, in __getitem__
    return self._getitem_tuple(key)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1052, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 823, in _getitem_lowerdim
    return getattr(section, self.name)[new_key]
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 881, in __getitem__
    return self._getitem_tuple(key)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1052, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 799, in _getitem_lowerdim
    section = self._getitem_axis(key, axis=i)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1116, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1065, in _get_label
    return self.obj.xs(label, axis=axis)
  File "/home/developer/PycharmProjects/pandas/pandas/core/generic.py", line 3652, in xs
    return self[key]
  File "/home/developer/PycharmProjects/pandas/pandas/core/frame.py", line 2968, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexes/base.py", line 3030, in get_loc
    raise KeyError(key) from err
KeyError: 1

Process finished with exit code 1

lklamt · 2021-02-15T21:02:19Z

After doing some testing, I agree, the concrete error above seems to be solved in pandas version 1.2.2. Can the Issue be closed?
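
The behaviour reported as fixed can be pinned down with a small regression-style check. This is a sketch, not the test that was added to pandas: the function name is hypothetical, C uses fixed values, and it assumes a pandas version where the fix is present (1.2.2 or later, going by the comments above).

```python
import pandas as pd


def check_missing_second_level_raises_keyerror():
    """Hypothetical check: a missing second-level key should raise KeyError."""
    test_df = pd.DataFrame(
        {
            'A': [100, 100, 200, 200, 300, 300],
            'B': [10, 10, 20, 21, 31, 33],
            'C': [1, 2, 3, 4, 5, 6],  # fixed values (assumption)
        }
    ).set_index(['A', 'B'])

    try:
        test_df.loc[(100, 1)]
    except Exception as exc:
        # TypeError here would mean the original bug is back.
        assert isinstance(exc, KeyError), type(exc)
        return True
    raise AssertionError('expected a KeyError for the missing key (100, 1)')


print(check_missing_second_level_raises_keyerror())
```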

gfyoung added the Indexing and MultiIndex labels May 8, 2018

toobaz mentioned this issue May 22, 2018

Wrong "Too many indexers" error asking for missing key in Series MultiIndex #21168

Closed

phofl added the good first issue and Needs Tests labels Nov 10, 2020

mroeschke mentioned this issue May 28, 2021

TST: More old issues #41697

Merged


mroeschke added this to the 1.3 milestone May 28, 2021

jreback closed this as completed in #41697 May 28, 2021
