Trying to access non-existent df column throws KeyError for wrong key #28799

jwhendy · 2019-10-05T17:55:00Z

Code Sample, a copy-pastable example if possible

I ran into this in a more complicated groupby().apply(lambda: ...) call, but reproduced a simpler version below of what seems to me like a bug (or at least undesirable behavior).

import pandas as pd

# simple dataframe
test = pd.DataFrame({'var': ['a', 'a', 'b', 'b'], 'val': range(4)})

# simulated mistake of asking for a 'vau' column, not 'val'
test.groupby('var').apply(lambda rows: pd.DataFrame({'var': [rows.iloc[-1]['var']],
                                                     'val': [rows.iloc[-1]['vau']]}))

Doing this yields:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
   4735             try:
-> 4736                 return libindex.get_value_box(s, key)
   4737             except IndexError:

pandas/_libs/index.pyx in pandas._libs.index.get_value_box()

pandas/_libs/index.pyx in pandas._libs.index.get_value_at()

pandas/_libs/util.pxd in pandas._libs.util.get_value_at()

pandas/_libs/util.pxd in pandas._libs.util.validate_indexer()

TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

----------  8<  [snip]  >8 ----------

KeyError: 'var'

Problem description

When attempting to troubleshoot this, the reported KeyError: 'var' is misleading: it names a key that succeeded rather than the key that failed ('vau'). In my actual code, I was doing this for more variables, so while this case is trivial to troubleshoot, real-world cases could be much more confusing and harder for the user to figure out.
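For anyone hitting this with many columns, one way to sidestep the misleading message is to check the requested names against the frame before running the apply. This is only a minimal sketch: `missing_columns` is a hypothetical helper written for this illustration, not part of pandas.

```python
import pandas as pd


def missing_columns(df, wanted):
    """Return the names in `wanted` that are not columns of `df`, in order.

    Hypothetical helper for illustration; not a pandas function.
    """
    return [name for name in wanted if name not in df.columns]


# same frame as in the code sample above
test = pd.DataFrame({'var': ['a', 'a', 'b', 'b'], 'val': range(4)})

# 'vau' is the simulated typo for 'val'
print(missing_columns(test, ['var', 'vau']))  # ['vau']
```

Running the check up front names the mistyped column directly, independent of which key the exception inside apply() ends up reporting.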

Expected Output

A KeyError: 'foo' where foo is the incorrect key, not a successful one.

Output of `pd.show_versions()`

pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.1-arch1-1-ARCH
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.1
numpy : 1.17.1
pytz : 2019.2
dateutil : 2.8.0
pip : 19.0.3
setuptools : 41.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : 2.6.3
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None


mroeschke · 2019-10-06T00:05:40Z

This looks fixed on master:

~/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
   1612             return self.table.vals[k]
   1613         else:
-> 1614             raise KeyError(val)
   1615
   1616     cpdef set_item(self, object key, Py_ssize_t val):

KeyError: 'vau'

In [2]: pd.__version__
Out[2]: '0.26.0.dev0+490.g9cfb8b55b'

I guess this could use a regression test.

bravech · 2019-10-06T03:27:52Z

Hi! I'd like to work on this as my first issue. You just need a regression test added to check that the proper KeyError is thrown?

mroeschke · 2019-10-06T04:27:56Z

Correct.
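A rough sketch of what such a regression check could look like is below. It is not the test that was merged in #28809; the function names are made up for illustration, and the lambda touches only the misspelled column so the check does not depend on whether apply() passes the grouping column to the function.

```python
import pandas as pd


def reported_key(df):
    """Return the key named by the KeyError raised inside apply(), or None.

    Illustrative helper (an assumption of this sketch, not pandas API).
    """
    try:
        # only the non-existent 'vau' column is accessed
        df.groupby('var').apply(
            lambda rows: pd.DataFrame({'val': [rows.iloc[-1]['vau']]})
        )
    except KeyError as err:
        return err.args[0]
    return None


def test_keyerror_names_missing_column():
    df = pd.DataFrame({'var': ['a', 'a', 'b', 'b'], 'val': range(4)})
    # the error should name the key that actually failed
    assert reported_key(df) == 'vau'
```

A plain function with a bare assert like this can be collected by pytest as-is; the real test suite may well prefer pytest.raises with a match pattern instead.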


mroeschke added the "good first issue" and "Needs Tests" (unit test(s) needed to prevent regressions) labels Oct 6, 2019

bravech mentioned this issue Oct 6, 2019

TST: Dataframe KeyError when getting nonexistent category (#28799) #28809

Merged


jreback added this to the 1.0 milestone Oct 6, 2019

jreback closed this as completed in #28809 Oct 8, 2019

jreback pushed a commit that referenced this issue Oct 8, 2019

TST: Dataframe KeyError when getting nonexistent category (#28799) (#28809)

39602e7

