KeyErrors are inconsistent and too verbose #25996

Open
naught101 opened this issue Apr 5, 2019 · 4 comments
Labels
Enhancement; Error Reporting (Incorrect or improved errors from pandas); Indexing (Related to indexing on series/frames, not to indexes themselves)

Comments

@naught101

naught101 commented Apr 5, 2019

Code Sample, a copy-pastable example if possible

In [1]: df['asda'] 
KeyError: 'asda'

In [2]: df[['asda']]
KeyError: "None of [Index(['asda'], dtype='object')] are in the [columns]"

In [3]: df[['asda', 'lat']]  # lat is a valid column                                                              
KeyError: "['asda'] not in index"

In [4]: df.loc['asda']                                                                                            
KeyError: 'asda'

In [5]: df.loc['asda']  
KeyError: "None of [Index(['asda'], dtype='object')] are in the [index]"

In [6]: df.loc[['asda', 0]]                                                                                       
/home/nedcr/miniconda3/envs/ana/bin/ipython:1: FutureWarning: 
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  #!/home/nedcr/miniconda3/envs/ana/bin/python
Out[9]: 
          id      lat       lng  location BL  ...  location 2030  scale 2030  location 2070  scale 2070
asda     NaN      NaN       NaN          NaN  ...            NaN         NaN            NaN         NaN
0     1009.0 -15.4875  124.5222    38.224253  ...      38.860162    0.947698      40.494363    0.987552

[2 rows x 10 columns]

Problem description

The last example is kind of irrelevant, as it will be changed soon.

The 2nd and 5th examples are annoying, because it is often useful to catch KeyErrors and use their values to return nice error messages. That's difficult with this output.

Also, they are slightly misleading, as I didn't pass an index as the indexer, so it is potentially annoying for debugging.
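To illustrate the difficulty, here is a minimal sketch of what a caller gets back when catching these errors. The helper name `key_error_payload` is made up for this example; the exact wording of the list-indexer message varies between pandas versions, so the sketch only looks at its type.

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1, 2], b=[3, 4]))


def key_error_payload(indexer):
    """Return the first argument of the KeyError raised by df[indexer], or None."""
    try:
        df[indexer]
    except KeyError as err:
        return err.args[0]
    return None


# Scalar indexer: the payload is the missing key itself, easy to reuse.
scalar_payload = key_error_payload("c")

# List indexer: the payload is a pre-formatted message string, so the
# missing label would have to be parsed back out of the text.
list_payload = key_error_payload(["c"])

print(repr(scalar_payload))
print(repr(list_payload))
```

With a scalar key the caught value can go straight into a custom message; with a list key the caller only has a sentence to work with.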

Expected Output

Example 2 should have a message either like

KeyError: ['asda'] not in columns

or just

KeyError: ['asda']

Similarly for example 5.
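Until something like this exists in pandas, one possible workaround is to check for missing labels before indexing. This is only a sketch: `select_columns` is a hypothetical helper, not a pandas function, and it assumes the labels are hashable.

```python
import pandas as pd


def select_columns(frame, labels):
    """Select columns, raising KeyError(list_of_missing_labels) if any are absent."""
    missing = [label for label in labels if label not in frame.columns]
    if missing:
        # The payload is the plain list of missing labels, not a sentence.
        raise KeyError(missing)
    return frame[list(labels)]


df = pd.DataFrame(dict(a=[1, 2], b=[3, 4]))

try:
    select_columns(df, ["b", "c"])
except KeyError as err:
    missing_labels = err.args[0]
    print(f"Unknown columns: {missing_labels}")
```

The caught error then carries the missing labels directly, which is roughly the `KeyError: ['asda']` form suggested above.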

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.0-16-lowlatency
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8

pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: None
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: 0.12.0
IPython: 7.3.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.3.2
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@jorisvandenbossche
Member

@naught101 thanks for the report! I agree that it would be nice to make those messages a bit more consistent and easier to read.

Can you make the code snippet reproducible? (e.g. you can just add a first line creating a dummy DataFrame df, and you can also use easier column names like 'a', 'b', ...)
Further, it seems that case 4 and 5 have the same code but a different output? Is there a typo somewhere, or does that really happen?

@jorisvandenbossche jorisvandenbossche added Error Reporting Incorrect or improved errors from pandas Indexing Related to indexing on series/frames, not to indexes themselves labels Apr 5, 2019
@naught101
Author

Here is a better example:

df = pd.DataFrame(dict(a=[1,2], b=[3,4])) 

df['c']
# KeyError: 'c'

df[['c']]
# KeyError: "None of [Index(['c'], dtype='object')] are in the [columns]"

df[['b', 'c']] 
# KeyError: "['c'] not in index"

df.loc[2]
# KeyError: 2

df.loc[[2]] 
# KeyError: "None of [Int64Index([2], dtype='int64')] are in the [index]"

df.loc[[1,2]]
# /home/nedcr/miniconda3/envs/ana/bin/ipython:1: FutureWarning: 
# Passing list-likes to .loc or [] with any missing label will raise
# KeyError in the future, you can use .reindex() as an alternative.
# 
# See the documentation here:
# https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
#   #!/home/nedcr/miniconda3/envs/ana/bin/python
# Out[13]: 
#      a    b
# 1  2.0  4.0
# 2  NaN  NaN
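For reference, the `.reindex()` alternative mentioned in the FutureWarning can be sketched on the same dummy frame. It returns a NaN-filled row for the missing label without going through `.loc`, which should match the output shown for the last example.

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1, 2], b=[3, 4]))

# Label 1 exists and label 2 does not: reindex keeps both labels and
# fills the row for the missing one with NaN instead of raising.
result = df.reindex([1, 2])
print(result)
```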

@simonjayhawkins
Member

The 2nd and 5th examples are annoying, because it is often useful to catch KeyErrors and use their values to return nice error messages. That's difficult with this output.

are these not nice messages already? 😃

they probably make more sense when more than one value is passed..

df[['c','d']]
# KeyError: "None of [Index(['c', 'd'], dtype='object')] are in the [columns]"

df.loc[[2, 3]]
# KeyError: "None of [Int64Index([2, 3], dtype='int64')] are in the [index]"

So the messages are consistent with this scenario, which probably makes handling the errors and raising custom messages easier.

I'm not disagreeing that the messages could be improved, just not sure what would be better, while maintaining (some sort of) consistency. To improve the messages may require introducing more variations and hence less consistency.

I think the 3rd message is perhaps more of an issue, since 'index' should probably be 'columns', and overall more consistent with the 2nd and 5th messages.

Also, they are slightly misleading, as I didn't pass an index as the indexer, so it is potentially annoying for debugging.

agreed.

@simonjayhawkins
Member

A simple solution would be to not include any brackets or messages in KeyError and use IndexingError instead if this was necessary to be more explicit about the error.

KeyError would only be used for an issue with a single key.

That would make it feasible for a list indexer to raise a more friendly and less verbose message for problems with a single key and allow better exception handling...

df['c']
# KeyError: 'c'

df[['c']]
# KeyError: 'c'

df[['b', 'c']] 
# KeyError: 'c'

df.loc[2]
# KeyError: 2

df.loc[[2]] 
# KeyError: 2

df[['c','d']]
# IndexingError: ['c', 'd'] not in the columns Index.

df[['b', 'c', 'd']] 
# IndexingError: ['c', 'd'] not in the columns Index.

df.loc[[2, 3]]
# IndexingError: [2, 3] not in the Index.
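The proposed behaviour for column selection could be prototyped outside pandas roughly as follows. This is a sketch under assumptions, not how pandas implements anything: `get_columns` is a hypothetical wrapper, and `MultiKeyIndexingError` is a locally defined stand-in for the `IndexingError` named in the proposal.

```python
import pandas as pd


class MultiKeyIndexingError(Exception):
    """Hypothetical stand-in for the IndexingError proposed above."""


def get_columns(frame, labels):
    """Select columns with the proposed single-key / multi-key error split."""
    missing = [label for label in labels if label not in frame.columns]
    if len(missing) == 1:
        # One bad key: raise a bare KeyError carrying just that key.
        raise KeyError(missing[0])
    if missing:
        # Several bad keys: raise the more explicit error with a message.
        raise MultiKeyIndexingError(f"{missing} not in the columns Index.")
    return frame[list(labels)]


df = pd.DataFrame(dict(a=[1, 2], b=[3, 4]))


def outcome(labels):
    """Return (exception name, payload) for a selection, or ('ok', None)."""
    try:
        get_columns(df, labels)
    except (KeyError, MultiKeyIndexingError) as err:
        return type(err).__name__, err.args[0]
    return "ok", None


single = outcome(["b", "c"])
multiple = outcome(["b", "c", "d"])
print(single)
print(multiple)
```

A caller can then handle the single-key case by reading the key off the `KeyError`, and treat the multi-key case separately.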
