errors='ignore' does not work on df.replace({col : type}) if col not found #30324

Open
climatebrad opened this issue Dec 18, 2019 · 8 comments
Labels: Bug; Error Reporting (Incorrect or improved errors from pandas); replace (replace method)

@climatebrad
climatebrad commented Dec 18, 2019

This may be considered a feature request

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})
df.astype({'b': str}, errors='ignore')  # 'b' is not a column of df

KeyError: 'Only a column name can be used for the key in a dtype mappings argument.'

Problem description

A KeyError is still raised if the column name is not found. I'd think errors='ignore' should suppress this error, which would make it consistent with other methods, such as df.rename(errors='ignore').
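
The inconsistency described above can be reproduced side by side. This is a minimal sketch, assuming a pandas version like those reported in this thread; the variable names `renamed` and `astype_raised` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# rename: a key that matches no column is skipped with errors="ignore"
renamed = df.rename(columns={"b": "c"}, errors="ignore")
print(list(renamed.columns))

# astype: the same kind of missing key raises KeyError,
# even though errors="ignore" was passed
try:
    df.astype({"b": str}, errors="ignore")
    astype_raised = False
except KeyError as exc:
    astype_raised = True
    print("KeyError:", exc)
```

Here `rename` returns the frame with its columns unchanged, while `astype` fails before any conversion is attempted.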

Expected Output

Unchanged dataframe.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 17.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.2
numpy : 1.17.2
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 41.4.0
Cython : 0.29.13
pytest : 5.2.1
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : 1.2.2
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.10
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.2

@naomi172839
Contributor

I believe that I fixed this issue, and agree that it does not appear to be intended behavior.

@arainboldt
What's the status of this issue? I'm still getting it as of pandas.__version__ == '1.0.3'

@arw2019
Member

arw2019 commented Jul 31, 2020

Still an issue on 1.2

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 3b1d4f1
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-42-generic
Version : #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.0.dev0+10.g3b1d4f1ee
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.1.0.post20200704
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.19.0
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.0
gcsfs : 0.6.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1

simonjayhawkins added the Error Reporting and replace labels Jul 31, 2020
mroeschke added the Bug label Jul 25, 2021
@zerothi
zerothi commented Jun 22, 2023

This is still an issue in pandas 2.0.1.

import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
print(df.astype({"foo": str}, errors="ignore").dtypes)

results in error:

KeyError: "Only a column name can be used for the key in a dtype mappings argument. 'foo' not found in columns."

@jreback
Contributor

jreback commented Jun 22, 2023

@zerothi pandas is all volunteer and there is a large backlog. The core team is happy to review any PRs submitted by the community to fix issues that are still open.

@zerothi
zerothi commented Jun 22, 2023

> @zerothi pandas is all volunteer and there is a large backlog. The core team is happy to review any PRs submitted by the community to fix issues that are still open.

yeah, sorry about the blurp... guidance for solving this issue would be ideal, especially to those unfamiliar with the internals...

@nbro10
nbro10 commented Sep 14, 2023

Why raise at all if column is not in the data frame? Shouldn't the errors parameter only be used to raise or ignore in case the type conversion actually fails (and that implies that a type conversion is made)?

For example, df.replace does not raise any error if you pass a dictionary with a column that does not exist in the data frame. I think that's the best behaviour, because you may want to use a dictionary, with multiple type conversions, with possibly different data frames, which have different columns, but some of them may overlap.
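
A small sketch of the df.replace behaviour mentioned above, using the nested-dict form in which the outer keys are column names. The column name "missing" is made up for illustration and does not exist in the frame.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Outer keys are column names; "missing" is not a column of df.
# replace applies the mapping for "a" and skips the unknown key.
replaced = df.replace({"a": {1: 10}, "missing": {1: 10}})
print(replaced["a"].tolist())
```

No exception is raised for the unknown key, which is the behaviour the comment argues astype should share.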

I think the best thing to do would be to introduce a parameter, in all functions that accept a dictionary to make conversions, that controls this behaviour (either raise or ignore, if the dictionary contains a mapping that does not exist in the data frame)
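
As a rough sketch of what such a parameter could look like in user code: `astype_columns` and its `missing` argument below are hypothetical names, not part of pandas. The helper only filters the mapping before delegating to `DataFrame.astype`, so genuine conversion errors still propagate.

```python
import pandas as pd


def astype_columns(df, dtypes, missing="raise"):
    """Hypothetical helper (not a pandas API): cast only the columns that exist.

    missing="raise"  -> KeyError if the mapping names an unknown column
    missing="ignore" -> unknown columns are skipped, the rest are converted
    """
    if missing not in ("raise", "ignore"):
        raise ValueError("missing must be 'raise' or 'ignore'")

    unknown = [col for col in dtypes if col not in df.columns]
    if unknown and missing == "raise":
        raise KeyError(f"Columns not found: {unknown}")

    # Keep only the entries whose column is present in the frame
    known = {col: dtype for col, dtype in dtypes.items() if col in df.columns}
    return df.astype(known)


df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
result = astype_columns(df, {"col1": "float64", "foo": str}, missing="ignore")
print(result.dtypes)
```

With `missing="ignore"` the same mapping can be reused across data frames whose columns only partly overlap, while the default keeps the current strict behaviour.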

Btw, the title is wrong. It should have been "... does not work on df.astype()".

@aeisenbarth
Someone may want it to raise an error if the column is not found, because code that assumes the column exists could have a bug.

In my case, I want to ensure that if a certain column is present, it has exactly the type specified in astype, but it may legitimately be absent.

There are two design considerations:

  • Silencing all errors with errors="ignore" might be too much. If some more serious error happens, I would want to know about it.
  • If a column is not found, I would not want the original dataframe returned (silently!) but all other available columns converted.

Workaround:
For anyone working with dataframes read from CSV, consider the dtype argument, pd.read_csv(…, dtype={}), which does not raise when a column named in the mapping is not contained in the file.
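
A minimal sketch of that workaround, assuming in-memory CSV text in place of a real file; the column name "missing" is made up and does not appear in the data.

```python
import io

import pandas as pd

csv_text = "a,b\n1,2\n3,4\n"

# "missing" is not a column in the CSV; read_csv applies the dtype for "a"
# and does not raise for the unknown key
df = pd.read_csv(io.StringIO(csv_text), dtype={"a": "float64", "missing": str})
print(df.dtypes)
```

This only helps when the types can be fixed at read time; frames built in other ways still need the mapping filtered before calling astype.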
