BUG: Respect value of raise_on_error #14877

Closed

@tomderuijter tomderuijter commented Dec 14, 2016

Code Sample, a copy-pastable example if possible

Pasteable snippet.

import numpy as np
import pandas as pd
x = pd.DataFrame(data=[[1,2],[3,4]], columns=['A','B'])
x.astype({'C': np.int32}, raise_on_error=False)

Output

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tomdr/VirtualEnvs/work-py3/lib/python3.5/site-packages/pandas/core/generic.py", line 3041, in astype
    raise KeyError('Only a column name can be used for the '
KeyError: 'Only a column name can be used for the key in a dtype mappings argument.'

Problem description

The raise_on_error argument of DataFrame.astype suggests that errors caused by a mismatch in column names can be ignored. Currently, an exception is raised regardless of the value of this argument. This pull request fixes that.

Before: https://github.com/pandas-dev/pandas/blob/v0.19.1/pandas/core/generic.py#L3041

Expected Output

An exception should not be raised.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 21.0.0
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.0.0
sphinx: 1.4.8
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.14
pymysql: 0.7.9.None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: None
pandas_datareader: None

@tomderuijter tomderuijter changed the title Respect value of raise_on_error BUG: Respect value of raise_on_error Dec 14, 2016
@sinhrks sinhrks added the Error Reporting Incorrect or improved errors from pandas label Dec 14, 2016
@sinhrks sinhrks (Member) commented Dec 14, 2016

As I understand it, the keyword is meant to ignore dtype conversion errors, not input validation errors. So I think the documentation should be fixed to clarify this.

If we are going to ignore all kinds of errors, you should also handle the KeyError raised from Series.
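
To make the distinction above concrete, here is a minimal pure-Python sketch. It is not the pandas implementation: `cast_columns` and its plain-dict "table" are hypothetical stand-ins, assumed only for illustration. Input validation (an unknown key) always raises, while the flag only controls conversion failures.

```python
def cast_columns(table, dtype_map, raise_on_error=True):
    """Toy model of the semantics discussed above (hypothetical helper).

    `table` maps column name -> list of values; `dtype_map` maps
    column name -> a callable used to convert each value.
    """
    # Input validation: raised regardless of raise_on_error.
    missing = [name for name in dtype_map if name not in table]
    if missing:
        raise KeyError(f"columns not found: {missing!r}")

    result = dict(table)
    for name, caster in dtype_map.items():
        try:
            result[name] = [caster(value) for value in table[name]]
        except (TypeError, ValueError):
            # Conversion error: the only kind of error the flag controls.
            if raise_on_error:
                raise
            # Otherwise the column is left unchanged.
    return result
```

Under this reading, passing `{'C': int}` for a table without a column `C` fails even with `raise_on_error=False`, which matches the behavior reported in the issue.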

@sinhrks sinhrks added the Dtype Conversions Unexpected or buggy dtype conversions label Dec 14, 2016
@jorisvandenbossche (Member) commented

I agree with @sinhrks that the current meaning of the raise_on_error keyword is to raise/ignore conversion errors, not errors for column names.

The keyword is renamed in #14967, so let's make this clearer in the docs there.

@tomderuijter Ignoring keys in the dict that are not present in the column names is certainly a valid use case as well. So we could discuss whether we want to support this in some way (similar requests came up for drop to drop columns, I think). Feel free to open a new issue about this to discuss!
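
Until such support exists, one possible workaround is to filter the mapping before calling astype. The sketch below is only an illustration: `existing_columns_only` is a hypothetical helper, not part of pandas, and it works on any iterable of column names.

```python
def existing_columns_only(columns, dtype_map):
    """Return the entries of dtype_map whose key is an existing column.

    Hypothetical helper for illustration; `columns` can be any iterable
    of column names, for example a DataFrame's `columns` attribute.
    """
    present = set(columns)
    return {name: dtype for name, dtype in dtype_map.items() if name in present}
```

With the frame from the issue, `x.astype(existing_columns_only(x.columns, {'A': np.int32, 'C': np.int32}))` would then only see the entry for `A`.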

@ratman ratman left a comment

Reporting the problematic key's name in the exception message would make one's life much easier.
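
A rough sketch of what that could look like, assuming a standalone validation step. `check_dtype_keys` is a hypothetical name and the exact wording of the message is an assumption; only the first sentence is taken from the traceback above.

```python
def check_dtype_keys(columns, dtype_map):
    """Raise a KeyError that names the offending keys (hypothetical helper)."""
    present = set(columns)
    missing = [name for name in dtype_map if name not in present]
    if missing:
        raise KeyError(
            "Only a column name can be used for the key in a dtype "
            f"mappings argument. Not found in columns: {missing!r}"
        )
```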
