
"errors" parameter has no effect on pd.DataFrame.astype if a dictionary is passed in as the dtype argument #25905


Closed
nandrea1 opened this issue Mar 28, 2019 · 2 comments
Labels
DataFrame (DataFrame data structure), Error Reporting (Incorrect or improved errors from pandas)


@nandrea1

Code Sample, a copy-pastable example if possible

Given the code sample below:

import pandas as pd

data_df = pd.DataFrame([{'col_a': '1', 'col_b': '16.5%', 'col_c': 'test'},
                        {'col_a': '2.2', 'col_b': '15.3', 'col_c': 'another_test'}
                       ])
type_dict = {'col_a': 'float64', 'col_b': 'float64', 'col_c': 'object'}
# '16.5%' cannot be parsed as a float, so this call exercises the errors argument
data_df = data_df.astype(dtype=type_dict, errors='ignore')

Problem description

pandas raises ValueError: could not convert string to float: '16.5%' even though errors='ignore' was passed in. This is probably due to the implementation of astype, as can be seen in pandas/core/generic.py, lines 5658 - 5680. In the dict branch, the per-column call col.astype(dtype[col_name], copy=copy) does not pass errors along, so each column is cast with the default errors='raise':

        if is_dict_like(dtype):
            if self.ndim == 1:  # i.e. Series
                if len(dtype) > 1 or self.name not in dtype:
                    raise KeyError('Only the Series name can be used for '
                                   'the key in Series dtype mappings.')
                new_type = dtype[self.name]
                return self.astype(new_type, copy, errors, **kwargs)
            elif self.ndim > 2:
                raise NotImplementedError(
                    'astype() only accepts a dtype arg of type dict when '
                    'invoked on Series and DataFrames. A single dtype must be '
                    'specified when invoked on a Panel.'
                )
            for col_name in dtype.keys():
                if col_name not in self:
                    raise KeyError('Only a column name can be used for the '
                                   'key in a dtype mappings argument.')
            results = []
            for col_name, col in self.iteritems():
                if col_name in dtype:
                    results.append(col.astype(dtype[col_name], copy=copy))
                else:
                    results.append(results.append(col.copy() if copy else col))
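
To make the suspected cause concrete, here is a minimal sketch of the per-column loop with errors forwarded to each column's astype call. The function name astype_by_column is hypothetical and this is not pandas' actual code or the actual fix; it only mirrors the shape of the excerpt above, assuming Series.astype honors errors='ignore' by returning the column unchanged when the cast fails.

```python
import pandas as pd


def astype_by_column(df, dtype, errors="raise"):
    """Hypothetical stand-in for the dict branch of astype (not pandas' code)."""
    # Same key validation as in the excerpt.
    for col_name in dtype:
        if col_name not in df:
            raise KeyError("Only a column name can be used for the "
                           "key in a dtype mappings argument.")
    results = []
    for col_name, col in df.items():
        if col_name in dtype:
            # The difference from the excerpt: errors is passed along.
            results.append(col.astype(dtype[col_name], errors=errors))
        else:
            results.append(col.copy())
    return pd.concat(results, axis=1)


data_df = pd.DataFrame([
    {"col_a": "1", "col_b": "16.5%", "col_c": "test"},
    {"col_a": "2.2", "col_b": "15.3", "col_c": "another_test"},
])
type_dict = {"col_a": "float64", "col_b": "float64", "col_c": "object"}

converted = astype_by_column(data_df, type_dict, errors="ignore")
print(converted.dtypes)
```

With errors='raise' the same call would still fail on col_b, which matches the behavior reported here for the dict case.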

Expected Output

One would expect the cast to succeed, converting col_a to float64 and leaving col_b unchanged with object dtype.
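
On an affected version, one possible workaround is to cast column by column and keep the original column whenever the cast fails. The helper astype_skip_failures below is a hypothetical name, not a pandas API, and the sketch assumes a failed cast surfaces as ValueError or TypeError, as it does for the '16.5%' value in this report.

```python
import pandas as pd


def astype_skip_failures(df, dtype):
    """Cast each mapped column, keeping the original column when a cast fails."""
    out = df.copy()
    for col_name, target in dtype.items():
        try:
            out[col_name] = df[col_name].astype(target)
        except (ValueError, TypeError):
            # Leave the column as it was, mimicking errors='ignore'.
            pass
    return out


data_df = pd.DataFrame([
    {"col_a": "1", "col_b": "16.5%", "col_c": "test"},
    {"col_a": "2.2", "col_b": "15.3", "col_c": "another_test"},
])
type_dict = {"col_a": "float64", "col_b": "float64", "col_c": "object"}

expected_like = astype_skip_failures(data_df, type_dict)
print(expected_like.dtypes)
```

Here col_a becomes float64 while col_b keeps its original dtype and string values, which is the outcome described above.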

@WillAyd
Member

WillAyd commented Mar 28, 2019

I believe this is what #25888 is trying to solve

@gfyoung added the Error Reporting and DataFrame labels Mar 28, 2019
johnklehm added a commit to johnklehm/pandas that referenced this issue Mar 29, 2019
…ck for the absence of the exception we'd normally throw. Assert the resulting dataframe against a reference dataframe. Move the test case to test_dtypes.
@WillAyd
Member

WillAyd commented Apr 1, 2019

Closed via #25888

@WillAyd closed this as completed Apr 1, 2019