BUG: astype ignore errors not working for categorical/StringArray #35471

Closed
2 of 3 tasks
traubms opened this issue Jul 29, 2020 · 2 comments
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. Regression Functionality that used to work in a prior pandas version

traubms commented Jul 29, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Problem description

astype seems to ignore the errors='ignore' kwarg: in 1.0.5 the call below raises a ValueError instead of returning the Series unchanged. The behavior in 0.25.3 is expected.

Code Sample, a copy-pastable example

import pandas as pd
pd.__version__
# '1.0.5'

pd.Series(pd.Categorical(['A', 'B'])).astype(float, errors='ignore')
# ---------------------------------------------------------------------------
# ValueError                                Traceback (most recent call last)
# <ipython-input-4-3d39cd9c1cdd> in <module>
# ----> 1 pd.Series(pd.Categorical(['A', 'B'])).astype(float, errors='ignore')
#
# ...
# ValueError: could not convert string to float: 'A'

Expected Output

import pandas as pd
pd.__version__
# '0.25.3'

pd.Series(pd.Categorical(['A', 'B'])).astype(float, errors='ignore')
# 0    A
# 1    B
# dtype: category
# Categories (2, object): [A, B]
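
Until the regression is fixed, the 0.25.3 behavior can be approximated by catching the conversion error by hand. The following is only a minimal sketch: astype_or_original is a hypothetical helper name, not part of pandas, and it assumes that a failed cast surfaces as a ValueError or TypeError.

```python
import pandas as pd


def astype_or_original(series, dtype):
    """Hypothetical helper: try the cast, return the input unchanged on failure."""
    try:
        return series.astype(dtype)
    except (ValueError, TypeError):
        # Assumption: failed conversions raise ValueError or TypeError.
        return series


cat = pd.Series(pd.Categorical(["A", "B"]))
result = astype_or_original(cat, float)
print(result.dtype)  # category

# A cast that can succeed is still applied as usual.
converted = astype_or_original(pd.Series(["1", "2"]), float)
print(converted.dtype)  # float64
```

Unlike errors='ignore', this wrapper does not depend on how astype forwards the errors argument internally, so it should behave the same for categorical and other extension dtypes.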

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1062.4.1.el7.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.5
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0.post20200714
Cython : None
pytest : 6.0.0
hypothesis : None
sphinx : 3.1.2
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : None
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : None
bottleneck : 1.3.2
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.5.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 6.0.0
pyxlsb : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.18
tables : None
tabulate : 0.8.7
xarray : None
xlrd : None
xlwt : None
xlsxwriter : 1.2.9
numba : 0.50.1

@traubms traubms added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 29, 2020
simonjayhawkins (Member) commented

Thanks @traubms for the report. #30324 is possibly related.

It appears that the issue is not specific to Categorical, so I'll label it as ExtensionArray for now.

>>> pd.__version__
'1.2.0.dev0+10.g3b1d4f1ee'
>>>
>>> pd.Series(["A", "B"], dtype="string").astype(float, errors="ignore")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\pandas\pandas\core\generic.py", line 5537, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
  File "C:\Users\simon\pandas\pandas\core\internals\managers.py", line 567, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "C:\Users\simon\pandas\pandas\core\internals\managers.py", line 396, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\Users\simon\pandas\pandas\core\internals\blocks.py", line 570, in astype
    values = self.values.astype(dtype)
  File "C:\Users\simon\pandas\pandas\core\arrays\string_.py", line 292, in astype
    return super().astype(dtype, copy)
  File "C:\Users\simon\pandas\pandas\core\arrays\base.py", line 460, in astype
    return np.array(self, dtype=dtype, copy=copy)
  File "C:\Users\simon\pandas\pandas\core\arrays\numpy_.py", line 211, in __array__
    return np.asarray(self._ndarray, dtype=dtype)
  File "C:\Users\simon\Anaconda3\envs\pandas-dev\lib\site-packages\numpy\core\_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: 'A'
>>>

Further investigation and PRs are welcome.
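
As a starting point for that investigation: the blocks.py frame in the traceback calls self.values.astype(dtype), so the errors argument does not appear to reach the array's astype at all. The toy model below illustrates that shape in plain Python. ToyStringArray and ToyBlock are made-up stand-ins, not pandas classes, and the guarded variant is just one possible approach, not necessarily what any pandas fix does.

```python
class ToyStringArray:
    """Made-up stand-in for an extension array whose astype takes no `errors`."""

    def __init__(self, values):
        self.values = list(values)

    def astype(self, dtype):
        # Raises ValueError for values such as 'A' when dtype is float.
        return [dtype(v) for v in self.values]


class ToyBlock:
    """Made-up stand-in for the block that delegates to the array."""

    def __init__(self, values):
        self.values = values

    def astype_unguarded(self, dtype, errors="raise"):
        # Same shape as the traceback frame: `errors` is accepted but never used.
        return self.values.astype(dtype)

    def astype_guarded(self, dtype, errors="raise"):
        # One possible approach: honor `errors` around the delegated call.
        try:
            return self.values.astype(dtype)
        except (ValueError, TypeError):
            if errors == "ignore":
                return self.values
            raise


block = ToyBlock(ToyStringArray(["A", "B"]))

try:
    block.astype_unguarded(float, errors="ignore")
    unguarded_raised = False
except ValueError:
    unguarded_raised = True

guarded_result = block.astype_guarded(float, errors="ignore")
print(unguarded_raised, guarded_result is block.values)
```

In the unguarded path the ValueError propagates even though errors="ignore" was passed, which matches the symptom reported above; the guarded path returns the original values instead.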

@simonjayhawkins simonjayhawkins changed the title BUG: astype ignore errors not working for categorical BUG: astype ignore errors not working for categorical/StringArray Jul 30, 2020
@simonjayhawkins simonjayhawkins added ExtensionArray Extending pandas with custom dtypes or arrays. Dtype Conversions Unexpected or buggy dtype conversions and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 30, 2020
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Jul 30, 2020
@simonjayhawkins simonjayhawkins added the Regression Functionality that used to work in a prior pandas version label Jul 30, 2020
@dsaxton dsaxton modified the milestones: Contributions Welcome, 1.1.2 Aug 29, 2020
simonjayhawkins (Member) commented

Closed in #35979.


3 participants