BUG: problem converting between new dtypes ('string' and 'Int8' / 'Int64') #34759

SultanOrazbayev · 2020-06-14T00:48:14Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.

Attempts to convert between the new nullable integer and string dtypes raise a `TypeError`. This might be a different manifestation of the same underlying problem as in #32073 and #32234.

import pandas as pd
import numpy as np

test = {
    'a': [0, 1, np.nan],
    'b': ['0', '1', np.nan]
}

df = pd.DataFrame(test).astype({'a': 'Int8', 'b': 'string'})
df.b.astype('Int8')
# TypeError: data type not understood

Here's the traceback:

---------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 df.b.astype('Int8')

~/myenv/lib/python3.7/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
5696 else:
5697 # else, only a single dtype is given
-> 5698 new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors)
5699 return self._constructor(new_data).__finalize__(self)
5700

~/myenv/lib/python3.7/site-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
580
581 def astype(self, dtype, copy: bool = False, errors: str = "raise"):
--> 582 return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
583
584 def convert(self, **kwargs):

~/myenv/lib/python3.7/site-packages/pandas/core/internals/managers.py in apply(self, f, filter, **kwargs)
440 applied = b.apply(f, **kwargs)
441 else:
--> 442 applied = getattr(b, f)(**kwargs)
443 result_blocks = _extend_blocks(applied, result_blocks)
444

~/myenv/lib/python3.7/site-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
605 if self.is_extension:
606 # TODO: Should we try/except this astype?
--> 607 values = self.values.astype(dtype)
608 else:
609 if issubclass(dtype.type, str):

~/myenv/lib/python3.7/site-packages/pandas/core/arrays/string_.py in astype(self, dtype, copy)
258 return self.copy()
259 return self
--> 260 return super().astype(dtype, copy)
261
262 def _reduce(self, name, skipna=True, **kwargs):

~/myenv/lib/python3.7/site-packages/pandas/core/arrays/base.py in astype(self, dtype, copy)
448 NumPy ndarray with 'dtype' for its dtype.
449 """
--> 450 return np.array(self, dtype=dtype, copy=copy)
451
452 def isna(self) -> ArrayLike:

TypeError: data type not understood
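
One reading of the last frame, offered as a sketch rather than a confirmed diagnosis: the base `astype` passes the requested dtype straight to `np.array`, and NumPy has no knowledge of pandas extension dtypes such as `Int8Dtype`. The helper name `numpy_understands` below is hypothetical and only illustrates that point; the exact error message differs between NumPy versions.

```python
import numpy as np
import pandas as pd


def numpy_understands(dtype) -> bool:
    """Hypothetical helper: report whether NumPy can interpret `dtype`."""
    try:
        np.dtype(dtype)
    except TypeError:
        # NumPy raises TypeError for objects it cannot map to a NumPy dtype.
        return False
    return True


plain_ok = numpy_understands("int8")               # a native NumPy dtype
extension_ok = numpy_understands(pd.Int8Dtype())   # a pandas extension dtype

print(plain_ok, extension_ok)
```

If this reading is right, any extension array that falls back to the base `astype` would hit the same error for a nullable integer target.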

Expected Output

It should be possible to convert between different dtypes.
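
A minimal sketch of the expected behaviour, assuming a pandas version that includes the fix. The helper `string_to_nullable_int` is hypothetical, and its fallback path through `float64` is an assumed workaround for affected versions that has not been verified on 1.0.4.

```python
import numpy as np
import pandas as pd


def string_to_nullable_int(s: pd.Series, dtype: str = "Int8") -> pd.Series:
    """Hypothetical helper: cast a 'string' Series to a nullable integer dtype."""
    try:
        # Direct cast; expected to work once the fix is available.
        return s.astype(dtype)
    except TypeError:
        # Assumed workaround: go through float64 so that missing values
        # become NaN, which the nullable integer dtypes accept.
        as_float = s.astype(object).where(s.notna(), np.nan).astype("float64")
        return as_float.astype(dtype)


df = pd.DataFrame(
    {"a": [0, 1, np.nan], "b": ["0", "1", np.nan]}
).astype({"a": "Int8", "b": "string"})

converted = string_to_nullable_int(df["b"])
print(converted.dtype)          # nullable integer dtype
print(converted.isna().tolist())  # the missing value is preserved
```

The fallback only handles strings that parse as whole numbers; anything else would still raise.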

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-957.12.1.el7.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.0.4
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.1.1.post20200529
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.15.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : None
fastparquet : 0.4.0
gcsfs : None
lxml.etree : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : 0.15.1
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.48.0


jreback · 2020-06-14T00:59:15Z

Please try on master; these are fixed.

SultanOrazbayev · 2020-06-14T01:35:23Z

Thanks, it works on master. Looking forward to the new release!

SultanOrazbayev added the Bug and Needs Triage labels Jun 14, 2020

SultanOrazbayev closed this as completed Jun 14, 2020

bashtage removed the Needs Triage label Aug 21, 2020
