Convert type nullable int <-> nullable string #31839

tritemio · 2020-02-09T22:41:20Z

Code Sample, a copy-pastable example if possible

This code raises an error:

>>> s = pd.Series([0, pd.NA], dtype='Int8')
>>> s.astype('string')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-e5922a9c01ed> in <module>
----> 1 s.astype('string')

/opt/tljh/user/envs/py37-sf/lib/python3.7/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
   5696         else:
   5697             # else, only a single dtype is given
-> 5698             new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors)
   5699             return self._constructor(new_data).__finalize__(self)
   5700 

/opt/tljh/user/envs/py37-sf/lib/python3.7/site-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
    580 
    581     def astype(self, dtype, copy: bool = False, errors: str = "raise"):
--> 582         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    583 
    584     def convert(self, **kwargs):

/opt/tljh/user/envs/py37-sf/lib/python3.7/site-packages/pandas/core/internals/managers.py in apply(self, f, filter, **kwargs)
    440                 applied = b.apply(f, **kwargs)
    441             else:
--> 442                 applied = getattr(b, f)(**kwargs)
    443             result_blocks = _extend_blocks(applied, result_blocks)
    444 

/opt/tljh/user/envs/py37-sf/lib/python3.7/site-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
    605         if self.is_extension:
    606             # TODO: Should we try/except this astype?
--> 607             values = self.values.astype(dtype)
    608         else:
    609             if issubclass(dtype.type, str):

/opt/tljh/user/envs/py37-sf/lib/python3.7/site-packages/pandas/core/arrays/integer.py in astype(self, dtype, copy)
    463             kwargs = {}
    464 
--> 465         data = self.to_numpy(dtype=dtype, **kwargs)
    466         return astype_nansafe(data, dtype, copy=False)
    467 

/opt/tljh/user/envs/py37-sf/lib/python3.7/site-packages/pandas/core/arrays/masked.py in to_numpy(self, dtype, copy, na_value)
    130                 )
    131             # don't pass copy to astype -> always need a copy since we are mutating
--> 132             data = self._data.astype(dtype)
    133             data[self._mask] = na_value
    134         else:

TypeError: data type not understood

Problem description

Converting a nullable int to a nullable string requires a double conversion:

s = pd.Series([0, 1], dtype='Int8')
s.astype(str).astype('string')
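The double conversion above works for this NA-free example. For versions where `astype('string')` raises, here is a minimal stopgap that converts in one step and keeps missing values missing. It assumes a per-element Python `str()` call is acceptable, which is slower than a vectorized cast. `int_to_string` is a hypothetical helper name, not a pandas API.

```python
import pandas as pd


def int_to_string(s: pd.Series) -> pd.Series:
    """Hypothetical helper: nullable int Series -> nullable string Series.

    Workaround sketch, not a pandas API: go through object dtype, call
    str() on each non-missing value (na_action="ignore" leaves pd.NA
    untouched), then build the nullable string Series.
    """
    as_text = s.astype(object).map(str, na_action="ignore")
    return as_text.astype("string")


s = pd.Series([0, pd.NA, -5], dtype="Int8")
result = int_to_string(s)
```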

Likewise, converting a nullable string to a nullable int requires two steps:

s = pd.Series(['0', '1'], dtype='string')
s.astype('int8').astype('Int8')

Moreover, if the nullable string series has NAs, converting to a nullable int becomes much harder (I don't know if there is a simpler way):

import numpy as np
import pandas as pd

s = pd.Series(['0', pd.NA], dtype='string')
s.astype('object').replace(pd.NA, np.nan).astype('float64').astype('Int8')
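An alternative to the `replace`/`float64` chain is to parse each non-missing string with Python's `int()` and let pandas build the nullable integer array from the resulting Python ints and `pd.NA`. The sketch below makes that assumption and covers both the NA-free and the NA case. `string_to_int` is a hypothetical helper name, not a pandas API. Strings that are not valid integer literals (for example `'1.5'`) raise `ValueError` from `int()` instead of becoming NA.

```python
import pandas as pd


def string_to_int(s: pd.Series, dtype: str = "Int8") -> pd.Series:
    """Hypothetical helper: nullable string Series -> nullable int Series.

    Workaround sketch, not a pandas API: int() parses each non-missing
    string, na_action="ignore" skips pd.NA, and the final astype builds
    the masked integer array with NA where the input was missing.
    """
    parsed = s.astype(object).map(int, na_action="ignore")
    return parsed.astype(dtype)


with_na = string_to_int(pd.Series(["0", pd.NA, "-5"], dtype="string"))
without_na = string_to_int(pd.Series(["0", "1"], dtype="string"))
```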

Expected Output

It would be nice to convert directly from nullable int to nullable string, and vice versa, in one step.

I'd like all of the following conversions to work:

s = pd.Series([0, 1], dtype='Int8')
s.astype('string')   # currently raises

s = pd.Series(['0', '1'], dtype='string')
s.astype('Int8')  # currently raises

s = pd.Series(['0', pd.NA], dtype='string')
s.astype('Int8')  # currently raises

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-1058-aws
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0.post20200119
Cython : 0.29.13
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
tabulate : 0.8.3
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : 0.45.1


TomAugspurger · 2020-02-10T13:54:22Z

Thanks. Duplicate of #31204 I think.

tritemio · 2020-02-10T14:08:36Z

@TomAugspurger, sorry I couldn't find the previous issue #31204. I'll comment there to add the roundtrip argument (Int -> string -> Int).
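The roundtrip argument can be phrased as a property: converting Int -> string -> Int should return the original Series, missing values included. Below is a minimal check of that property. It assumes an element-wise workaround through object dtype (not a pandas API) stands in for the one-step `astype` calls requested in this issue. `roundtrip` is a hypothetical name.

```python
import pandas as pd


def roundtrip(s: pd.Series) -> pd.Series:
    """Hypothetical check: nullable int -> nullable string -> nullable int."""
    # Int -> string: str() on each non-missing value, pd.NA passed through.
    as_string = s.astype(object).map(str, na_action="ignore").astype("string")
    # string -> Int: int() on each non-missing string, back to the original dtype.
    return as_string.astype(object).map(int, na_action="ignore").astype(s.dtype)


original = pd.Series([0, pd.NA, 127, -128], dtype="Int8")
restored = roundtrip(original)
```

`Series.equals` treats missing values in the same positions as equal, so `restored.equals(original)` is a compact way to express the roundtrip requirement.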

TomAugspurger closed this as completed Feb 10, 2020

TomAugspurger added the Duplicate Report Duplicate issue or pull request label Feb 10, 2020

TomAugspurger added this to the No action milestone Feb 10, 2020

tritemio mentioned this issue Feb 10, 2020

convert numeric column to dedicated pd.StringDtype() #31204

Closed
