
Convert type nullable int <-> nullable string #31839


Closed
tritemio opened this issue Feb 9, 2020 · 2 comments
Labels: Duplicate (Report Duplicate issue or pull request)

Comments

tritemio commented Feb 9, 2020

Code Sample, a copy-pastable example if possible

This code raises an error:

>>> import pandas as pd
>>> s = pd.Series([0, pd.NA], dtype='Int8')
>>> s.astype('string')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-e5922a9c01ed> in <module>
----> 1 s.astype('string')

/opt/tljh/user/envs/py37-sf/lib/python3.7/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
   5696         else:
   5697             # else, only a single dtype is given
-> 5698             new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors)
   5699             return self._constructor(new_data).__finalize__(self)
   5700 

/opt/tljh/user/envs/py37-sf/lib/python3.7/site-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
    580 
    581     def astype(self, dtype, copy: bool = False, errors: str = "raise"):
--> 582         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    583 
    584     def convert(self, **kwargs):

/opt/tljh/user/envs/py37-sf/lib/python3.7/site-packages/pandas/core/internals/managers.py in apply(self, f, filter, **kwargs)
    440                 applied = b.apply(f, **kwargs)
    441             else:
--> 442                 applied = getattr(b, f)(**kwargs)
    443             result_blocks = _extend_blocks(applied, result_blocks)
    444 

/opt/tljh/user/envs/py37-sf/lib/python3.7/site-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
    605         if self.is_extension:
    606             # TODO: Should we try/except this astype?
--> 607             values = self.values.astype(dtype)
    608         else:
    609             if issubclass(dtype.type, str):

/opt/tljh/user/envs/py37-sf/lib/python3.7/site-packages/pandas/core/arrays/integer.py in astype(self, dtype, copy)
    463             kwargs = {}
    464 
--> 465         data = self.to_numpy(dtype=dtype, **kwargs)
    466         return astype_nansafe(data, dtype, copy=False)
    467 

/opt/tljh/user/envs/py37-sf/lib/python3.7/site-packages/pandas/core/arrays/masked.py in to_numpy(self, dtype, copy, na_value)
    130                 )
    131             # don't pass copy to astype -> always need a copy since we are mutating
--> 132             data = self._data.astype(dtype)
    133             data[self._mask] = na_value
    134         else:

TypeError: data type not understood

Problem description

Converting a nullable int to nullable string requires a double conversion:

s = pd.Series([0, 1], dtype='Int8')
s.astype(str).astype('string')
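The two-step route above is shown without missing values. To keep missing entries as `pd.NA` when converting an `Int8` Series that contains NA, one option is to convert element by element and let the `"string"` dtype turn `None` into `pd.NA`. This is a minimal sketch, not a pandas API: `int_to_nullable_string` is a hypothetical helper, and its Python-level loop will be slower than a vectorized `astype` on large Series.

```python
import pandas as pd


def int_to_nullable_string(s: pd.Series) -> pd.Series:
    """Convert a nullable-integer Series to the "string" dtype, keeping NA as NA.

    Hypothetical helper (not part of pandas): each non-missing value goes
    through str(); missing values become None, which the "string" dtype
    stores as pd.NA.
    """
    values = [None if pd.isna(v) else str(v) for v in s]
    return pd.Series(values, index=s.index, dtype="string", name=s.name)


s = pd.Series([0, pd.NA], dtype="Int8")
result = int_to_nullable_string(s)
print(result)
```

The index and name of the input are carried over so the result can be assigned back into the original DataFrame column.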

Likewise, converting a nullable string to nullable int requires two steps:

s = pd.Series(['0', '1'], dtype='string')
s.astype('int8').astype('Int8')

Moreover, if the nullable string Series contains NA values, converting it to a nullable int is much harder (I don't know whether there is a simpler way):

import numpy as np
import pandas as pd

s = pd.Series(['0', pd.NA], dtype='string')
s.astype('object').replace(pd.NA, np.nan).astype('float64').astype('Int8')
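The workaround above can be packaged as a small function. This sketch uses `pd.to_numeric` in place of the explicit `astype('float64')` step; missing entries are swapped for `np.nan` first so they stay representable in the intermediate numeric Series, which pandas can then cast to a nullable integer dtype. `nullable_string_to_int` is a hypothetical helper, not a pandas API.

```python
import numpy as np
import pandas as pd


def nullable_string_to_int(s: pd.Series, dtype: str = "Int8") -> pd.Series:
    """Parse a "string" Series into a nullable-integer Series, keeping NA.

    Hypothetical helper (not part of pandas). pd.NA entries are replaced by
    np.nan so that pd.to_numeric can produce a numeric Series; that Series
    is then cast to the requested nullable integer dtype.
    """
    as_object = s.astype(object).where(s.notna(), np.nan)
    numeric = pd.to_numeric(as_object)  # raises ValueError on unparseable text
    return numeric.astype(dtype)


s = pd.Series(["0", pd.NA], dtype="string")
result = nullable_string_to_int(s)
print(result)
```

Because the intermediate result can be `float64`, integers larger than 2**53 in magnitude cannot all be represented exactly, so this route suits small dtypes such as `Int8` better than `Int64`.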

Expected Output

It would be nice to convert directly from nullable int to nullable string, and vice versa, in one step.

I'd like all of the following conversions to work:

s = pd.Series([0, 1], dtype='Int8')
s.astype('string')   # currently raises
s = pd.Series(['0', '1'], dtype='string')
s.astype('Int8')  # currently raises
s = pd.Series(['0', pd.NA], dtype='string')
s.astype('Int8')  # currently raises
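To check whether a given pandas installation already handles the three conversions listed above in one step, a small probe can try each `astype` call and record whether it raises. `probe_direct_conversions` is a hypothetical helper; it only reports whether the call succeeds and does not inspect the resulting values.

```python
import pandas as pd


def probe_direct_conversions() -> dict:
    """Report which one-step astype conversions the installed pandas accepts.

    Hypothetical helper: each case is attempted with a plain .astype() call
    and recorded as True if it returns, False if it raises any exception.
    """
    cases = {
        "Int8 -> string": (pd.Series([0, 1], dtype="Int8"), "string"),
        "string -> Int8": (pd.Series(["0", "1"], dtype="string"), "Int8"),
        "string with NA -> Int8": (pd.Series(["0", pd.NA], dtype="string"), "Int8"),
    }
    results = {}
    for label, (series, target) in cases.items():
        try:
            series.astype(target)
        except Exception:  # any failure counts as "not supported directly"
            results[label] = False
        else:
            results[label] = True
    return results


print(pd.__version__, probe_direct_conversions())
```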

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-1058-aws
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0.post20200119
Cython : 0.29.13
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
tabulate : 0.8.3
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : 0.45.1

TomAugspurger (Contributor)

Thanks. Duplicate of #31204 I think.

TomAugspurger added the Duplicate label (Report Duplicate issue or pull request) on Feb 10, 2020
TomAugspurger added this to the No action milestone on Feb 10, 2020
tritemio (Author)

@TomAugspurger, sorry, I couldn't find the earlier issue #31204. I'll comment there to add the round-trip argument (Int -> string -> Int).
