
BUG/PY3? astype("string") fails for python 3 #15010


Closed
jorisvandenbossche opened this issue Dec 29, 2016 · 4 comments

Comments

@jorisvandenbossche
Member

From the categorical example docs, with python 2.7:

In [1]: s = pd.Series(["a","b","c","a"])

In [2]: s2 = s.astype('category')

In [3]: s2.astype('string')
Out[3]: 
0    a
1    b
2    c
3    a
dtype: object

However, with python 3, this fails:

In [74]: s = pd.Series(["a","b","c","a"])

In [75]: s2 = s.astype('category')

In [76]: s2.astype('string')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-76-70b55f934dfe> in <module>()
----> 1 s2.astype('string')

/home/joris/scipy/pandas/pandas/core/generic.py in astype(self, dtype, copy, raise_on_error, **kwargs)
   3179             conversion, with unconvertible values becoming NaT.
   3180         convert_numeric : boolean, default False
-> 3181             If True, attempt to coerce to numbers (including strings), with
   3182             unconvertible values becoming NaN.
   3183         convert_timedeltas : boolean, default True

/home/joris/scipy/pandas/pandas/core/internals.py in astype(self, dtype, **kwargs)
   3188 
   3189     def astype(self, dtype, **kwargs):
-> 3190         return self.apply('astype', dtype=dtype, **kwargs)
   3191 
   3192     def convert(self, **kwargs):

/home/joris/scipy/pandas/pandas/core/internals.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
   3055 
   3056             kwargs['mgr'] = self
-> 3057             applied = getattr(b, f)(**kwargs)
   3058             result_blocks = _extend_blocks(applied, result_blocks)
   3059 

/home/joris/scipy/pandas/pandas/core/internals.py in astype(self, dtype, copy, raise_on_error, values, **kwargs)
    459                **kwargs):
    460         return self._astype(dtype, copy=copy, raise_on_error=raise_on_error,
--> 461                             values=values, **kwargs)
    462 
    463     def _astype(self, dtype, copy=False, raise_on_error=True, values=None,

/home/joris/scipy/pandas/pandas/core/internals.py in _astype(self, dtype, copy, raise_on_error, values, klass, mgr)
   2158             values = self.values
   2159         else:
-> 2160             values = np.asarray(self.values).astype(dtype, copy=False)
   2161 
   2162         if copy:

TypeError: data type "string" not understood

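Should this also work on Python 3, or should we just update the docs? In any case, consistency would be nice here.

The root cause is probably that the same Python 2 / Python 3 difference exists in numpy's astype, which is what pandas ends up calling in the last frame of the traceback.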
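A quick way to check the numpy side in isolation is to ask `np.dtype` whether it understands a given string alias. The sketch below is only illustrative: the helper name `dtype_alias_understood` is made up for this example, and the result for `"string"` depends on the Python and numpy versions in use, so no output is claimed for it.

```python
import numpy as np


def dtype_alias_understood(alias):
    """Return True if np.dtype accepts the given string alias.

    Hypothetical helper for illustration; not part of numpy or pandas.
    """
    try:
        np.dtype(alias)
    except TypeError:
        # numpy raises TypeError ("data type ... not understood")
        # for aliases it does not recognise.
        return False
    return True


# "string" is the alias from the report; its result varies by environment.
for alias in ("string", "str", "object"):
    print(alias, dtype_alias_understood(alias))
```

If `"string"` prints `False` here, the failure is reproducible without pandas, which would support treating it as a numpy-level difference.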

Output of pd.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-53-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0+270.gc72f297
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.0
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.0.0
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0rc2
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: None
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
s3fs: 0.0.7
pandas_datareader: None
```

@chris-b1
Contributor

xref numpy/numpy#6023 (may be other open issues)

@jorisvandenbossche
Member Author

OK, maybe we should just leave it as a numpy issue then.
In any case, I am updating the docs to make them py3-compatible for now.
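For reference, one spelling that avoids the `"string"` alias is to pass the builtin `str` to `astype`. This is a minimal sketch and an assumption about what a py3-compatible example could look like; it is not taken from the actual docs change.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"])
s2 = s.astype("category")

# Passing the builtin `str` sidesteps the "string" alias lookup entirely.
as_text = s2.astype(str)

# The resulting dtype differs between pandas versions, so only the
# values are printed here.
print(as_text.tolist())
```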

@jorisvandenbossche
Member Author

Doc issue is addressed in #15011

@TomAugspurger
Contributor

Looks like we're good here.

@TomAugspurger added this to the No action milestone Jun 29, 2018