Series of object/strings cannot be converted to Int64Dtype() #28599
Comments
I don't know if we want to support that automatically, since there's some ambiguity: Converting I'd recommend using |
The issue is that with missing data, Moreover, as far as I can see, shouldn't |
Some follow up question:
Please tell me if I really didn't understand the issue and am out of my depth. |
We don't convert to float first. `to_numeric` is the workhorse; `astype` doesn't have any options, meaning all values must be convertible, like in NumPy. |
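To illustrate the distinction drawn in the comment above, here is a minimal sketch (the sample data is made up, and the plain NumPy cast stands in for the strict "all values must be convertible" behaviour): a strict cast fails on the first bad value, while `to_numeric` with `errors="coerce"` turns it into a missing value and returns floats.

```python
import numpy as np
import pandas as pd

# Made-up sample data: two integer strings and one unparseable entry.
raw = pd.Series(["1", "2", "x"], dtype=object)

# Strict cast: NumPy has no option to skip bad values, so it raises.
try:
    raw.to_numpy().astype(np.int64)
    strict_failed = False
except ValueError:
    strict_failed = True

# to_numeric can be told to coerce unparseable entries to NaN instead.
coerced = pd.to_numeric(raw, errors="coerce")

print(strict_failed)   # True
print(coerced.dtype)   # float64
```

Note that the coerced result is `float64`, which is exactly what the original report objects to for large integer identifiers.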
Doesn't appear we have much appetite to support this. Thanks for the suggestion but we'd recommend using |
I must say I disagree on both points @mroeschke.
I think this should be reopened. |
Sorry for the late reply. I think I explained my issue poorly. On my system I also have int64 by default. However, the issue is that int64 cannot hold missing/NaN values. That's why I need to use Int64 (with the capital I), the new data type that allows https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html This is also why My current workaround is to convert to |
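A small sketch of why going through floats is a problem for long identifiers (the identifier below is an arbitrary made-up value, chosen as `2**53 + 1` because float64 cannot represent it exactly; it is not a real GAIA identifier):

```python
import pandas as pd

# Made-up identifier: 2**53 + 1 has no exact float64 representation.
big_id = 2**53 + 1
raw = pd.Series([str(big_id), None], dtype=object)

# With a missing value present, to_numeric falls back to float64.
as_float = pd.to_numeric(raw)
round_tripped = int(as_float.iloc[0])

print(as_float.dtype)           # float64
print(round_tripped == big_id)  # False: the identifier was silently altered
```

This only demonstrates the precision loss; it does not show the workaround referred to above.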
What I mean is |
@mar-ses if you like to contribute tests / patch to .to_numeric that would be greatness; we would / should support nullable integer type conversion there |
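Until something like this exists in pandas itself, a user-level helper along the following lines might serve as a stopgap. This is only a sketch: `strings_to_nullable_int` is a hypothetical name, not a pandas function, and it assumes every non-missing entry is a valid integer string. It parses with Python's `int` so that no value ever passes through float64.

```python
import pandas as pd


def strings_to_nullable_int(ser: pd.Series) -> pd.Series:
    """Hypothetical helper (not pandas API): integer strings -> Int64."""
    # Parse each non-missing entry with Python's arbitrary-precision int;
    # missing entries become None, which pd.array maps to a missing value.
    parsed = [None if pd.isna(v) else int(v) for v in ser]
    return pd.Series(pd.array(parsed, dtype="Int64"), index=ser.index)


# Made-up sample data, including an integer too large for exact float64.
ids = pd.Series(["9007199254740993", None, "42"], dtype=object)
converted = strings_to_nullable_int(ids)

print(converted.dtype)  # Int64
```

The Python-level loop will be slow on large columns; a proper fix inside `to_numeric` would presumably avoid that.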
I've never contributed to these big projects, and I assume I would need to understand the internals and the standard way these things are done inside pandas, so any recommendations on where to start reading etc...? Additionally, would it not also make sense to do it with |
I'm looking into it; I wouldn't mind doing this then. So looking at https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/lib.pyx As far as I can figure out, if it can't immediately convert to the normal For this to work then, it would also need to have a nullable integer array, but since this is done in cython, is that even possible? Is there a version of this nullable integer array in cython? Or otherwise, can the following object hold a Or should I create another array like:
|
once this is merged (soon): #27335; this will be relatively straightforward to patch |
I hope I didn't commit a faux pas. Since the anticipated merge recently took place, patching this issue is no longer blocked. I was trying to be helpful by drawing attention to this fact as a "bump". Sorry if that came across as pushy/annoying. I'm newly active on GitHub and still figuring out the social norms. |
I'm also a newbie here. So I looked at this other issue a bit (the thing that's getting merged), and won't the update to |
@maresb there are 3000 issues and all volunteer |
@jreback yes, that is so obvious that I'm surprised that you feel the need to point it out to me. I'd love to contribute, but it'll be several weeks before that's even possible. In case you have a problem with my previous comment, I would appreciate some constructive feedback. I thought that I was being helpful and polite by alerting @mar-ses, since he previously expressed interest in contributing. |
Edited to add information.
Code Sample, a copy-pastable example if possible
Problem description
Currently, the conversion of object dtypes (containing strings) to Int64 doesn't work, even though it should be able to. It produces a long error (see at the end).
Important to note: the above is trying to convert to `Int64` with the capital I. Those are the new nullable-integer arrays that got added to `python`. `pandas` seems to support them, yet I think something inside `astype` wasn't updated to reflect that.

In essence, the above should work; there is no reason why it should fail and it's quite simply a bug (in answer to some comments). Moreover, `to_numeric` is not a sufficient replacement here; it doesn't convert to `Int64` when there are missing values, instead it converts to `float` automatically (this is actually a non-trivial problem when dealing with long integer identifiers, such as GAIA target identifiers).

Traceback:
Expected Output
Output of `pd.show_versions()`
INSTALLED VERSIONS
commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.18.16-041816-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 0.25.1
numpy : 1.14.3
pytz : 2018.4
dateutil : 2.7.3
pip : 19.1.1
setuptools : 39.1.0
Cython : 0.28.2
pytest : 3.5.1
hypothesis : None
sphinx : 1.7.4
blosc : None
feather : None
xlsxwriter : 1.0.4
lxml.etree : 4.2.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 6.4.0
pandas_datareader: None
bs4 : 4.6.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.2.1
matplotlib : 3.1.1
numexpr : 2.6.5
odfpy : None
openpyxl : 2.5.3
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.1.0
sqlalchemy : 1.2.7
tables : 3.4.3
xarray : None
xlrd : 1.1.0
xlwt : 1.3.0
xlsxwriter : 1.0.4