
pd.to_numeric(..., errors="coerce") failing silently when strings contain "uint64" #32394

Closed
buhrmann opened this issue Mar 2, 2020 · 4 comments · Fixed by #32541 or #32560
@buhrmann

buhrmann commented Mar 2, 2020

Problem description

When coercing strings to numeric values with to_numeric(), any input containing the substring "uint64" (but apparently no other dtype-like substring) silently fails to coerce: the input is returned unchanged instead of becoming NaN.

import pandas as pd

strs = ["32", "64", "uint32", "float64", "sdnfonsdf uint32 knsdf", "sdnfonsdf uint64 knsdf", "uint64"]
print([pd.to_numeric(s, errors="coerce") for s in strs])
pd.to_numeric(pd.Series(["32", "64", "uint64"]), errors="coerce")

Actual Output

[32, 64, nan, nan, nan, 'sdnfonsdf uint64 knsdf', 'uint64']

0        32
1        64
2    uint64
dtype: object

Expected Output

[32, 64, nan, nan, nan, nan, nan]

0        32.0
1        64.0
2        NaN
dtype: float64
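
For reference, the scalar behaviour expected above can be modelled in plain Python. This is only an illustrative sketch: `coerce_scalar` is a hypothetical helper, not a pandas function, and Python's `int()`/`float()` parsers do not match pandas's parser in every edge case.

```python
import math


def coerce_scalar(text):
    """Illustrative model of errors="coerce" for a single string.

    Hypothetical helper, not part of pandas: try int, then float,
    and fall back to NaN when neither parser accepts the text.
    """
    for caster in (int, float):
        try:
            return caster(text)
        except ValueError:
            continue
    return math.nan


strs = ["32", "64", "uint32", "float64", "sdnfonsdf uint32 knsdf",
        "sdnfonsdf uint64 knsdf", "uint64"]
result = [coerce_scalar(s) for s in strs]
print(result)
```

In this model every non-numeric string, including the ones containing "uint64", ends up as NaN, which is what the report expects from to_numeric().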

Seems to fail equally in 0.25.3 and 1.0...

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None

pandas : 0.25.3
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0.post20200119
Cython : None
pytest : 5.3.4
hypothesis : None
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : None
pymysql : 0.9.3
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.14.1
pytables : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.12
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None

@TomAugspurger
Contributor

TomAugspurger commented Mar 2, 2020

A bit simpler reproducer:

In [9]: b = np.array(['uint64'], dtype=object)

In [10]: pd._libs.lib.maybe_convert_numeric(b, set(), coerce_numeric=True)

We explicitly check for "uint64" in the error message at

elif "uint64" in str(err):  # Exception from check functions.

which goes back a ways (at least before #22219; I didn't look at the blame past that).
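
To see how a substring check on an error message can misfire, here is a simplified, self-contained model. It is not pandas's actual code: `parse_number`, `convert_all` and `to_numeric_coerce` are hypothetical names, and the sketch assumes that (a) the parser's error message embeds the offending input, (b) the "uint64" branch re-raises, and (c) an outer wrapper swallows the error and returns the input unchanged.

```python
import math


def parse_number(text):
    """Hypothetical low-level parser; its error message embeds the input."""
    try:
        return float(text)
    except ValueError:
        raise ValueError(f'Unable to parse string "{text}"') from None


def convert_all(values, check_message=True):
    """Coerce each value to float, mapping unparseable values to NaN."""
    out = []
    for text in values:
        try:
            out.append(parse_number(text))
        except ValueError as err:
            # Assumed legacy branch: any error mentioning "uint64" is
            # treated as special and re-raised instead of coerced.
            if check_message and "uint64" in str(err):
                raise
            out.append(math.nan)
    return out


def to_numeric_coerce(values, check_message=True):
    """Assumed outer wrapper: on error, return the input unchanged."""
    try:
        return convert_all(values, check_message)
    except ValueError:
        return list(values)


buggy = to_numeric_coerce(["32", "64", "uint64"])
fixed = to_numeric_coerce(["32", "64", "uint64"], check_message=False)
other = to_numeric_coerce(["uint32"])
print(buggy)  # the input strings come back untouched
print(fixed)  # the non-numeric entry becomes NaN
print(other)  # other dtype-like strings never hit the special branch
```

Because the message quotes the input, the check cannot tell an error about uint64 handling apart from an ordinary string that happens to contain "uint64". Dropping the check (`check_message=False` in this sketch) restores coercion to NaN.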

It'd be great if you could do more investigation here.

@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone Mar 2, 2020
@TomAugspurger TomAugspurger added the Numeric Operations Arithmetic, Comparison, and Logical operations label Mar 2, 2020
@roberthdevries
Contributor

take

@roberthdevries
Contributor

This seems to be a relic of really old type inference mechanics that were present in pandas/src/inference.pyx (commit 17d7ddb by gfyoung).
That code used some check functions for uint64 types (now removed) that would raise exceptions containing the string "uint64". Since that code is no longer present, the check for the string in the exception text can be removed as well.
I did not track down the revision in which the checks were removed without also removing the string check.

@jreback
Contributor

jreback commented Mar 9, 2020

Reopening, as we are going to revert this; it needs more tests.
