to_numeric with errors = "coerce" is adding digits at the end #31364

Closed
jjotterson opened this issue Jan 27, 2020 · 3 comments · Fixed by #36149
Labels
Bug Numeric Operations Arithmetic, Comparison, and Logical operations
Milestone

Comments

@jjotterson

Code Sample, a copy-pastable example if possible

# Problem: pd.to_numeric with errors='coerce' can return values
# with extra digits at the end.

import pandas as pd


#minimal example
data = [{'value': '.'}, {'value': '.'}, {'value': '.'}, {'value': '.'}, {'value': '243.164'}, {'value': '245.968'}, {'value': '249.585'}, {'value': '259.745'}, {'value': '265.742'}, {'value': '272.567'}]
df = pd.DataFrame(data,columns=['value'])

df.value = pd.to_numeric(df.value,errors='coerce')

# The printed Series looks fine:
df.value 

# but a single element shows the extra digits:
df.value[4]


# Which elements are affected looks random:
data2 = [{'value': '.'}, {'value': '.'}, {'value': '.'}, {'value': '.'}, {'value': '243.164'}, {'value': '245.968'}, {'value': '249.585'}, {'value': '259.745'}, {'value': '265.742'}, {'value': '272.567'}, {'value': '279.196'}, {'value': '280.366'}, {'value': '275.034'}, {'value': '271.351'}, {'value': '272.889'}, {'value': '270.627'}, {'value': '280.828'}, {'value': '290.383'}, {'value': '308.153'}, {'value': '319.945'}, {'value': '336.0'}, {'value': '344.09'}, {'value': '351.385'}, {'value': '356.178'}, {'value': '359.82'}, {'value': '361.03'}, {'value': '367.701'}, {'value': '380.812'}, {'value': '387.98'}, {'value': '391.749'}, {'value': '391.171'}, {'value': '385.97'}, {'value': '385.345'}, {'value': '386.121'}, {'value': '390.996'}, {'value': '399.734'}, {'value': '413.073'}, {'value': '421.532'}, {'value':
'430.221'}, {'value': '437.092'}, {'value': '439.746'}, {'value': '446.01'}, {'value': '451.191'}, {'value': '460.463'}, {'value': '469.779'}, {'value': '472.025'}, {'value': '479.49'}, {'value': '474.864'}, {'value': '467.54'}, {'value': '471.978'}]


# Now elements 4, 36, and 47 are wrong, each with a different ending.
df2 = pd.DataFrame(data2,columns=['value'])

df2.value = pd.to_numeric(df2.value,errors='coerce')


df2.value[[4,36,47]].to_list()

Problem description

Current behavior: pandas to_numeric, when called with errors='coerce', seems to randomly add digits at the end of some numbers.

Expected Output

No digits should be added at the end.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : None.None

pandas : 0.25.3
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.12
pytest : 5.0.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.4
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.6.1
pandas_datareader: 0.7.4
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.3.4
matplotlib : 3.1.2
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.3.5
tables : 3.5.2
xarray : 0.12.3
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8

@jjotterson
Author

jjotterson commented Jan 27, 2020

To be more precise, I am getting:

df2.value[[4,36,47]].to_list()

to be
[243.16400000000002, 413.07300000000004, 474.86400000000003]

instead of

[243.164, 413.073, 474.864]
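
To put a size on the discrepancy, here is a minimal sketch using only the standard library. It compares the bit patterns of the expected and observed values quoted above; `float_bits` and `ulp_steps` are hypothetical helper names, not pandas functions, and the sketch assumes positive finite inputs.

```python
import struct


def float_bits(x):
    """Return the IEEE 754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def ulp_steps(a, b):
    """Signed count of representable doubles between a and b.

    Only meaningful for finite inputs of the same sign, which is all
    this illustration needs.
    """
    return float_bits(b) - float_bits(a)


# (expected, observed) pairs copied from the report above
pairs = [
    (243.164, 243.16400000000002),
    (413.073, 413.07300000000004),
    (474.864, 474.86400000000003),
]

for expected, observed in pairs:
    print(expected, observed, ulp_steps(expected, observed))
```

Each observed value is the very next representable double above the expected one, so the results are off by one unit in the last place rather than by some arbitrary amount.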

Also, downcasting gives me different behavior in the two datasets:

#smaller dataset, data
pd.to_numeric(df.value,errors='coerce',downcast="float")[4]
#243.164
pd.to_numeric(df.value,errors='coerce',downcast="integer")[4]
#243.16400000000002


#bigger dataset, data2
#downcast="float" now shows long expansions; downcast="integer" still shows the extra digits
pd.to_numeric(df2.value,errors='coerce',downcast="float")[[4,36,47]].to_list()
#output: [243.16400146484375, 413.072998046875, 474.864013671875]
pd.to_numeric(df2.value,errors='coerce',downcast="integer")[[4,36,47]].to_list()
#output: [243.16400000000002, 413.07300000000004, 474.86400000000003]
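
The long values printed for downcast="float" look like the result of rounding to 32-bit floats and then widening back to Python floats in to_list(). A small standard-library sketch of that rounding follows; `to_float32` is a hypothetical helper, and the claim that this is what the downcast does here is an assumption based on the output above.

```python
import struct


def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value.

    The result is returned as a regular Python float, so printing it
    shows the exact binary32 value widened back to binary64.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


for text in ["243.164", "413.073", "474.864"]:
    print(text, to_float32(float(text)))
# 243.164 243.16400146484375
# 413.073 413.072998046875
# 474.864 474.864013671875
```

These match the downcast="float" output for the bigger dataset, so that output reflects float32 precision and is a separate effect from the one-digit discrepancy in the float64 values. With downcast="integer" the values are not whole numbers, so presumably no downcast takes place and the float64 values come back unchanged.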





@VitekVlcek-Broadcom

I see similar behavior, for example:

float('0.167')
0.167
type(float('0.167'))
<class 'float'>

vs

pd.to_numeric('0.167')
0.16699999999999998
type(pd.to_numeric('0.167'))
<class 'numpy.float64'>

I guess the output of to_numeric should be 0.167, the same as float('0.167').

@Dr-Irv Dr-Irv added Bug Numeric Operations Arithmetic, Comparison, and Logical operations labels Sep 5, 2020
@Dr-Irv
Contributor

Dr-Irv commented Sep 5, 2020

Bug is in pd.to_numeric:

In [2]: s='243.164'

In [3]: float(s), pd.to_numeric(s)
Out[3]: (243.164, 243.16400000000002)

Internally, we are using xstrtod instead of precise_xstrtod.
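
For intuition on why a fast string-to-double routine can land one step away from float(s), here is a rough Python sketch of a stepwise parser. It is not the pandas C code: `naive_parse` is a hypothetical function, it handles only unsigned plain decimals such as '243.164', and the scaling strategy is an assumption chosen for illustration. The point is that each floating-point division rounds, so scaling in several steps can round more than once, whereas float(s) rounds the exact decimal value a single time.

```python
def naive_parse(text):
    """Hypothetical stepwise decimal parser (illustration only).

    Accumulates all digits into a float, then scales down by powers of
    ten using repeated squaring. Every division rounds its result.
    """
    int_part, _, frac_part = text.partition(".")

    # Accumulate the digits as if there were no decimal point.
    number = 0.0
    for ch in int_part + frac_part:
        number = number * 10.0 + (ord(ch) - ord("0"))

    # Divide by 10**len(frac_part), one power-of-ten factor at a time.
    n = len(frac_part)
    p10 = 10.0
    while n:
        if n & 1:
            number /= p10  # rounding can happen at each of these steps
        n >>= 1
        p10 *= p10
    return number


for s in ["243.164", "413.073", "474.864", "336.0"]:
    # Compare the stepwise result with Python's correctly rounded parse.
    print(s, naive_parse(s), float(s), naive_parse(s) == float(s))
```

Whether a given input differs depends on how the intermediate roundings fall, which would be consistent with only some elements in the report being affected.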

@jreback jreback added this to the 1.1.2 milestone Sep 6, 2020
@simonjayhawkins simonjayhawkins modified the milestones: 1.1.2, 1.2 Sep 8, 2020