Code Sample, a copy-pastable example if possible

```python
# Problem: pd.to_numeric with errors='coerce' sometimes adds extra digits
# at the end of the parsed number.
import pandas as pd

# minimal example
data = [{'value': '.'}, {'value': '.'}, {'value': '.'}, {'value': '.'},
        {'value': '243.164'}, {'value': '245.968'}, {'value': '249.585'},
        {'value': '259.745'}, {'value': '265.742'}, {'value': '272.567'}]
df = pd.DataFrame(data, columns=['value'])
df.value = pd.to_numeric(df.value, errors='coerce')

# looks as if all is good:
df.value
# but:
df.value[4]

# this can be random: with a longer column, different rows are affected
data2 = [{'value': '.'}, {'value': '.'}, {'value': '.'}, {'value': '.'},
         {'value': '243.164'}, {'value': '245.968'}, {'value': '249.585'},
         {'value': '259.745'}, {'value': '265.742'}, {'value': '272.567'},
         {'value': '279.196'}, {'value': '280.366'}, {'value': '275.034'},
         {'value': '271.351'}, {'value': '272.889'}, {'value': '270.627'},
         {'value': '280.828'}, {'value': '290.383'}, {'value': '308.153'},
         {'value': '319.945'}, {'value': '336.0'}, {'value': '344.09'},
         {'value': '351.385'}, {'value': '356.178'}, {'value': '359.82'},
         {'value': '361.03'}, {'value': '367.701'}, {'value': '380.812'},
         {'value': '387.98'}, {'value': '391.749'}, {'value': '391.171'},
         {'value': '385.97'}, {'value': '385.345'}, {'value': '386.121'},
         {'value': '390.996'}, {'value': '399.734'}, {'value': '413.073'},
         {'value': '421.532'}, {'value': '430.221'}, {'value': '437.092'},
         {'value': '439.746'}, {'value': '446.01'}, {'value': '451.191'},
         {'value': '460.463'}, {'value': '469.779'}, {'value': '472.025'},
         {'value': '479.49'}, {'value': '474.864'}, {'value': '467.54'},
         {'value': '471.978'}]

# now rows 4, 36, and 47 are wrong, each with a different spurious ending.
df2 = pd.DataFrame(data2, columns=['value'])
df2.value = pd.to_numeric(df2.value, errors='coerce')
df2.value[[4, 36, 47]].to_list()
```
Problem description

Current behavior: pandas to_numeric, when used with errors='coerce', seems to randomly add extra digits at the end of some numbers.

Expected Output

No extra digits should be added; the parsed values should match the input strings, e.g. '243.164' should parse to 243.164.
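For comparison, Python's built-in float round-trips these strings exactly. Below is a minimal sketch of a workaround along those lines (my own illustration, assuming only plain decimal strings and '.' placeholders need handling; it is not meant as the general fix):

```python
import numpy as np
import pandas as pd

def to_float_or_nan(x):
    """Parse with Python's float(), which rounds correctly; coerce failures to NaN."""
    try:
        return float(x)
    except (TypeError, ValueError):
        return np.nan

sample = pd.DataFrame([{'value': '.'}, {'value': '243.164'}, {'value': '413.073'}],
                      columns=['value'])
print(sample['value'].map(to_float_or_nan).to_list())  # [nan, 243.164, 413.073]
```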
Output of pd.show_versions()

INSTALLED VERSIONS
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : None.None
pandas : 0.25.3
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.12
pytest : 5.0.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.4
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.6.1
pandas_datareader: 0.7.4
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.3.4
matplotlib : 3.1.2
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.3.5
tables : 3.5.2
xarray : 0.12.3
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8
Being more precise, I am getting:

```python
df2.value[[4, 36, 47]].to_list()
```

to be `[243.16400000000002, 413.07300000000004, 474.86400000000003]` instead of `[243.164, 413.073, 474.864]`.
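For scale (a quick check of my own, not part of the original report), the offsets are at the level of the last representable bit of a float64 at these magnitudes:

```python
import numpy as np

got = np.array([243.16400000000002, 413.07300000000004, 474.86400000000003])
expected = np.array([243.164, 413.073, 474.864])

print(got - expected)                            # tiny positive offsets, ~1e-13 or less
print((got - expected) / np.spacing(expected))   # number of float64 spacing units apart (a small integer)
```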
Also, downcasting gives me different behavior in the two datasets:

```python
# smaller dataset, data
pd.to_numeric(df.value, errors='coerce', downcast="float")[4]    # 243.164
pd.to_numeric(df.value, errors='coerce', downcast="integer")[4]  # 243.16400000000002

# bigger dataset, data2
# downcast to integer works in the bigger dataset (dataset2)
pd.to_numeric(df2.value, errors='coerce', downcast="float")[[4, 36, 47]].to_list()
# output: [243.16400146484375, 413.072998046875, 474.864013671875]
pd.to_numeric(df2.value, errors='coerce', downcast="integer")[[4, 36, 47]].to_list()
# output: [243.16400000000002, 413.07300000000004, 474.86400000000003]
```
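A side note on the downcast="float" output (my reading, assuming the downcast target is float32): 243.164 has no exact 32-bit representation, so the extra digits in that case come from float32 rounding rather than from the string parser itself:

```python
import numpy as np

# Nearest float32 to 243.164, widened back to float64 for display:
print(float(np.float32('243.164')))  # 243.16400146484375, matching the downcast="float" output above
```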
I see similar behavior, for example:

```python
>>> float('0.167')
0.167
>>> type(float('0.167'))
<class 'float'>
```

vs

```python
>>> pd.to_numeric('0.167')
0.16699999999999998
>>> type(pd.to_numeric('0.167'))
<class 'numpy.float64'>
```

I guess the output of to_numeric should be 0.167, like float('0.167').
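A quick check (my addition, not from the original comment) that the two parses really produce different float64 values rather than merely different printed reprs:

```python
import pandas as pd

a = float('0.167')                 # correctly rounded parse
b = float(pd.to_numeric('0.167'))  # pandas parse on the affected version
print(a, b)      # 0.167 0.16699999999999998 (as reported above)
print(a == b)    # False on pandas 0.25.3: the two parses give different doubles
```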
Bug is in pd.to_numeric:

```python
In [2]: s = '243.164'

In [3]: float(s), pd.to_numeric(s)
Out[3]: (243.164, 243.16400000000002)
```

We are internally using xstrtod instead of precise_xstrtod.
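For context (my illustration, not part of the comment above): the same pair of parsers is selectable in read_csv via its float_precision option, so the discrepancy can be reproduced and avoided there, assuming the default C-parser path on this version is the xstrtod one:

```python
import io
import pandas as pd

buf = io.StringIO("value\n243.164\n")
default = pd.read_csv(buf)['value'][0]                        # default parser path
buf.seek(0)
high = pd.read_csv(buf, float_precision='high')['value'][0]   # precise_xstrtod path
print(default)  # may show 243.16400000000002 on pandas 0.25.x
print(high)     # 243.164
```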