to_html + read_html small errors for floats despite formatter #14623

Closed
mverleg opened this issue Nov 9, 2016 · 3 comments
Comments

@mverleg

mverleg commented Nov 9, 2016

Description

Storing a double precision float as HTML (to_html) and loading it back (read_html) loses precision, even though float_format has enough precision.

I saw that float_format is working, but even with way too many digits, it fails to recover the original number.

In contrast, just calling float or float64 on the string-formatted number works perfectly fine.
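
As a rough sanity check of that claim, the sketch below uses only the standard library and the same '{:.16e}' format (17 significant digits) as the example further down. It formats pseudo-random finite doubles, parses them back with float, and counts mismatches. The helper names random_double and count_round_trip_failures are made up for this illustration and are not part of pandas.

```python
import math
import random
import struct


def floatformat(val):
    # One digit before the point plus 16 after it: 17 significant digits.
    return '{:.16e}'.format(val)


def random_double(rng):
    # Reinterpret 64 random bits as an IEEE 754 double, skipping inf and nan.
    while True:
        (val,) = struct.unpack('<d', struct.pack('<Q', rng.getrandbits(64)))
        if math.isfinite(val):
            return val


def count_round_trip_failures(n=10000, seed=0):
    # Count values that do not survive format -> float unchanged.
    rng = random.Random(seed)
    failures = 0
    for _ in range(n):
        val = random_double(rng)
        if float(floatformat(val)) != val:
            failures += 1
    return failures
```

If the count is zero, the formatter itself is not losing information, which points at the parsing step on the read_html side.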

A small, complete example of the issue

from pandas import DataFrame, read_html


def floatformat(val):
	return '{:.16e}'.format(val)

x = 1.18047406523e+307
s = floatformat(x)
y = float(s)
assert x == y

frame = DataFrame(data=[[x]], columns=['a'])
pth = '/tmp/demo.dta'
with open(pth, 'w+') as fh:
	frame.to_html(fh, float_format=floatformat)
with open(pth, 'r') as fh:
	frame2 = read_html(fh)[0]

assert frame.a[0] == frame2.a[0], floatformat(frame.a[0] - frame2.a[0])
assert x == frame2.a[0]

Expected Output

Nothing

Actual Output

The last two assertions fail:

AssertionError: -4.9896007738367995e+291

Output of pd.show_versions()

same as #14618

mverleg changed the title from "to_html + read_html small errors" to "to_html + read_html small errors for floats despite formatter" Nov 9, 2016
@jreback
Contributor

jreback commented Nov 9, 2016

stringifying floats on round trip at this level of precision is certainly not guaranteed

see the docs http://pandas.pydata.org/pandas-docs/stable/io.html#specifying-method-for-floating-point-conversion

i doubt this keyword is actually passed thru though for html parsing - this seems a highly suspect usecase

jreback closed this as completed Nov 9, 2016
@mverleg
Author

mverleg commented Nov 9, 2016

I don't really see why getting back the data saved can't be guaranteed, it's kind of the point of IO tools. I can understand if there's no time to do it, but that's different from it not being worth doing.

Unfortunately, the documentation about float_precision in the link doesn't apply to html.
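
For comparison, here is a minimal sketch of the same round trip through CSV, where float_precision is available on the reader. It assumes read_csv's float_precision='round_trip' option and to_csv's float_format argument; the in-memory buffer and the csv_round_trip_ok name are only for illustration.

```python
import io

from pandas import DataFrame, read_csv

x = 1.18047406523e+307
frame = DataFrame(data=[[x]], columns=['a'])

# Write with 17 significant digits, leaving out the index column.
buf = io.StringIO()
frame.to_csv(buf, index=False, float_format='%.16e')
buf.seek(0)

# Ask the CSV parser for the round-trip float converter.
frame2 = read_csv(buf, float_precision='round_trip')
csv_round_trip_ok = bool(frame2['a'][0] == x)
```

Whether the default CSV parser would also recover the exact value may depend on the pandas version, so the sketch only checks the 'round_trip' setting.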

@jreback
Contributor

jreback commented Nov 9, 2016

i doubt this keyword is actually passed thru though for html parsing - this seems a highly suspect usecase

you can certainly submit a PR for this if you'd like
it's actually pretty easy
