to_html + read_html small errors for floats despite formatter #14623

mverleg · 2016-11-09T11:08:07Z

Description

Storing a double precision float as HTML (to_html) and loading it back (read_html) loses precision, even though float_format has enough precision.

I saw that float_format is working, but even with way too many digits, it fails to recover the original number.

In contrast, just calling float or float64 on the string-formatted number works perfectly fine.

A small, complete example of the issue

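from pandas import DataFrame, read_html


def floatformat(val):
	# 17 significant digits, enough to identify any double uniquely
	return '{:.16e}'.format(val)

# Plain Python: format and parse back, the value survives
x = 1.18047406523e+307
s = floatformat(x)
y = float(s)
assert x == y

# Same value through to_html / read_html
frame = DataFrame(data=[[x]], columns=['a'])
pth = '/tmp/demo.dta'
with open(pth, 'w+') as fh:
	frame.to_html(fh, float_format=floatformat)
with open(pth, 'r') as fh:
	frame2 = read_html(fh)[0]

# The assertion message is the formatted difference between the two values
assert frame.a[0] == frame2.a[0], floatformat(frame.a[0] - frame2.a[0])
assert x == frame2.a[0]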

Expected Output

Nothing (all assertions pass).

Actual Output

The last two assertions fail:

AssertionError: -4.9896007738367995e+291
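To put that number in context, here is a small sketch (standard library only, not part of the original report) that expresses the reported difference in units in the last place (ULP) of `x`. The variable names are illustrative; the only inputs are `x` from the example and the difference printed in the assertion message.

```python
import math

x = 1.18047406523e+307
# Difference printed by the failing assertion (frame.a[0] - frame2.a[0]).
diff = float('-4.9896007738367995e+291')

# frexp gives x = m * 2**exponent with 0.5 <= m < 1. A double carries
# 53 significant bits, so adjacent doubles near x are 2**(exponent - 53) apart.
_, exponent = math.frexp(x)
ulp = math.ldexp(1.0, exponent - 53)

ulps_off = abs(diff) / ulp          # error measured in ULP
relative_error = abs(diff) / x      # error relative to the value itself
print(ulps_off, relative_error)
```

The value read back is off by a couple of ULP, a relative error below 1e-15. That looks like a parser that is not correctly rounded rather than digits being lost in the HTML text.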

Output of `pd.show_versions()`

same as #14618


jreback · 2016-11-09T11:44:55Z

round-tripping floats through strings at this level of precision is certainly not guaranteed

see the docs http://pandas.pydata.org/pandas-docs/stable/io.html#specifying-method-for-floating-point-conversion

i doubt this keyword is actually passed through for html parsing, though - this seems a highly suspect use case

mverleg · 2016-11-09T11:58:18Z

I don't really see why getting back the saved data can't be guaranteed; it's kind of the point of IO tools. I can understand if there's no time to do it, but that's different from it not being worth doing.

Unfortunately, the documentation about float_precision in the link doesn't apply to html.
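As a hedged illustration that the HTML text itself carries enough digits, the sketch below uses only the standard library: it writes the value into a minimal table, collects the cell text with `html.parser`, and converts it with Python's `float`. `CellCollector` is a hypothetical helper written for this example, not a pandas API, and the table markup is a simplified stand-in for what `to_html` emits.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self.in_cell = True
            self.cells.append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self.in_cell:
            self.cells[-1] += data


x = 1.18047406523e+307
# Same format string as the float_format callback in the report.
html = '<table><tr><td>{:.16e}</td></tr></table>'.format(x)

parser = CellCollector()
parser.feed(html)
parser.close()

# float() is correctly rounded, so 17 significant digits recover x exactly.
values = [float(cell.strip()) for cell in parser.cells]
print(values == [x])
```

Under these assumptions the value survives the trip. This suggests the loss in the report happens in the string-to-float conversion while reading, not in the formatting on write.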

jreback · 2016-11-09T12:06:19Z

> i doubt this keyword is actually passed through for html parsing, though - this seems a highly suspect use case

you can certainly submit a PR for this if you'd like
it's actually pretty easy

mverleg changed the title ~~to_htlm + read_html small errors~~ to_htlm + read_html small errors for floats despite formatter Nov 9, 2016

jreback closed this as completed Nov 9, 2016
