Floats can lose precision when loading to BigQuery #326
Comments
@max-sixty I was wondering if you might be able to provide some help / guidance here? |
Thanks @dkapitan |
Would |
According to https://docs.python.org/3.6/library/string.html#formatspec
I think that would result in too much rounding on some systems. |
According to https://en.wikipedia.org/wiki/IEEE_754#Character_representation 17 digits of precision are required to preserve the original binary value. 16 digits was not enough in my testing of #336 |
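The digit counts above can be checked by formatting a float and parsing the text back. This is only a sketch using the standard library; `round_trips` is a hypothetical helper name, not something from pandas-gbq.

```python
def round_trips(value: float, digits: int) -> bool:
    """Return True if formatting with `digits` significant digits and
    parsing the text back yields exactly the same 64-bit float."""
    text = "%.*g" % (digits, value)  # '*' takes the precision from the tuple
    return float(text) == value


x = 0.1 + 0.2  # stored as 0.30000000000000004

print(round_trips(x, 15))  # False
print(round_trips(x, 16))  # False
print(round_trips(x, 17))  # True
```

Some values do survive with 15 or 16 digits, so a test on a handful of floats can pass by luck; 17 is the count that holds for every 64-bit float.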
The float precision is set here: https://github.com/pydata/pandas-gbq/blob/d251db03b159447331ac9ae63e13d295d75bad70/pandas_gbq/load.py#L22
This is insufficient to represent all 64-bit floats without losing precision. For example, 26/59 should be represented as `0.4406779661016949`, but under this format it is represented as `0.440677966101695`.
This was added intentionally here to fix a different issue, but it causes us some issues as we need perfect reconciliation between systems. It seems like it should be possible to get the best of both worlds and output the correct number of digits in all cases.
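The 26/59 example can be reproduced in plain Python. This sketch only uses built-in formatting and does not call pandas-gbq; it assumes the `%.15g` format string is applied the way ordinary `%` formatting applies it.

```python
x = 26 / 59

shortest = repr(x)        # shortest string that parses back to exactly x
truncated = "%.15g" % x   # the 15-digit format discussed in this issue

print(shortest)               # 0.4406779661016949
print(truncated)              # 0.440677966101695
print(float(shortest) == x)   # True
print(float(truncated) == x)  # False
```

`repr` is shown here because it emits only as many digits as are needed to round-trip, which is one possible way to avoid both the lost precision and unnecessarily long output.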
The original suggestion was to use `%g`, but this was changed to `%.15g`. It's not clear to me what the rationale is for that; it seems like `%g` is strictly better, but I'm sure I'm missing something.
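One detail that may explain the change: in Python's `%` formatting, `%g` with no explicit precision defaults to 6 significant digits, so on its own it would round more than `%.15g`, not less. A small sketch, again assuming the format string is applied through ordinary `%` formatting:

```python
x = 26 / 59

plain_g = "%g" % x          # no precision given, so 6 significant digits
seventeen_g = "%.17g" % x   # enough digits for any 64-bit float

print(plain_g)                  # 0.440678
print(float(plain_g) == x)      # False
print(float(seventeen_g) == x)  # True
```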