Floats can lose precision when loading to BigQuery #326

danielchatfield · 2020-09-01T11:14:51Z

The float precision is set here: https://github.com/pydata/pandas-gbq/blob/d251db03b159447331ac9ae63e13d295d75bad70/pandas_gbq/load.py#L22

This is insufficient to represent all 64-bit floats without losing precision. For example, 26/59 should be represented as 0.4406779661016949, but with this format string it is represented as 0.440677966101695.
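
A minimal sketch of the loss, using plain Python string formatting rather than the pandas-gbq code path itself (the variable names are illustrative, not taken from the library):

```python
# 26/59 stored as a 64-bit float.
value = 26 / 59

# repr() gives the shortest string that parses back to the same float
# (the issue reports 0.4406779661016949 for this value).
shortest = repr(value)

# 15 significant digits, the precision discussed in this issue.
truncated = "%.15g" % value

print(shortest)
print(truncated)  # 0.440677966101695

# Parsing the 15-digit string does not give back the original float.
print(float(truncated) == value)  # False
```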

This was added intentionally here to fix a different issue, but it causes problems for us because we need perfect reconciliation between systems. It seems like it should be possible to get the best of both worlds and output the correct number of digits in all cases.

The original suggestion was to use %g but this was changed to %.15g – it's not clear to me what the rationale is for that, it seems like %g is strictly better but I'm sure I'm missing something.

danielchatfield · 2020-09-07T15:43:21Z

@max-sixty I was wondering if you might be able to provide some help / guidance here?

dkapitan · 2020-09-07T16:34:40Z

@danielchatfield

Not sure if this is helpful, but I think one of the issues as explained here is that a conservative choice is made in the number of significant digits.

A possible solution if you do need to have larger precision is to use .parquet format instead, as suggested here?

max-sixty · 2020-09-07T17:36:38Z

Thanks @dkapitan

tswast · 2020-10-02T15:38:59Z

Would %.16g avoid the precision issues?

tswast · 2020-11-06T19:14:55Z

> The original suggestion was to use %g but this was changed to %.15g – it's not clear to me what the rationale is for that, it seems like %g is strictly better but I'm sure I'm missing something.

According to https://docs.python.org/3.6/library/string.html#formatspec

> A precision of 0 is treated as equivalent to a precision of 1. The default precision is 6.

I think that would result in too much rounding on some systems.
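
To illustrate the default precision, a quick sketch with Python's %-formatting, reusing the 26/59 value from the issue description (variable names are illustrative):

```python
value = 26 / 59

# No precision given, so %g falls back to 6 significant digits.
default_g = "%g" % value

# Explicit precision of 15 significant digits.
fifteen_g = "%.15g" % value

print(default_g)  # 0.440678
print(fifteen_g)  # 0.440677966101695
```

So a bare %g keeps fewer digits than %.15g, not more.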

tswast · 2020-11-06T20:15:32Z

According to https://en.wikipedia.org/wiki/IEEE_754#Character_representation 17 digits of precision are required to preserve the original binary value. 16 digits were not enough in my testing of #336.
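
A rough way to check this locally, as a sketch: `round_trips` is a hypothetical helper written for this comment, not part of pandas-gbq, and the sample values are arbitrary.

```python
def round_trips(value: float, digits: int) -> bool:
    """Hypothetical helper: True if `digits` significant digits are lossless."""
    text = format(value, f".{digits}g")  # same effect as "%.<digits>g" % value
    return float(text) == value


# Arbitrary sample values; 26/59 is the one from the issue description.
samples = [26 / 59, 0.1 + 0.2, 1 / 3]

for digits in (15, 16, 17):
    results = [round_trips(v, digits) for v in samples]
    print(digits, results)

# 0.1 + 0.2 is a value where 16 digits are not enough but 17 are.
print(round_trips(0.1 + 0.2, 16))  # False
print(round_trips(0.1 + 0.2, 17))  # True
```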

tswast added the labels "priority: p1" and "type: bug" on Nov 6, 2020

tswast mentioned this issue Nov 6, 2020

BUG: use greater precision when serializing floating points #336

Merged

4 tasks

tswast self-assigned this Nov 6, 2020

tswast closed this as completed in #336 Nov 9, 2020
