Skip to content

SELECT FLOAT to float64 #308

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

arvenil
Copy link
Contributor

@arvenil arvenil commented Feb 1, 2015

Hi all,

While I was trying out #297 I discovered some of tests in internal project started to fail when emulated queries were used instead of prepared statements.
I checked query log with substitutePlaceholder=true and substitutePlaceholder=false. All queries were identical, didn't matter if they were prepared by MySQL or by driver (tests run 2000+ queries) - so #297 was 100% correct!

So I investigated further... Tests were failing at some floats where expected value was e.g. 0.050551000982522964 but with substitutePlaceholder=true I got 0.050551. I checked data, and actually correct value was 0.050551. So actually turning substitutePlaceholder=true on provided correct data.

I found out that in some cases switching from MySQL prepared statements to driver prepared statements changed the way how data was parsed by driver. With substitutePlaceholder=false data was parsed by func (rows *binaryRows) readRow (https://github.com/go-sql-driver/mysql/blob/master/packets.go#L962) while switching to substitutePlaceholder=true changed the behavior and data got parsed by func (rows *textRows) readRow (https://github.com/go-sql-driver/mysql/blob/master/packets.go#L595). After some investigation I came to conclusion that currently selecting FLOAT to float64 is incorrectly handled. I've attached failing test cases with fixes. With one fix I'm not happy - it's an ugly hack because I couldn't figure out what's going on here: https://github.com/go-sql-driver/mysql/blob/master/packets.go#L619 I couldn't figure out when rows.columns is filled with data... I wanted simply to check for rows.columns[i].fieldType but it looks like it doesn't always have data and so it was throwing panics... I've added dirty hack if i > len(rows.columns)-1 { continue } and it works but I bet there is a better way to handle that ;)

Summary:
"CREATE TABLE test (f FLOAT)"
"INSERT INTO test VALUE (0.050551)"

  • "SELECT f FROM test", always worked correctly (actually this still confuses me, why this worked?)
  • "SELECT f FROM test WHERE 1=?", required fix in (rows *binaryRows) readRow
  • "SELECT NULL(f, 0) f FROM test", required fix in (rows *textRows) readRow (why this is different than first example?)

@arvenil
Copy link
Contributor Author

arvenil commented Feb 1, 2015

You can check build result for each individual commit here: https://github.com/arvenil/mysql/compare/select_float_to_float64

}
}

dsns := []string{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseTime is irrelevant here I guess. You don't need to test both versions of the DSN. Just leave out parseTime altogether.

@julienschmidt
Copy link
Member

One more problem: What you are trying to do (returning float32 instead) is illegal according to the specification: http://golang.org/pkg/database/sql/driver/#Value and http://golang.org/pkg/database/sql/driver/#Rows

@julienschmidt
Copy link
Member

Is there really something wrong here at all? http://play.golang.org/p/kY-amQoUG1

@arnehormann
Copy link
Member

No, there's nothing wrong.
float32 has 23 bits of data for digits (rest is for sign and exponent), that's about 6 digits in decimal. float64 has 52 bits, that's about 15 digits in decimal.
Try strconv.FormatFloat(f32_64, 'f', -1, 32) and change the 32 at the end to 64. Same result as with your playground link.
Formatting as float32 leads to a cutoff at the end, the longer one is just more accurate.
But it's still not fully accurate, it's an approximation attempting to use the closest human readable representation.
Even 0.1 can not be represented by a sum of 1/2^i for any maximum value of i. 5 just isn't divisible by 2 without a remainder. Approximating is necessary because humans dislike reading infinitely long numbers.

@arnehormann
Copy link
Member

@arvenil your test is faulty, the driver does what it should do.
See http://play.golang.org/p/WMJz63yRJn
Floats are something strange to wrap the head around.

EDIT
Above, I originally wrote 27 bits and 7 decimal digits for float32. That's wrong, it's 23 and it should be 6 in decimal (but Wikipedia seems to disagree there). Also, it's 15 decimal digits for float64, not 16.

To illustrate:
1<<23 - 1 == 8388608 (6 decimal digits after first)
1<<53 - 1 == 4503599627370496 (15 decimal digits after first)
All digits but the first allow for the full decimal range of 0...9, that's where the approximation of accurately representable decimal digits come from.

@arvenil
Copy link
Contributor Author

arvenil commented Feb 10, 2015

Hi, sorry for not responding earlier but I was quite busy (and was thinking about this too).

Yes, I got the float32 to float64 conversion and changes related to the precision.
The original bug is not about this (ignore my fix proposal). I think problem is more complex than float32 to float64 conversion. I'm fine if it will return 0.050551 or 0.050551000982522964 due to float32 to float64 conversion as long as one or the other is always returned - not sometimes one and sometimes the other.

The original bug report from me is that this query:

SELECT f FROM test

Results with float64(0.050551)

While those queries

"SELECT f FROM test WHERE 1=?", 1))
"SELECT IFNULL(f, 0) f FROM test"))

Results with float64(0.050551000982522964)

So, it's not about which one is correct, but that they return different values.

@julienschmidt @arnehormann Is this correct behavior? If so then why? If so then maybe it would be a good idea to write tests for this? I might find the answer but it would be great if you guys could take a look at this too and tell me what you think.

@arnehormann
Copy link
Member

A preliminary (strong) suspicion:
When the text protocol is used (as it is with substituePlaceholder enabled), MySQL transmits a string result - so we receive the value in string form and convert it to float64 with float64 accuracy.
When a float is sent in binary protocol, the value is transmitted in binary, converted to float64 and then converted to string.
In the binary case, we don't know the last 32 bits and round down, we assume they are 0.
In the former case, we know the closest matching decimal representation and work at the required accuracy level from there (taking an approximate value somewhere - anywhere - in the valid bounds for a certain floating point number).

None of these are wrong, just righter and wronger in certain cases.
Starting from text values, using the text value is probably intended.
But it's just an artifact caused by the data flow and the fact that Go only allows for float64 in drivers.
Working around this will slow the driver down, as we have to call something like this for each FLOAT value to estimate the closest decimal:

f32str := strconv.FormatFloat(float64(floatFromServer), 'g', -1, 32)
result, _ := strconv.ParseFloat(f32str, 64)

Floats are hard. Floating point decimals would improve this so much 😢

I'm still for "working as intended" - or at least "the lesser evil"
It's comparable to the uint64 case with the high bit set - no nice solution.

@arvenil
Copy link
Contributor Author

arvenil commented Feb 10, 2015

Exactly! I came to the same conclusion. In text protocol we probably get closest value e.g. 0.050551 and this is converted directly to float64, so the precision is higher. While in binary we get 0.050551000982522964 and this is converted as is to float64 which is lower precision than text 0.050551. (weird thing is that one time text protocol is used and another time it's not)

I'm ok with working as intended however in this case I would love to see tests that lock downs this behavior with proper comment - something I could point people to when they hit this problem and ask wth.

Also a good thing could be to check how this is handled by other drivers in other languages.

Thanks for you time guys, and thanks for looking at this!

@arnehormann
Copy link
Member

@arvenil I just want to state that this is exactly what I meant when I said your test is wrong.

The precision is not "higher", the resulting value just probably matches the user intent better.
Only probably because we could also be dealing with binary inputs from a sensor where the "real" measured value could be closer to 0.050551000982522964 than to 0.050551.

There's just an empty space between what we receive and the format we have to pass it on in (float64) - and we fill it differently in the text and binary case because we never know what the intended value is. Most of the time, it will be the closest matching decimal. But it does not have to be and using that value would be slower.
We hate slower.
Prio 1 is correctness, prio 2 and 3 are speed and low memory usage. Features (like friendlier approximations) come a distant fourth.

I don't know how to sensibly lock this down in a test and I'd prefer not to to keep it open when some way of improved handling presents itself upstream. For speed and for correctness. But I can see the value in documenting this.
I think a Wiki page or a part in the README would be better, but I can be convinced of the utility of a test.
Can you write the kind of test you'd like, make it another pull request and discuss it with us? Maybe you best start with describing it in an abstract way here.

Btw, thanks for your engagement! It's pretty great.

@arvenil
Copy link
Contributor Author

arvenil commented Feb 10, 2015

Yes, my bad, wrongly phrased sentence. With the precision I meant that in case of text protocol, with conversion of string 0.050551 to float64 you get 64 bit value of 0.050551. You get higher precision (more bits) to 0.050551 but that's in fact incorrect as what is stored in MySQL is 32 bit float - not 0.050551. The 0.050551 delivered in text protocol is approximate done by MySQL.
https://play.golang.org/p/GR8CgYWFUA

Ok, enough, thanks for help figuring this out!

@arnehormann
Copy link
Member

You're welcome!
Great playground illustration!

arvenil added a commit to percona/cloud-protocol that referenced this pull request Feb 15, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants