Test Stream::parseFloat() with many input digits #133

edgar-bonet · 2020-12-10T20:28:53Z

As a followup to issue #129: “Stream::parseFloat() fails if it reads too many digits”, this pull request adds a failing test to test_parseFloat.cpp that reveals the problem. I ran into a couple of issues when writing the test:

Ensuring correct rounding when parsing a decimal representation of a number is not trivial, and Stream::parseFloat() does not attempt to provide such a guarantee. Testing for exact equality to the expected result is thus prone to spurious failures.

The existing tests in test_parseFloat.cpp do test for exact equality, and it seems to work. However, the more digits are processed, the greater the chance that rounding errors pile up and the end result differs from the correctly rounded one. Since this test requires processing many decimal digits, I had to use fuzzy floating point comparison in order to avoid the test failing for the wrong reason.

It would seem these tests are meant to be compiled and run on 64-bit Linux hosts, where a long is 64 bits wide. Most (all?) Arduino platforms, in contrast, have 32-bit longs. Since this test is meant to reveal the effect of a long variable overflowing, I had to add many more digits than would be needed to trigger the overflow on an Arduino. As shown in issue #129, 10 digits of π are enough to make Stream::parseFloat() fail badly on an actual Arduino.
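To make the 32-bit versus 64-bit difference concrete, here is a minimal sketch that counts how many decimal digits a signed integer accumulator can always absorb. The helper name digitsBeforeOverflow is hypothetical and is not part of ArduinoCore-API; it only mirrors the value = value * 10 + digit pattern discussed in this thread.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (illustration only): number of decimal digits that can
// always be accumulated with value = value * 10 + digit in a signed integer
// of type T, whatever the digits are.
template <typename T>
int digitsBeforeOverflow() {
    const T max = std::numeric_limits<T>::max();
    T value = 0;
    int digits = 0;
    // Compare against the limit before multiplying, so that the check itself
    // never overflows. The worst case digit is 9.
    while (value <= (max - 9) / 10) {
        value = value * 10 + 9;
        ++digits;
    }
    return digits;
}
```

With a 32-bit type this yields 9, so a 10-digit input such as 3141592654 can already exceed the range; with a 64-bit type it yields 18, which is why the host-side test needs many more digits.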

matthijskooijman · 2020-12-10T21:34:55Z

This is probably yet another argument to switch official APIs to use e.g. int32_t instead of long; see also arduino/Arduino#4525 (“Documentation should use more specific types”) for related discussion.

edgar-bonet · 2020-12-10T22:24:05Z

@matthijskooijman: The overflow affects a local variable, not exposed to the API.

aentinger · 2020-12-11T05:33:42Z

is probably yet another argument to switch official APIs to use e.g. int32_t instead of long, see also arduino/Arduino#4525 for related discussion.

While I technically agree with @matthijskooijman, just yesterday I took part in a discussion with @tigoe in which he expressed his dismay at some prototype code using uint8_t (instead of byte) and at the general usage of uintXX_t types (his opinion voiced in 2016 seems to be constant over time 😉). Let me also add that this is my last word on this topic within this PR; if you feel it's something we should discuss in more depth, please open a new issue within this repository.

@edgar-bonet Thank you very much for providing this first step towards fixing your issue (a failing test is a great way to start!). For floating point comparison you can use the Approx helper provided by Catch2, e.g.

REQUIRE(mock.parseFloat() == Approx(-3.141592654f));

Here you can find more documentation on how to perform such floating point comparisons.
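For readers without Catch2 at hand, the idea behind such a fuzzy comparison can be sketched as a relative-tolerance check. This is an illustration only: approxEqual and its default tolerance are assumptions made for this example, not Catch2's actual implementation or defaults.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical helper, not part of Catch2: returns true when a and b agree
// to within a relative tolerance. The default tolerance is an arbitrary
// choice for this sketch.
bool approxEqual(float a, float b,
                 float epsilon = 100 * std::numeric_limits<float>::epsilon()) {
    // Scale the tolerance by the larger magnitude so that the check is
    // relative rather than absolute.
    const float scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= epsilon * scale;
}
```

A test would then assert approxEqual(parsed, expected) instead of parsed == expected, so that a few units of rounding error in the last place do not cause a spurious failure.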

edgar-bonet · 2020-12-11T09:30:07Z

@aentinger: Thanks for the info! I did not know about this library. The Approx() macro definitely makes both the test code and the error message more readable. I just pushed a commit to use this macro.

Now, I have a question for you.

If the stream provides an integer larger than LONG_MAX but smaller than FLT_MAX, the variable value also overflows. Whereas this is not exactly the same issue as reading too many decimal digits after the radix point, it is strongly related, and its fix involves the same lines of code as the current issue. My question is: should this “parse large integer” problem be considered another aspect of this issue, or should we handle it as a separate issue?

In the first case, I would add the relevant test to this PR, and then submit a PR that handles both situations. In the second case, I would wait for this issue to be sorted out before moving to the “parse large integer” problem.

aentinger · 2020-12-11T09:54:42Z

Thank you for integrating Approx, this indeed makes the test code so much easier. I'd say the parseInt should be handled separately. Could you please also create an issue and the smallest failing test?

matthijskooijman · 2020-12-11T11:31:08Z

@matthijskooijman: The overflow affects a local variable, not exposed to the API.

Right, that would make it even easier to change from long to int32_t. IMHO that would be good to change, since it makes the code behave more consistently across platforms. And since it's already 32 bits, I don't think any of the "use int because that's usually fastest on the current platform" arguments apply here. But this is a little off-topic for this PR, maybe.

edgar-bonet · 2020-12-11T11:43:56Z

@aentinger wrote:

I'd say the parseInt should be handled separately

Sorry, I was not clear enough. It is not about parseInt. The issue is parseFloat overflowing a local variable when it reads too many digits. And it can be split into two sub-issues:

Too many digits after the decimal point → can be fixed by ignoring the extra digits.
Too many digits before the decimal point (e.g. a large integer) → the extra digits cannot simply be ignored.

My question is whether these two points should be considered parts of the same issue or separate issues.

@matthijskooijman: I do think it is off-topic.

aentinger · 2020-12-14T08:10:32Z

Good morning @edgar-bonet 👋 ☕
Thank you very much for clarifying this point 👍 Since both sub-issues concern parseFloat in combination with a large number of digits, I think it's okay to handle them within this PR. Could you please add additional failing tests for both too many digits after and before the decimal point?

edgar-bonet · 2020-12-14T09:10:35Z

Hi @aentinger, thanks for your answer.

I just added a failing test case with a number that has too many digits before the decimal point. This is in addition to the previous one that has too many digits after the decimal point.

Should I squash the three commits together and force-push?

…low when parsing float values. However, we still need to ensure against too large values contained in streams. This should be possible because the maximum length of a float value pre-comma is known to be 38 digits (FLT_MAX_10_EXP).

aentinger · 2020-12-14T09:15:24Z

Should I squash the three commits together and force-push?

Please no. I already added a first possible solution on how to address this issue. Please do a pull first before your next commit, or you'll have to disentangle it from the commit I pushed 😉

codecov-io · 2020-12-14T09:15:37Z

Codecov Report

Merging #133 (6197511) into master (78f3f41) will decrease coverage by 0.34%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #133      +/-   ##
==========================================
- Coverage   96.41%   96.06%   -0.35%     
==========================================
  Files          14       14              
  Lines         837      839       +2     
==========================================
- Hits          807      806       -1     
- Misses         30       33       +3

Impacted Files    Coverage Δ
api/Stream.cpp    91.09% <100.00%> (-0.07%) ⬇️
api/String.cpp    97.69% <0.00%> (-0.77%) ⬇️
api/String.h      90.90% <0.00%> (+2.02%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

edgar-bonet · 2020-12-14T09:20:30Z

api/Stream.cpp

  int c;
-  float fraction = 1.0;
+  unsigned int digits_post_comma = 0;


If value is a floating point number, there is no need for this extra variable. It could be folded into value.

That's a good question. Which is more expensive in computing time: pow, or repeatedly doing a double multiplication?

However, a further argument against pow is that there's a question of availability (and with what argument types) across all supported platforms.

edgar-bonet · 2020-12-14T09:21:10Z

api/Stream.cpp

-    return value * fraction;
-  else
-    return value;
+    value /= pow(10, digits_post_comma);


pow() involves a logarithm and an exponential, which is very expensive on FPU-less processors.
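The two options being weighed can be put side by side as follows. This is only a sketch with hypothetical function names; it makes no claim about the relative cost on any particular platform, which would have to be measured.

```cpp
#include <cmath>

// Variant 1 (sketch): count the fractional digits while parsing, then divide
// once by a power of ten computed with pow().
float scaleWithPow(float value, unsigned digitsPostComma) {
    return value / std::pow(10.0f, static_cast<float>(digitsPostComma));
}

// Variant 2 (sketch): accumulate the scale factor with one multiplication
// per fractional digit, as the original 'fraction' variable did.
float scaleWithMultiply(float value, unsigned digitsPostComma) {
    float fraction = 1.0f;
    for (unsigned i = 0; i < digitsPostComma; ++i) {
        fraction *= 0.1f;
    }
    return value * fraction;
}
```

Both variants give approximately the same result; the second one avoids the call to pow() at the price of one extra multiplication per digit.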

edgar-bonet · 2020-12-14T09:23:14Z

I had considered parsing straight into a float, but I then assumed the original code used an integer for performance reasons. My tests show parsing into a float takes about 50% more processing time on an AVR.

aentinger · 2020-12-14T09:28:58Z

My tests show parsing into a float takes about 50% more processing time on an AVR.

Right now ArduinoCore-API is not used on ArduinoCore-avr. There are plans for doing so, but the timeline is more than a bit fuzzy. Also, as far as I can see, all future Arduino platforms will be based on 32-bit ARM MCUs or computing-equivalent MCUs.

EDIT: ArduinoCore-API is used with ArduinoCore-megaavr, so there is an 8-bit AVR core which will be affected by this change.

edgar-bonet · 2020-12-14T09:32:10Z

api/Stream.cpp

@@ -182,7 +181,7 @@ float Stream::parseFloat(LookaheadMode lookahead, char ignore)
    else if(c >= '0' && c <= '9')  {      // is c a digit?
      value = value * 10 + c - '0';


value may overflow to INFINITY on AVR if there are more than 38 digits (maybe that's so many we do not care about that case?), even with a number smaller than FLT_MAX. I suggest instead:

if(isFraction) {
  fraction *= 0.1;
  value = value + fraction * (c - '0');
} else {
  value = value * 10 + c - '0';
}

Note that on something other than AVR one would need more than 308 digits to demonstrate the problem.

I like this form; however, in that case we have to commit to using a double for value. I personally would not mind, since we are talking about stream parsing after all, and the only Stream channels currently available are Serial and various networking streams, which are slow even compared to floating point multiplication.

I can further support my argument for a floating point implementation: if you want to run performance-critical/floating-point applications on an 8-bit architecture, you've selected the wrong MCU altogether.

@facchinm / @cmaglie what's your take on this?

Since I got no feedback over quite a long period of time (unfortunate but not unexpected given what we are currently swamped in) I'm moving forward with merging this PR.

Let's hold on for a second though. @edgar-bonet, can you integrate the change you suggested above yourself?

if(isFraction) {
  fraction *= 0.1;
  value = value + fraction * (c - '0');
} else {
  value = value * 10 + c - '0';
}

@aentinger: OK, I'll do it today.

Give a 311-digit number to Stream::parseFloat(). This makes the local variable 'value' overflow to infinity. With so many digits, the number cannot be parsed into an integer, not even into an integer stored as a 'double'. Note that 40 digits would be enough to unveil this issue on AVR.

If more than 309 digits are provided to Stream::parseFloat() (more than 39 on AVR), the internal variable 'value' would overflow to infinity. We avoid this by not storing the parsed number as an integer-in-a-float.

edgar-bonet · 2021-01-25T12:55:28Z

As requested, I just pushed my proposed change. Prior to this, I took the liberty to modify the test “A float is provided with too many digits after the decimal point” by adding more digits, in order to evidence the issue the last commit is fixing.

aentinger

LGTM 👍 Thank you @edgar-bonet 🚀

Test Stream::parseFloat() with many digits

129ae52

edgar-bonet mentioned this pull request Dec 10, 2020

Stream::parseFloat() fails if it reads too many digits #129

Closed

Use Approx() macro for fuzzy comparison

8eb4013

Test Stream::parseFloat() with a large number

5445db3

Replacing computational expensive pow call with result of accumulated…

1266b08

… multiplication.

edgar-bonet added 2 commits January 25, 2021 13:48

Avoid overflowing parseFloat()'s internal 'value'

6197511

aentinger approved these changes Jan 25, 2021

aentinger merged commit 2af4a9c into arduino:master Jan 25, 2021
