Skip to content

Test Stream::parseFloat() with many input digits #133

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jan 25, 2021
Merged

Test Stream::parseFloat() with many input digits #133

merged 7 commits into from
Jan 25, 2021

Conversation

edgar-bonet
Copy link
Contributor

As a followup to issue #129: “Stream::parseFloat() fails if it reads too many digits”, this pull request adds a failing test to test_parseFloat.cpp that reveals the problem. I ran into a couple of issues when writing the test:

  1. Ensuring correct rounding when parsing a decimal representation of a number is not trivial, and Stream::parseFloat() does not attempt to provide such guarantee. Testing for exact equality to the expected result is thus prone to spurious failures.

    The existing tests in test_parseFloat.cpp do test for exact equality, and it seems to work. However, the more digits are processed, the greater the chances that rounding errors pile up and the end result differs from the correctly rounded one. Since this test requires processing many decimal digits, I had to use fuzzy floating point comparison in order to avoid the test failing for the wrong reason.

  2. It would seem these tests are meant to be compiled and run on 64-bit Linux hosts, where a long is 64-bits wide. Most (all?) Arduino platforms, in contrast, have 32-bit longs. Since this test is meant to reveal the effect of a long variable overflowing, I had to add many more digits than would be needed to trigger the overflow on an Arduino. As shown in issue Stream::parseFloat() fails if it reads too many digits #129, 10 digits of π are enough to make Stream::parseFloat() fail badly on an actual Arduino.

@matthijskooijman
Copy link
Collaborator

  1. is probably yet another argument to switch official APIs to use e.g. int32_t instead of long, see also Documentation should use more specific types Arduino#4525 for related discussion.

@edgar-bonet
Copy link
Contributor Author

@matthijskooijman: The overflow affects a local variable, not exposed to the API.

@aentinger
Copy link
Contributor

aentinger commented Dec 11, 2020

  1. is probably yet another argument to switch official APIs to use e.g. int32_t instead of long, see also arduino/Arduino#4525 for related discussion.

While I technically agree with @matthijskooijman I've just yesterday participated in a discussion with @tigoe in which he expressed his dismay on some prototype code using uint8_t (instead of byte) and general usage of uintXX_t types (His opinion voiced in 2016 seems to be constant over time 😉 ). Let me also add that this my last word on this topic within this PR, if you feel it's something we should discuss in more depth please open a new issue within this repository.

@edgar-bonet Thank you very much for providing this first step on fixing your issue (A failing test is a great was to start!). Considering floating point comparison you can use the Catch2 provided Approx method, e.g.

REQUIRE(mock.parseFloat() == Approx(- 3.141592654f);

Here you can find more documentation how to perform such floating point comparisons.

@edgar-bonet
Copy link
Contributor Author

@aentinger: Thanks for the info! I did not know about this library. The Approx() macro definitely makes both the test code and the error message more readable. I just pushed a commit to use this macro.

Now, I have a question for you.

If the stream provides an integer larger than LONG_MAX but smaller than FLT_MAX, the variable value also overflows. Whereas this is not exactly the same issue as reading too many decimal digits after the radix point, it is strongly related, and its fix involves the same lines of code as the current issue. My question is: should this “parse large integer” problem be considered another aspect of this issue, or should we handle it as a separate issue?

In the first case, I would add the relevant test to this PR, and then submit a PR than handles both situations. In the second case, I would wait for this issue to be sorted out before moving to the “parse large integer” problem.

@aentinger
Copy link
Contributor

Thank you integrating Approx, this indeed makes the test code so much easier. I'd say the parseInt should be handled separately, could you please also create an issue and the smallest failing test?

@matthijskooijman
Copy link
Collaborator

@matthijskooijman: The overflow affects a local variable, not exposed to the API.

Right, that would make it even easier to change from long to int32_t. IMHO that would be good to change, since it makes the code behave more consistent across platforms. And since it's already 32-bits, I don't think any of the "use int because that's usually fastest on the current platform" arguments apply here. But this is a little off-topic for this PR, maybe.

@edgar-bonet
Copy link
Contributor Author

@aentinger wrote:

I'd say the parseInt should be handled separately

Sorry, I was not clear enough. It is not about parseInt. The issue is parseFloat overflowing a local variable when it reads too many digits. And it can be split into two sub-issues:

  1. Too many digits after the decimal point → can be fixed by ignoring the extra digits.

  2. Too many digits before the decimal point (e.g. a large integer) → the extra digits cannot simply be ignored.

My question is whether these two points should be considered parts of the same issue or separate issues.

@matthijskooijman: I do think it is off-topic.

@aentinger
Copy link
Contributor

Good morning @edgar-bonet 👋 ☕
Thank you very much for clarifying this point 👍 Since those issues also concern issues within parseFloat in combination with a large number of digits I think it's okay to handle them within this PR. Could you please add additional failing tests for both too many digits after and before the decimal point?

@edgar-bonet
Copy link
Contributor Author

Hi @aentinger, thanks for your answer.

I just added a failing test case with a number that has too many digits before the decimal point. This is in addition to the previous one that has too many digits after the decimal point.

Should I squash the three commits together and force-push?

…low when parsing float values.

However, we still need to ensure against too large values contained in streams. This should be possible because the maximum length of a float value pre-comma is known to be 38 digits (FLT_MAX_10_EXP).
@aentinger
Copy link
Contributor

aentinger commented Dec 14, 2020

Should I squash the three commits together and force-push?

Please no. I already added a first possible solution on how to address this issue. Please to a pull first before your next commit or you've got disentangle it with the commit I pushed 😉

@codecov-io
Copy link

codecov-io commented Dec 14, 2020

Codecov Report

Merging #133 (6197511) into master (78f3f41) will decrease coverage by 0.34%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #133      +/-   ##
==========================================
- Coverage   96.41%   96.06%   -0.35%     
==========================================
  Files          14       14              
  Lines         837      839       +2     
==========================================
- Hits          807      806       -1     
- Misses         30       33       +3     
Impacted Files Coverage Δ
api/Stream.cpp 91.09% <100.00%> (-0.07%) ⬇️
api/String.cpp 97.69% <0.00%> (-0.77%) ⬇️
api/String.h 90.90% <0.00%> (+2.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 78f3f41...6197511. Read the comment docs.

api/Stream.cpp Outdated
int c;
float fraction = 1.0;
unsigned int digits_post_comma = 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If value is a floating point number, there is no need for this extra variable. It could be folded into value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question here. What's more computing time expensive - pow or repeatedly doing a double multiplication?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, a further argument against pow is that there's a question of availability (and with what argument types) across all supported platforms.

api/Stream.cpp Outdated
return value * fraction;
else
return value;
value /= pow(10, digits_post_comma);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pow() involves a logarithm and an exponential, which is very expensive on MMU-less processors.

@edgar-bonet
Copy link
Contributor Author

I had considered parsing straight into a float, but I then assumed the original code used an integer for performance reasons. My tests show parsing into a float takes about 50% more processing time an AVR.

@aentinger
Copy link
Contributor

aentinger commented Dec 14, 2020

My tests show parsing into a float takes about 50% more processing time an AVR.

Right now ArduinoCore-API is not used on ArduinoCore-avr. There are plans for doing so but the timeline is more than a bit fuzzy. Also, as far as I can see all future Arduino platforms will be based on 32-Bit ARM MCUs or comuting-equivalent MCUs.

EDIT: ArduinoCore-API is used with ArduinoCore-megaavr so there is a 8-Bit AVR core which will be affected by this change.

api/Stream.cpp Outdated
@@ -182,7 +181,7 @@ float Stream::parseFloat(LookaheadMode lookahead, char ignore)
else if(c >= '0' && c <= '9') { // is c a digit?
value = value * 10 + c - '0';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value may overflow to INFINITY on AVR if there are more than 38 digits (maybe that's so many we do not care about that case?), event with a number smaller than FLT_MAX. I suggest instead:

  if(isFraction) {
    fraction *= 0.1;
    value = value + fraction * (c - '0');
  } else {
    value = value * 10 + c - '0';
  }

Note that on something other than AVR one would need more than 308 digits to demonstrate the problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this form, however in that case we've got to commit to using a double for value. I personally would not mind since we are talking about stream parsing after all and the only Stream channels currently available are Serial and various networking streams and which are slow even compared to floating point multiplication.

I can further support my argument pro floating point implementation that if you want to run performance-critical/floating-point applications on an 8-Bit architecture you've selected the wrong MCU alltogether.

@facchinm / @cmaglie what's your take on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I got no feedback over quite a long period of time (unfortunate but not unexpected given what we are currently swamped in) I'm moving forward with merging this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's hold on for a second though @edgar-bonet can you integrate your last change suggested above by yourself:

if(isFraction) {
    fraction *= 0.1;
    value = value + fraction * (c - '0');
  } else {
    value = value * 10 + c - '0';
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aentinger: OK, I'll do it today.

Give a 311-digit number to Stream::parseFloat(). This makes the local
variable `value' overflow to infinity. With so many digits, the number
cannot be parsed into an integer, not even into an integer stored as a
`double'.

Note that 40 digits would be enough to unveil this issue on AVR.
If more than 309 digits are provided to Stream::parseFloat() (more than
39 on AVR), the internal variable 'value' would overflow to infinity. We
avoid this by not storing the parsed number as an integer-in-a-float.
@edgar-bonet
Copy link
Contributor Author

As requested, I just pushed my proposed change. Prior to this, I took the liberty to modify the test “A float is provided with too many digits after the decimal point” by adding more digits, in order to evidence the issue the last commit is fixing.

Copy link
Contributor

@aentinger aentinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 Thank you @edgar-bonet 🚀

@aentinger aentinger merged commit 2af4a9c into arduino:master Jan 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants