Remove duplicated sha1 implementation (Fixes #6568) #6569


Merged: 3 commits into esp8266:master on Oct 1, 2019

Conversation

dirkmueller (Contributor)

The Hash library had its own copy of a loop-unrolled sha1 implementation
adding a large code footprint for no good reason, as there are several
sha1 implementations already in tree (one in NONOS-SDK as well as one
in bearssl). Switching to the bearssl one is straightforward and removes
about 3kb of code size overhead.

Also clean up some obvious inefficiencies (copy by value, string
summing, no reservation causing repeated reallocations) in the
implementation.
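The "string summing, no reservation" point refers to building the hex digest string one character at a time, which reallocates repeatedly as the string grows. A minimal sketch of the reserved-hexify pattern in standard C++ (std::string stands in for Arduino's String; the function name is illustrative, not the library's actual API):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Convert a digest to lowercase hex. Reserving the final size up front
// avoids the repeated reallocations that incremental += summing causes.
std::string hexify(const uint8_t* digest, size_t len) {
    static const char hexchars[] = "0123456789abcdef";
    std::string out;
    out.reserve(len * 2);  // one allocation instead of many
    for (size_t i = 0; i < len; i++) {
        out += hexchars[(digest[i] >> 4) & 0x0f];
        out += hexchars[digest[i] & 0x0f];
    }
    return out;
}
```

For a SHA1 digest, len is 20, so the output string is known to be exactly 40 characters before the loop starts.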

@earlephilhower (Collaborator) left a comment

Thanks for the quick and useful PR. Only some minor technical debt things noted below, please.

@earlephilhower earlephilhower added this to the 2.6.0 milestone Sep 30, 2019
The data always remains unmodified, so non-const overloads
are confusing and redundant. Also optimize the hexify variant
a bit more.
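The const-overload point can be illustrated in standard C++ (std::string stands in for Arduino's String; the signatures are illustrative, not the library's actual API): hashing never mutates its input, so a single const-reference overload covers all callers and avoids copying the buffer.

```cpp
#include <cstddef>
#include <string>

// Before: pass-by-value copies the whole string on every call, and a
// separate non-const overload duplicates the same code path.
//   void sha1_by_value(std::string payload);
//   void sha1_nonconst(std::string& payload);

// After: one const-reference overload serves const and non-const
// callers alike, with no copy of the input buffer. The stand-in body
// just returns the length; a real implementation would feed
// payload.data() and payload.size() into the hash context.
size_t sha1_input_length(const std::string& payload) {
    return payload.size();
}
```

Both a `const std::string` and a mutable one bind to the same overload, which is why the non-const variants were redundant.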
@earlephilhower (Collaborator)

Argh, looks like Arduino.cc updated their nightly builder and it's now SEGV'ing. DevOps and CI are great fun and a source of perpetual employment. :(

@earlephilhower (Collaborator) left a comment

Thx! We'll have to wait until we can figure out the arduino-builder crash before merge, but that's not your problem (unless you want to look at it!)

@devyte (Collaborator) left a comment

What are the runtimes for both code versions? Is the current one faster at all? Is the proposed one much slower? This should also be considered. Please provide some test numbers.

@dirkmueller (Contributor, Author)

dirkmueller commented Sep 30, 2019

@devyte the proposed change is definitely not making it faster, as it removes loop unrolling, which is known to improve performance. However, there are two considerations: a) any sketch using sha1 in 2019 for performance-critical logic is doing so at its own risk and really shouldn't anymore, given that sha1 is no longer considered secure; and b) this is not adding a new implementation, it just uses the one from bearssl. Any bugfix or performance improvement made to sha1 in bearssl (where it still matters, if anywhere at all) will automatically carry over here. The main purpose of this change is not to improve performance but to remove repetition.

That being said, you probably want numbers anyway, so I made some strawman benchmark:

using sha1(const char*, size, shasum) (80 MHz):

micros for abc: 632 (+56.1%)
micros for 2048 a's: 1096 (+32.5%)

using sha1(String) and 80 MHz:

micros for "abc": 604 (+50%)
micros for 2048 a's: 1623 (+32.1%)

using sha1(const char*, size) (80 MHz):

micros for abc: 619 (+49.8%)
micros for 2048 a's: 1626 (+42.5%)

using sha1(const char*, size) (160 MHz):

micros for abc: 344 (+42.7%)
micros for 2048 a's: 835 (+39.8%)
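A strawman benchmark of this shape can be sketched in standard C++; std::chrono stands in for Arduino's micros(), and the workload here is a placeholder rather than either sha1 implementation:

```cpp
#include <chrono>
#include <cstdint>
#include <string>

// Time one invocation of a callable in microseconds: the same
// measure-before/measure-after pattern as micros() on the ESP8266.
template <typename F>
long long time_us(F&& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Placeholder workload: 2048 a's, as in the numbers above. A real run
// would call sha1() on this buffer instead of the stand-in loop.
volatile uint32_t sink = 0;
void workload() {
    std::string input(2048, 'a');
    uint32_t acc = 0;
    for (unsigned char c : input) acc = acc * 31 + c;  // stand-in for hashing
    sink = acc;  // volatile sink keeps the loop from being optimized away
}
```

On the device the equivalent reads `t0 = micros(); sha1(buf, len, out); dt = micros() - t0;`, averaged over enough iterations to smooth out interrupt jitter.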

conclusions:

  • This is entirely CPU bound; increasing the frequency to 160 MHz translates almost proportionally into a speedup.
  • The optimisations in the hexified-string-creation part and the pass-by-reference sha1() function variants offset the cost of the new function somewhat, reducing the slowdown by roughly 10%.
  • The actual SHA1 calculation slowdown is between 56.1% (for single-block/very short input) and 32.5% (for long input). I don't know the typical input data length, so I can't say what the average expected slowdown is, but it stays within the same order of magnitude of runtime.

The numbers are surprising, because it is not clear why single-block (short) input should be 56% slower while longer input (where loop unrolling would be most helpful) is only 32.5% slower. It is entirely possible that there is some performance tuning to be done in bearssl's implementation for very short input; in any case this change makes that tuning effort worthwhile, because we no longer maintain a duplicate of the implementation.

On the other hand, we're saving a few kilobytes (!) of flash, which for all projects I was involved in was worth a lot more than speed.

@earlephilhower (Collaborator)

I would say that large-block runs suffer from cache issues with the unrolled original code, if not directly in the sha1 implementation then in other routines running in parallel. So IPC will be lower than with a smaller routine.

The much smaller block puts less pressure on the cache, so while it may take X% more instructions executed, the average IPC goes up due to fewer cache misses.

In any case, neither runtime seems to be that significant given the code savings.

I should probably pull the Updater SHA256 into this class at some point, from BearSSLHelpers. I honestly didn't even know this "library" (of only a single hash routine?!) existed.

@devyte (Collaborator) left a comment

The main purpose of this change is not to improve performance but to remove repetition.

Like you just found out, the removal of that repetition causes a significant slowdown of up to 1.5X in the hash function, and that could impact users no matter what they're using it for. Therefore, the slowdown needed to be investigated and documented here. Now, if anyone is in fact impacted, they'll (eventually) find this PR, and read that we're making this conscious decision to take this performance hit in favor of bin size reduction and also to reduce functional duplication.

Approved.

@earlephilhower earlephilhower merged commit 1d26b28 into esp8266:master Oct 1, 2019
3 participants