Remove duplicated sha1 implementation (Fixes #6568) #6569


Merged: 3 commits into esp8266:master on Oct 1, 2019

Conversation

dirkmueller (Contributor)

The Hash library had its own copy of a loop-unrolled sha1 implementation
adding a large code footprint for no good reason, as there are several
sha1 implementations already in tree (one in NONOS-SDK as well as one
in bearssl). Switching to the bearssl one is straightforward and removes
about 3kb of code size overhead.

Also clean up some obvious inefficiencies (copy by value, string
summing, no reservation causing repeated reallocations) in the
implementation.
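The "string summing, no reservation" point refers to building the hex digest string one character at a time, which reallocates repeatedly as the string grows. A minimal sketch of the reserved-hexify pattern in standard C++ (std::string stands in for Arduino's String; the function name is illustrative, not the library's actual API):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Convert a digest to lowercase hex. Reserving the final size up front
// avoids the repeated reallocations that incremental += summing causes.
std::string hexify(const uint8_t* digest, size_t len) {
    static const char hexchars[] = "0123456789abcdef";
    std::string out;
    out.reserve(len * 2);  // one allocation instead of many
    for (size_t i = 0; i < len; i++) {
        out += hexchars[(digest[i] >> 4) & 0x0f];
        out += hexchars[digest[i] & 0x0f];
    }
    return out;
}
```

For a SHA1 digest, len is 20, so the output string is known to be exactly 40 characters before the loop starts.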

@earlephilhower (Collaborator) left a comment

Thanks for the quick and useful PR. Only some minor technical debt things noted below, please.

@earlephilhower earlephilhower added this to the 2.6.0 milestone Sep 30, 2019
The data always remains unmodified, so non-const overloads
are confusing and redundant. Also optimize the hexify variant
a bit more.
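The const-overload point can be illustrated in standard C++ (std::string stands in for Arduino's String; the signatures are illustrative, not the library's actual API): hashing never mutates its input, so a single const-reference overload covers all callers and avoids copying the buffer.

```cpp
#include <cstddef>
#include <string>

// Before: pass-by-value copies the whole string on every call, and a
// separate non-const overload duplicates the same code path.
//   void sha1_by_value(std::string payload);
//   void sha1_nonconst(std::string& payload);

// After: one const-reference overload serves const and non-const
// callers alike, with no copy of the input buffer. The stand-in body
// just returns the length; a real implementation would feed
// payload.data() and payload.size() into the hash context.
size_t sha1_input_length(const std::string& payload) {
    return payload.size();
}
```

Both a `const std::string` and a mutable one bind to the same overload, which is why the non-const variants were redundant.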
@earlephilhower (Collaborator)

Argh, looks like Arduino.cc updated their nightly builder and it's now SEGV'ing. DevOps and CI are great fun and a source of perpetual employment. :(

@earlephilhower (Collaborator) left a comment

Thx! We'll have to wait until we can figure out the arduino-builder crash before merge, but that's not your problem (unless you want to look at it!)

@devyte (Collaborator) left a comment

What are the runtimes for both code versions? Is the current one faster at all? Is the proposed one much slower? This should also be considered. Please provide some test numbers.

@dirkmueller (Contributor, Author)

dirkmueller commented Sep 30, 2019

@devyte the proposed change is definitely not making it faster, as it removes loop unrolling, which is known to improve performance. However, there are two considerations: a) any sketch using sha1 in 2019 for performance-critical logic is doing so at its own risk and really shouldn't anymore, given that sha1 is no longer considered secure; and b) this is not adding a new implementation, it just uses the one from bearssl. Any bugfix or performance improvement made to sha1 in bearssl (where it still matters, if anywhere at all) will automatically carry over here. The main purpose of this change is not to improve performance but to remove repetition.

That being said, you probably want numbers anyway, so I made some strawman benchmark:

using sha1(const char*, size, shasum) (80 MHz):

micros for abc: 632 (+56.1%)
micros for 2048 a's: 1096 (+32.5%)

using sha1(String) and 80 MHz:

micros for "abc": 604 (+50%)
micros for 2048 a's: 1623 (+32.1%)

using sha1(const char*, size) (80 MHz):

micros for abc: 619 (+49.8%)
micros for 2048 a's: 1626 (+42.5%)

using sha1(const char*, size) (160 MHz):

micros for abc: 344 (+42.7%)
micros for 2048 a's: 835 (+39.8%)
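A strawman benchmark of this shape can be sketched in standard C++; std::chrono stands in for Arduino's micros(), and the workload here is a placeholder rather than either sha1 implementation:

```cpp
#include <chrono>
#include <cstdint>
#include <string>

// Time one invocation of a callable in microseconds: the same
// measure-before/measure-after pattern as micros() on the ESP8266.
template <typename F>
long long time_us(F&& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Placeholder workload: 2048 a's, as in the numbers above. A real run
// would call sha1() on this buffer instead of the stand-in loop.
volatile uint32_t sink = 0;
void workload() {
    std::string input(2048, 'a');
    uint32_t acc = 0;
    for (unsigned char c : input) acc = acc * 31 + c;  // stand-in for hashing
    sink = acc;  // volatile sink keeps the loop from being optimized away
}
```

On the device the equivalent reads `t0 = micros(); sha1(buf, len, out); dt = micros() - t0;`, averaged over enough iterations to smooth out interrupt jitter.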

conclusions:

  • This is entirely CPU bound; increasing the frequency to 160 MHz translates almost proportionally into a speedup.
  • The optimisations in the hexified-string-creation part and the pass-by-reference sha1() function variants offset the cost of the new function somewhat, reducing the slowdown by roughly 10%.
  • The actual SHA1 calculation slowdown is between 56.1% (for single-block/very short input) and 32.5% (for long input). I don't know the typical input data length, so I can't say what the average expected slowdown is, but it stays within the same order of magnitude of runtime.

The numbers are surprising, because it is not clear why single-block (short) input should be 56% slower while longer input (where loop unrolling would be most helpful) is only 32.5% slower. It is entirely possible that there is some performance tuning to be done in bearssl's implementation for very short input; in any case this change makes that tuning effort worthwhile, because we no longer maintain a duplicate of the implementation.

On the other hand, we're saving a few kilobytes (!) of flash, which for all projects I was involved in was worth a lot more than speed.

@earlephilhower (Collaborator)

I would say that large-block runs suffer from cache issues with the unrolled original code, if not directly in the sha1 implementation then in other routines running in parallel. So IPC will be lower than with a smaller routine.

The much smaller block puts less pressure on the cache, so while it may take X% more instructions executed, the average IPC goes up due to fewer cache misses.

In any case, neither runtime seems to be that significant given the code savings.

I should probably pull the Updater SHA256 into this class at some point, from BearSSLHelpers. I honestly didn't even know this "library" (of only a single hash routine?!) existed.

@devyte (Collaborator) left a comment

The main purpose of this change is not to improve performance but to remove repetition.

Like you just found out, the removal of that repetition causes a significant slowdown of up to 1.5X in the hash function, and that could impact users no matter what they're using it for. Therefore, the slowdown needed to be investigated and documented here. Now, if anyone is in fact impacted, they'll (eventually) find this PR, and read that we're making this conscious decision to take this performance hit in favor of bin size reduction and also to reduce functional duplication.

Approved.

@earlephilhower earlephilhower merged commit 1d26b28 into esp8266:master Oct 1, 2019
3 participants