-
Notifications
You must be signed in to change notification settings - Fork 273
Performance optimisation - Use unordered_map
in numbering
instead of map
#5539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance optimisation - Use unordered_map
in numbering
instead of map
#5539
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #5539 +/- ##
========================================
Coverage 68.53% 68.53%
========================================
Files 1187 1187
Lines 98259 98259
========================================
Hits 67345 67345
Misses 30914 30914
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. One thing to have a look at is the status of caching of the hashes in dt
. IIRC @tautschnig and @peterschrammel worked on this some time ago and I'm not sure if all of the options were enabled.
de45b2d
to
f91394b
Compare
It's odd that irep.h is a dependency of numbering.h. I'd prefer an 'unordered_numbering' and an 'ordered_numbering', following the naming style of the STL. There may not be a use-case for the ordered numbering. |
I agree with @kroening here, this can probably be done without special casing ireps. I believe @thomasspriggs is investigating if we can basically just use |
f91394b
to
c425d29
Compare
I have now refactored this PR to avoid either |
c425d29
to
7d75607
Compare
@thomasspriggs What is the source of those |
@tautschnig For the particular example I have been examining, much of the overall runtime is spent in From looking at the code I would say that the improvement in this PR looks unrelated to the improvement in #1997. As far as I can see, |
@thomasspriggs Thank you for clarifying! |
Removing the duplicated types will enable changing the type of numbering which backs union find, with fewer lines of code changed.
Because this matches the current coding standard and because it improves readability.
This commit defines a new `numberingt`, which is equivalent to the existing `hash_numbering` type. This new name and its template parameters match the cbmc coding standards by ending the type names with a letter 't'. It also supports using `std::hash` by default, which will saves passing a hash function in the case where there is a standard hash function defined for a given type of `keyt`. The intention is to add this new type in this commit, swap over the existing usages of numbering in the subsequent commits and then finally remove the old variations of numbering. This will improve performance in one instance and will improve maintainability in others. Maintainability will be improved with a single version, because there won't be multiple versions to maintain and it will take away the possibility of accidentally choosing a variation which uses `irept` as the key in the backing `std::map`.
This avoids slow lookups into `std::map` caused by long running `irept::operator<` calls. The type of hash to use is included as a template parameter in order to make the data structure used to implement `union_find` more explicit. Run time on example exercising arrays code before this commit - real 2m39.018s user 2m37.394s sys 0m1.608s Run time on example exercising arrays code after this commit - real 0m30.897s user 0m29.476s sys 0m1.404s
Because the naming follows our coding standards and it reduces the number of numbering variations to maintain.
Because the naming follows our coding standards and it reduces the number of numbering variations to maintain. This does change the type of backing container used in these places, however the performance should be at least equivalent.
There is no need for separate a separe type alias and implementaion if numbering is always done using the same kind of map. Therefore we can use a single name instead of two, which means less to maintain.
7d75607
to
2bf69bf
Compare
numbering
use unordered_map
if numbering irep
sunordered_map
in numbering
instead of map
The commit which specifically gives the performance improvement is titled "Make
union_find
use hashes for numbering". The other commits are related refactoring.Using
unordered_map
innumbering
instead ofmap
, changesunion_find
, which changesarrays
, where the performance issue was seen. This change avoids slow lookups intostd::map
caused by long runningirept::operator<
calls. I have an example where cbmc spent approximately 70% of runtime inirept::operator<
before this change and approximately 1% of runtime in hashing/ordering after this change.Run time on example exercising arrays code before this PR -
real 2m39.018s
user 2m37.394s
sys 0m1.608s
Run time on example exercising arrays code after this PR -
real 0m30.897s
user 0m29.476s
sys 0m1.404s
Gprof before (top results up to
has_byte_operator
+irept::hash()
) -Gprof after (top results up to
has_byte_operator
+irept::hash()
) -