-
Notifications
You must be signed in to change notification settings - Fork 274
[TG-9830] [UFC] Exclude symbols with empty module from the symbol module map #5157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TG-9830] [UFC] Exclude symbols with empty module from the symbol module map #5157
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fine; are we 100% convinced they don't use any empty-module symbols?
Also please document that symbols in the empty module are not recorded in the map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
src/util/symbol_table.cpp
Outdated
symbol_base_mapt::const_iterator base_it_end = | ||
symbol_base_map.upper_bound(symbol.base_name); | ||
auto base_it = symbol_base_map.lower_bound(symbol.base_name); | ||
auto base_it_end = symbol_base_map.upper_bound(symbol.base_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 base_it_end
isn't mutated therefore it could be declared const auto
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: 5843a44).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/130996175
I have benchmarked this using TG. Using our benchmarking script on the default subset of Tika, the methods that succeeded on both runs (before and after the fix from this PR) were overall 216 seconds faster after the fix (before: 1251, after: 1035). There are 231 methods in the subset so this is a significant improvement. Another comparison I ran was a trivial method ( |
5843a44
to
e65526a
Compare
For languages such as Java, where the `module` field of `symbolt` is kept empty, maintaining the `internal_symbol_module_map` of the symbol table can cause a performance bottleneck. This is because all symbols end up mapped to the same key in this map. So when `symbol_tablet::erase` attempts to remove a symbol from this map the lookup results in a range containing the entire collection of symbols on which it then performs a linear search to find the exact symbol to remove. Languages which don't set the `module` field never need to do any lookups in the module map. The downstream EBMC repository always uses a non-empty module to do its lookups. Therefore we can leave symbols with an empty module out of the module map. This results in a performance improvement for Java, whilst retaining EBMC compatibilty. The performance improvement was confirmed using a subset of 231 methods from Tika, which succeeded both with and without the fix and were overall 216 seconds faster with the fix. With rt.jar on the classpath (but no class from rt.jar being referenced anywhere), the time for a trivial method went from around 40 seconds to 6 seconds.
No functional change
This makes the code more readable, and the type is still clear as we are just calling lower_bound and upper_bound on a multimap.
For each WHEN, the outer GIVEN is going to be re-run, re-initializing the symbol table, so there is no need to reset the symbols.
e65526a
to
4718929
Compare
Dev has benchmarked her fix against the statusquo on develop with proven gains in performance. I have attached a copy of the benchmark results to the Jira ticket. QA approved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: 4718929).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/131182785
This improves performance whenever we don't use the symbol module map (which, as far as I know, is currently only used in EBMC). See first commit message for further details.
Benchmarking results will be added here when I have them.