Skip to content

[TG-9830] [UFC] Exclude symbols with empty module from the symbol module map #5157

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

antlechner
Copy link
Contributor

This improves performance whenever we don't use the symbol module map (which, as far as I know, is currently only used in EBMC). See first commit message for further details.
Benchmarking results will be added here when I have them.

  • Each commit message has a non-empty body, explaining why the change was made. (except for documentation and unit test)
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@antlechner antlechner changed the title Exclude symbols with empty module from the symbol module map [TG-9830] [UFC] Exclude symbols with empty module from the symbol module map Oct 8, 2019
Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fine; are we 100% convinced they don't use any empty-module symbols?

Also please document that symbols in the empty module are not recorded in the map.

Copy link
Contributor

@xbauch xbauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

symbol_base_mapt::const_iterator base_it_end =
symbol_base_map.upper_bound(symbol.base_name);
auto base_it = symbol_base_map.lower_bound(symbol.base_name);
auto base_it_end = symbol_base_map.upper_bound(symbol.base_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 base_it_end isn't mutated therefore it could be declared const auto.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 5843a44).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/130996175

@antlechner
Copy link
Contributor Author

antlechner commented Oct 9, 2019

I have benchmarked this using TG. Using our benchmarking script on the default subset of Tika, the methods that succeeded on both runs (before and after the fix from this PR) were overall 216 seconds faster after the fix (before: 1251, after: 1035). There are 231 methods in the subset so this is a significant improvement.

Another comparison I ran was a trivial method (return true, no args), with the TG version of the models (which are compatible with rt.jar) and rt.jar on the classpath. Before the fix, it took about 40 seconds, after the fix it takes 6 seconds.

@antlechner antlechner force-pushed the antonia/no_empty_module_symbols_in_module_map branch from 5843a44 to e65526a Compare October 9, 2019 13:45
@antlechner
Copy link
Contributor Author

@smowton

are we 100% convinced they don't use any empty-module symbols?

We asked @xbauch and he confirmed that this change is compatible with EBMC.

Also please document that symbols in the empty module are not recorded in the map.

Done in the same commit as the other documentation.

thomasspriggs and others added 6 commits October 9, 2019 14:49
For languages such as Java, where the `module` field of `symbolt` is
kept empty, maintaining the `internal_symbol_module_map` of the symbol
table can cause a performance bottleneck. This is because all symbols
end up mapped to the same key in this map. So when
`symbol_tablet::erase` attempts to remove a symbol from this map the
lookup results in a range containing the entire collection of symbols on
which it then performs a linear search to find the exact symbol to
remove.

Languages which don't set the `module` field never need to do any
lookups in the module map. The downstream EBMC repository always uses a
non-empty module to do its lookups. Therefore we can leave symbols with
an empty module out of the module map. This results in a performance
improvement for Java, whilst retaining EBMC compatibilty.

The performance improvement was confirmed using a subset of 231 methods
from Tika, which succeeded both with and without the fix and were
overall 216 seconds faster with the fix. With rt.jar on the classpath
(but no class from rt.jar being referenced anywhere), the time for a
trivial method went from around 40 seconds to 6 seconds.
This makes the code more readable, and the type is still clear as we are
just calling lower_bound and upper_bound on a multimap.
For each WHEN, the outer GIVEN is going to be re-run, re-initializing
the symbol table, so there is no need to reset the symbols.
@antlechner antlechner force-pushed the antonia/no_empty_module_symbols_in_module_map branch from e65526a to 4718929 Compare October 9, 2019 13:49
@segun3d
Copy link

segun3d commented Oct 9, 2019

Dev has benchmarked her fix against the statusquo on develop with proven gains in performance. I have attached a copy of the benchmark results to the Jira ticket.

QA approved.

@antlechner antlechner merged commit 093e97c into diffblue:develop Oct 9, 2019
@antlechner antlechner deleted the antonia/no_empty_module_symbols_in_module_map branch October 9, 2019 17:39
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 4718929).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/131182785

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants