Skip to content

Use sharing_mapt as container in value_sett #4463

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

tautschnig
Copy link
Collaborator

goto-symex copies a state, which contains a value set, at each branching
point. The paths originating at that branch point typically share most
of the points-to information. Thus the copy is unnecessarily expensive,
and so is the merge at the next join point, which will also result in
one of the value sets to be destroyed. With sharing_mapt, the copy has
constant cost, merging can make use of the delta view, and the cost of
destruction only depends on the size of the delta.

Includes the cleanup-commit from #4462.

Performance evaluation to be done.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig added Needs data This PR claims improvements that require further data to substantiate the claims. dependent - do not merge labels Mar 30, 2019
@tautschnig tautschnig self-assigned this Mar 30, 2019
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 442b500).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/106434546

@tautschnig
Copy link
Collaborator Author

Some stats for SV-COMP's ReachSafety-ECA (which barely uses pointers, i.e., the value set would mostly be near-empty):

  • goto_statet::goto_statet(goto_symex_statet const&): develop: 1423.62 seconds; this branch: 459.92 seconds
  • Removing entries from the goto_state list: develop: 587.84 seconds; this branch: 246.81 seconds
  • value_sett::make_union: develop: 645.59 seconds; this branch: 223.34 seconds
  • value_sett::assign: develop: 306.79 seconds; this branch: 415.09 seconds

So adding new values to the value set has become more expensive (which is not unexpected), but merging and cloning is a lot cheaper (as intended). @romainbrenguier @smowton would you be able to contribute some benchmarking on Java code, which makes a lot more use of value sets?

@@ -297,11 +295,7 @@ class value_sett
///
/// The components of the ID are thus duplicated in the `valuest` key and in
/// `entryt` fields.
#ifdef USE_DSTRING
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we reaching the point of saying that USE_DSTRING should always be on?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we do still build with USE_DSTRING disabled, but I don't think the regression tests even pass. Though this change might be a step towards making that more likely, as it makes the code more uniform.

tautschnig added a commit that referenced this pull request Apr 1, 2019
@tautschnig tautschnig force-pushed the use-sharing-map-for-value-set branch from 442b500 to 441eb6c Compare April 1, 2019 08:08
@tautschnig tautschnig removed their assignment Apr 1, 2019
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 441eb6c).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/106528011

Copy link
Contributor

@danpoe danpoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

{
entryt replacement = *existing_entry;
if(add_to_sets)
make_union(replacement.object_map, new_values);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could check the return value of make_union() and only do the replacement when the object map has changed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if(add_to_sets)
make_union(replacement.object_map, new_values);
else
replacement.object_map = new_values;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could check whether new_values.read() != existing_entry->object_map.read() before doing the replacement. Not sure about the impact on performance though. We might avoid breaking sharing in some cases but there'd be additional cost for the comparisons.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a risk this is way too expensive and thus I'd rather not add this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's ok, we'd probably have to specifically benchmark this. My hunch is that the map comparisons could be cheap on average if it's likely that they differ in size or that we find a mismatch quickly.

@romainbrenguier
Copy link
Contributor

Here are the results from my benchmarks, globaly using sharing_mapt was 13% faster and seems to scale better:
chart

@@ -676,14 +676,14 @@ void goto_symext::try_filter_value_sets(
}
if(jump_taken_value_set && !erase_from_jump_taken_value_set.empty())
{
value_sett::entryt *entry = jump_taken_value_set->get_entry_for_symbol(
auto entry = jump_taken_value_set->get_index_of_symbol(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename entry to entry_name or entry_index 🍌

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

symbol_expr->get_identifier(), symbol_type, "", ns);
jump_taken_value_set->erase_values_from_entry(
*entry, erase_from_jump_taken_value_set);
}
if(jump_not_taken_value_set && !erase_from_jump_not_taken_value_set.empty())
{
value_sett::entryt *entry = jump_not_taken_value_set->get_entry_for_symbol(
auto entry = jump_not_taken_value_set->get_index_of_symbol(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍌

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -491,11 +472,11 @@ void value_sett::get_value_set_rec(
}
else if(expr.id()==ID_symbol)
{
const entryt *entry = get_entry_for_symbol(
auto entry = get_index_of_symbol(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍌

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@smowton
Copy link
Contributor

smowton commented Apr 2, 2019

Evaluated this on one of our medium-sized Java applications (a particular WebGoat lesson) -- JBMC execution time down from ~30s to ~15s.

@danpoe
Copy link
Contributor

danpoe commented Apr 2, 2019

Here are some results for the runtime of symex only, with --unwind 1, on several methods from Apache Tika on which symex takes between 5 and 15 sec:

Rplot

Overall with sharing it was 7% faster, including the changes to update_entry() it was 9% faster.

goto-symex copies a state, which contains a value set, at each branching
point. The paths originating at that branch point typically share most
of the points-to information. Thus the copy is unnecessarily expensive,
and so is the merge at the next join point, which will also result in
one of the value sets to be destroyed. With sharing_mapt, the copy has
constant cost, merging can make use of the delta view, and the cost of
destruction only depends on the size of the delta.
@tautschnig tautschnig force-pushed the use-sharing-map-for-value-set branch from 441eb6c to d4b811b Compare April 3, 2019 07:54
@tautschnig tautschnig removed the Needs data This PR claims improvements that require further data to substantiate the claims. label Apr 3, 2019
@tautschnig tautschnig merged commit d583164 into diffblue:develop Apr 3, 2019
@tautschnig tautschnig deleted the use-sharing-map-for-value-set branch April 3, 2019 08:46
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: d4b811b).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/106864446

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants