Skip to content

Use sharing_mapt to store propagation map #4464

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

tautschnig
Copy link
Collaborator

goto-symex copies a state, which contains a propagation map, at each
branching point. The paths originating at that branch point typically
share most of the constants being propagated. Thus the copy is
unnecessarily expensive, and so is the merge at the next join point,
which will also result in one of the propagation maps to be destroyed.
With sharing_mapt, the copy has constant cost, and the cost of
destruction only depends on the size of the delta.

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig added Symbolic Execution Performance Optimisations Needs data This PR claims improvements that require further data to substantiate the claims. labels Mar 30, 2019
@tautschnig tautschnig self-assigned this Mar 30, 2019
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 496eb15).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/106435134

@tautschnig
Copy link
Collaborator Author

Some stats for SV-COMP's ReachSafety-ECA:

  • goto_statet::goto_statet(goto_symex_statet const&): develop: 1423.62 seconds; this branch: 827.72 seconds (cf. Use sharing_mapt as container in value_sett #4463, which reduces the cost in a different way, and their savings would actually add up)
  • Removing entries from the goto_state list: develop: 587.84 seconds; this branch: 372.36 seconds (again, the savings of Use sharing_mapt as container in value_sett #4463 and this one would actually add up)
  • L2-renaming: develop: 457.98 seconds; this branch: 494.23 seconds

Lookups in the propagation map are more expensive (as expected), but merging and cloning is a lot cheaper (as intended).

@tautschnig tautschnig removed the Needs data This PR claims improvements that require further data to substantiate the claims. label Mar 30, 2019
@@ -86,7 +89,10 @@ void goto_statet::apply_condition(
increase_generation(
l1_identifier, l1_lhs, previous_state.get_l2_name_provider());

propagation[l1_identifier] = rhs;
if(propagation.has_key(l1_identifier))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to just give sharing_mapt a primitive set operation to save the double lookup

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danpoe maybe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend to use find(l1_identifier) here and then check that rhs is different from the current value in the map, and only then do the replacement. When using this idiom a separate set() method should be unnecessary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@tautschnig tautschnig added the Needs data This PR claims improvements that require further data to substantiate the claims. label Apr 1, 2019
@@ -86,7 +89,10 @@ void goto_statet::apply_condition(
increase_generation(
l1_identifier, l1_lhs, previous_state.get_l2_name_provider());

propagation[l1_identifier] = rhs;
if(propagation.has_key(l1_identifier))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend to use find(l1_identifier) here and then check that rhs is different from the current value in the map, and only then do the replacement. When using this idiom a separate set() method should be unnecessary.

propagation[l1_identifier] = rhs;
{
if(propagation.has_key(l1_identifier))
propagation.replace(l1_identifier, rhs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if rhs is different from the value already in the map (see above).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@smowton
Copy link
Contributor

smowton commented Apr 2, 2019

This one made almost no difference to our Webgoat lesson, I expect because generally the value-set is much bigger, but if we merge #3619 then that will no longer hold and this will be more important.

@danpoe
Copy link
Contributor

danpoe commented Apr 2, 2019

Here are some results for the runtime of symex only, with --unwind 1, on several methods from Apache Tika on which symex takes between 5 and 15 sec:
Rplot

Overall with sharing it was 9% slower, including the changes to apply_condition() and assignment() it was 7% slower. Somewhat unexpectedly, when using sharing in both the value set and the propagation map (i.e., this PR and PR #4463, both including the fixes) overall symex was 8% faster than develop. I would have expected the effects of both to cancel each other out. Does anyone have an idea of why it could behave like this?

@tautschnig
Copy link
Collaborator Author

I'll try to implement the suggestions I have received across the two PRs, and would then tend to merge #4463, which seems to be uniformly better. We can then re-evaluate this one to see whether the somewhat surprising observation made by @danpoe carries on.

@tautschnig tautschnig assigned tautschnig and unassigned smowton Apr 3, 2019
@tautschnig tautschnig force-pushed the use-sharing-map-for-propagation branch from 496eb15 to be7d917 Compare April 3, 2019 08:46
@tautschnig
Copy link
Collaborator Author

@danpoe As #4463 is merged I have rebased this one. Would you (and possibly also @romainbrenguier and @smowton) mind doing another benchmarking run to see where we are?

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: be7d917).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/106873800

@danpoe
Copy link
Contributor

danpoe commented Apr 3, 2019

Here are some results for the runtime of symex only, with --unwind 1, on several methods from Apache Tika on which symex takes between 5 and 15 sec:

Rplot

The runtime is essentially the same as on the latest develop (overall with sharing it's 0.3% faster).

@tautschnig
Copy link
Collaborator Author

It's probably best to put this on hold and look at other performance improvements first. It may also become a lot more relevant once we have field sensitivity in place as propagation maps will then likely be larger.

@allredj
Copy link
Contributor

allredj commented Apr 4, 2019

Yep, that's also the conclusion we reached with @danpoe.

@tautschnig tautschnig force-pushed the use-sharing-map-for-propagation branch 2 times, most recently from 72870e4 to 3bb8488 Compare April 4, 2019 23:59
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: 3bb8488).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/107164126
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

@tautschnig tautschnig force-pushed the use-sharing-map-for-propagation branch from 3bb8488 to f59007d Compare May 3, 2019 09:48
@smowton
Copy link
Contributor

smowton commented May 3, 2019

Needs propagation.erase -> propagation.erase_if_exists in symex_assign.cpp

@smowton
Copy link
Contributor

smowton commented May 3, 2019

Benchmark results for a long-running Webgoat analysis:

Develop: 77s
#4464: 57s
#4464 + #4505: 26s

goto-symex copies a state, which contains a propagation map, at each
branching point. The paths originating at that branch point typically
share most of the constants being propagated. Thus the copy is
unnecessarily expensive, and so is the merge at the next join point,
which will also result in one of the propagation maps to be destroyed.
With sharing_mapt, the copy has constant cost, and the cost of
destruction only depends on the size of the delta.
@tautschnig tautschnig force-pushed the use-sharing-map-for-propagation branch from f59007d to 32dc6d6 Compare May 5, 2019 09:18
@tautschnig
Copy link
Collaborator Author

Needs propagation.erase -> propagation.erase_if_exists in symex_assign.cpp

Thank you!! Done. Based on @smowton's data I'm inclined to merge this once CI passes.

@tautschnig tautschnig merged commit af9814d into diffblue:develop May 5, 2019
@tautschnig tautschnig deleted the use-sharing-map-for-propagation branch May 5, 2019 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs data This PR claims improvements that require further data to substantiate the claims. Performance Optimisations Symbolic Execution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants