
Refactor and optimise size to have approximate bound #5914


Merged
merged 3 commits into from
Mar 25, 2021

Conversation

TGWDB
Contributor

@TGWDB TGWDB commented Mar 11, 2021

This optimises the size function, which was only used when
checking against a bound. The new bounded_size takes an
explicit bound argument and cuts the search short when
this bound is (approximately) reached.

This also fixes a bug in the complexity computation: the
complexity (size) of an exprt can never be zero, so the
check is now against one instead.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@codecov

codecov bot commented Mar 11, 2021

Codecov Report

Merging #5914 (ef2f3e1) into develop (1b51a4d) will increase coverage by 1.46%.
The diff coverage is 73.76%.


@@             Coverage Diff             @@
##           develop    #5914      +/-   ##
===========================================
+ Coverage    73.55%   75.02%   +1.46%     
===========================================
  Files         1431     1431              
  Lines       155248   155633     +385     
===========================================
+ Hits        114189   116756    +2567     
+ Misses       41059    38877    -2182     
Impacted Files Coverage Δ
src/analyses/cfg_dominators.h 100.00% <ø> (ø)
src/analyses/global_may_alias.cpp 0.00% <0.00%> (ø)
src/goto-cc/linker_script_merge.cpp 0.00% <0.00%> (ø)
src/goto-instrument/accelerate/scratch_program.cpp 65.95% <0.00%> (+9.77%) ⬆️
src/goto-instrument/concurrency.cpp 0.00% <0.00%> (ø)
src/goto-instrument/dot.cpp 0.00% <0.00%> (ø)
src/goto-instrument/function_modifies.cpp 0.00% <0.00%> (ø)
src/goto-instrument/rw_set.cpp 41.93% <0.00%> (+41.93%) ⬆️
src/goto-instrument/thread_instrumentation.cpp 0.00% <0.00%> (ø)
src/goto-instrument/undefined_functions.cpp 0.00% <0.00%> (ø)
... and 193 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b7859de...ef2f3e1. Read the comment docs.

@TGWDB
Contributor Author

TGWDB commented Mar 15, 2021

@peterschrammel @kroening Any chance of a code owner review/approval?

Member

@peterschrammel peterschrammel left a comment


Are there tests for this?

@TGWDB TGWDB force-pushed the optimize-expr-size branch 2 times, most recently from ccea841 to 370dd9e Compare March 22, 2021 15:21
@TGWDB
Contributor Author

TGWDB commented Mar 22, 2021

Are there tests for this?

Added unit tests now.

-  for(const auto &op : operands())
+  auto ops = operands();
+  count += ops.size();
+  for(const auto &op : ops)
Collaborator


In the case of structural sharing this is (unnecessarily) exponential. Maybe use visit?

Contributor


Wouldn't visit always do a full traversal, thus removing this optimisation of only doing a partial traversal?

Contributor


@TGWDB Side note: please split this commit into correctly titled optimisation and bug fix commits. I think the current title may already have misled Martin, so let's get it sorted.

Contributor Author

@TGWDB TGWDB Mar 24, 2021


@martin-cs I'm afraid I don't fully understand your comment. In particular I don't understand the terminology "structural sharing" here. A few possible interpretations have come up; comments addressing each of them are below.

  1. There is a concern that ops is potentially copied. I've pushed a commit to make it clear that ops is never changed and so there should be no issues related to copying of ops.
  2. There is some question about the exponential nature of the function? The function is depth first and linear in the size of the exprt, up to the bound. Note that the size() call is constant-time for this datatype, and the (depth first) walk of the exprt is linear.
  3. There may be a concern that the depth first walk could potentially overflow the stack with many function calls. This was also a (potential) problem in the old version. The version here should reduce this risk in two ways. Firstly, at each call the entire size of the operands is added, thus capturing more of the true size faster and so reducing function calls. Secondly, the bounded nature means this should (particularly when called on large structures) return without traversing the entire exprt.
  4. If the concern is writing a custom traversal rather than using a visitor then I guess this could be considered, assuming there is suitable motivation?

Collaborator


@TGWDB CPROVER shares irepts where possible. This can result in exponential compression. Consider the following expression:

let b = a + a
let c = b * b
let d = c + c
let e = d * d

CPROVER will store a total of 5 irepts. Four of them will have the same irept appearing twice in the sub.

A naive traversal of the form that you had written will visit a 16 times rather than just once. As the sharing gets larger, the time taken becomes exponential in the size of the data structure rather than linear. This happens. It is A Problem. In the best cases the sharing is extensive enough that the pass grinds to a halt and it is obvious that the code needs to be fixed. Worse is when it is an issue but no-one notices the 8× slowdown on the average case of one pass. Then at some point we feed in code and ... it just times out somewhere random, and without careful and detailed profiling (which you almost certainly can't do with customer code) you will not find the problem.

I say this from bitter personal experience. @tautschnig and @peterschrammel have both done great work on making things sharing aware and sharing conscious.

So, if you want to traverse an expression to an unlimited depth, PLEASE do not write for loops and recursion. Use a work queue and a set for what you have seen. Or use the visitor; that is what it is there for.

Also, in the specific use-case you want, I think you probably want the number of unique exprts, you don't want to count duplicates.

Contributor


Thank you for this explanation. The details of this particular sharing issue are useful. However, I have just been looking at the implementation details of exprt::visit, and I do not see how it could solve the particular issue you describe.

The implementation of this function can be found in visit_pre_template. It uses an std::stack as its work queue (it also uses a for loop, which you object to). std::stack is a simple LIFO data structure: if the same irept is pushed onto the stack twice, it will be popped twice and visited twice. It does not offer the duplication removal of std::set or std::unordered_set. The use of std::stack is still worthwhile here, because storing the working data set on the heap avoids keeping it in the call stack, as a recursive implementation would; recursion could overflow the call stack when processing a sufficiently deeply nested data structure.

Using the visitor is good; I just don't see how it could offer the particular benefit which you purport that it does. Was there a different visitor implementation which you were thinking of instead?

Collaborator


@thomasspriggs apologies. There used to be a visitor that used a set to handle sharing. I would consider this a bug in exprt::visit.

Contributor


No problem. I just wanted to get to the bottom of what best practice would actually be.

Changing exprt::visit to skip shared nodes like that could break existing usages of it, for example if it has been used for printing code, where we actually want to print the full tree. The exprt::unique_depth_(begin/end/cbegin/cend) iterators look to offer the functionality you were expecting, but they also seem to be unused.

@TGWDB TGWDB force-pushed the optimize-expr-size branch from 370dd9e to 6339a5b Compare March 24, 2021 11:18
@TGWDB TGWDB changed the title Refactor size to have approximate bound Refactor and optimise size to have approximate bound Mar 24, 2021
Contributor

@thomasspriggs thomasspriggs left a comment


Looks fine to me. @peterschrammel Can you re-review?

Member

@peterschrammel peterschrammel left a comment


Certainly an improvement (although I don't know any user of this functionality).

TGWDB added 3 commits March 25, 2021 09:50
This optimises the size function that was only used when
checking for a bound. The new bounded_size takes an
explicit bound argument and reduces the search when
this bound is (approximately) reached.
By definition exprts have at least size (complexity) 1, so
check for complexity == 1 not complexity == 0.
This adds unit tests for the bounded_size function for
exprs. Note that the pathological case is designed to break
incorrect implementations of bounded size and is intentionally
a malformed exprt.
@TGWDB TGWDB force-pushed the optimize-expr-size branch from 6339a5b to ef2f3e1 Compare March 25, 2021 09:51
@TGWDB TGWDB merged commit 551ac6e into diffblue:develop Mar 25, 2021
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Mar 31, 2021
complexity_limitert is really the only user of this method, and may be
fine with its implementation details (cf. discussion in diffblue#5914). For
other potential users such assumptions might not be safe.
jezhiggins pushed a commit to jezhiggins/cbmc that referenced this pull request May 3, 2021
complexity_limitert is really the only user of this method, and may be
fine with its implementation details (cf. discussion in diffblue#5914). For
other potential users such assumptions might not be safe.
4 participants