
Refactor and optimise size to have approximate bound #5914


Merged
merged 3 commits into from
Mar 25, 2021

Conversation

TGWDB
Contributor

@TGWDB TGWDB commented Mar 11, 2021

This optimises the size function, which was only used when
checking against a bound. The new bounded_size takes an
explicit bound argument and cuts the search short when
this bound is (approximately) reached.

This also fixes a bug in the complexity computation: the
complexity (size) of an exprt can never be zero, so the
check is now against one instead.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@codecov

codecov bot commented Mar 11, 2021

Codecov Report

Merging #5914 (ef2f3e1) into develop (1b51a4d) will increase coverage by 1.46%.
The diff coverage is 73.76%.


@@             Coverage Diff             @@
##           develop    #5914      +/-   ##
===========================================
+ Coverage    73.55%   75.02%   +1.46%     
===========================================
  Files         1431     1431              
  Lines       155248   155633     +385     
===========================================
+ Hits        114189   116756    +2567     
+ Misses       41059    38877    -2182     
Impacted Files Coverage Δ
src/analyses/cfg_dominators.h 100.00% <ø> (ø)
src/analyses/global_may_alias.cpp 0.00% <0.00%> (ø)
src/goto-cc/linker_script_merge.cpp 0.00% <0.00%> (ø)
src/goto-instrument/accelerate/scratch_program.cpp 65.95% <0.00%> (+9.77%) ⬆️
src/goto-instrument/concurrency.cpp 0.00% <0.00%> (ø)
src/goto-instrument/dot.cpp 0.00% <0.00%> (ø)
src/goto-instrument/function_modifies.cpp 0.00% <0.00%> (ø)
src/goto-instrument/rw_set.cpp 41.93% <0.00%> (+41.93%) ⬆️
src/goto-instrument/thread_instrumentation.cpp 0.00% <0.00%> (ø)
src/goto-instrument/undefined_functions.cpp 0.00% <0.00%> (ø)
... and 193 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b7859de...ef2f3e1. Read the comment docs.

@TGWDB
Contributor Author

TGWDB commented Mar 15, 2021

@peterschrammel @kroening Any chance of a code owner review/approval?

Member

@peterschrammel peterschrammel left a comment


Are there tests for this?

@TGWDB TGWDB force-pushed the optimize-expr-size branch 2 times, most recently from ccea841 to 370dd9e Compare March 22, 2021 15:21
@TGWDB
Contributor Author

TGWDB commented Mar 22, 2021

Are there tests for this?

Added unit tests now.

-  for(const auto &op : operands())
+  auto ops = operands();
+  count += ops.size();
+  for(const auto &op : ops)
Collaborator


In the case of structural sharing this is (unnecessarily) exponential. Maybe use visit?

Contributor


Wouldn't visit always do a full traversal, thus removing this optimisation of only doing a partial traversal?

Contributor


@TGWDB Side note: please split this commit into correctly titled optimisation and bug fix commits. I think the current title may already have misled Martin, so let's get it sorted.

Contributor Author

@TGWDB TGWDB Mar 24, 2021


@martin-cs I'm afraid I don't fully understand your comment. In particular I don't understand the terminology "structural sharing" here. A few possible interpretations have come up; comments addressing each of them are below.

  1. There is a concern that ops is potentially copied. I've pushed a commit to make it clear that ops is never changed and so there should be no issues related to copying of ops.
  2. There is some question about the exponential nature of the function? The function is depth first and linear in the size of the exprt, up to the bound. Note that the size() call is constant-time for this datatype, and the (depth first) walk of the exprt is linear.
  3. There may be a concern that the depth first walk could potentially overflow the stack with many function calls. This was also a (potential) problem in the old version. The version here should reduce this risk in two ways. Firstly, at each call the entire size of the operands is added, thus capturing more of the true size faster and so reducing function calls. Secondly, the bounded nature means this should (particularly when called on large structures) return without traversing the entire exprt.
  4. If the concern is writing a custom traversal rather than using a visitor then I guess this could be considered, assuming there is suitable motivation?

Collaborator


@TGWDB CPROVER shares irepts where possible. This can result in exponential compression. Consider the following expression:

let b = a + a
let c = b * b
let d = c + c
let e = d * d

CPROVER will store a total of 5 irepts. Four of them will have the same irept appearing twice in the sub.

A naive traversal of the form that you had written will visit a 16 times rather than just once. As the sharing gets larger, the time taken becomes exponential in the size of the data structure rather than linear. This happens. It is A Problem. In the best cases the sharing is extensive enough that the pass grinds to a halt and it is obvious that the code needs to be fixed. Worse is when it is an issue but no-one notices the 8× slowdown on the average case of one pass. Then at some point we feed in code and ... it just times out somewhere random, and without careful and detailed profiling (which you almost certainly can't do with customer code) you will not find the problem.

I say this from bitter personal experience. @tautschnig and @peterschrammel have both done great work on making things sharing aware and sharing conscious.

So, if you want to traverse an expression to an unlimited depth, PLEASE do not write for loops and recursion. Use a work queue and a set for what you have seen. Or use the visitor; that is what it is there for.

Also, in the specific use-case you want, I think you probably want the number of unique exprts, you don't want to count duplicates.

Contributor


Thank you for this explanation. The details of this particular sharing issue are useful. However, I have just been looking at the implementation details of exprt::visit, and I do not see how it could solve the particular issue you describe.

The implementation of this function can be found in visit_pre_template. It uses an std::stack as its work queue (it also uses a for loop, which you object to). std::stack is a simple LIFO data structure: if the same irept is pushed onto the stack twice, it will be popped twice and visited twice. It does not offer the duplication removal of std::set or std::unordered_set. The use of std::stack is still worthwhile here, because storing the working data set on the heap avoids keeping it in the call stack, as a recursive implementation would; recursion could overflow the call stack when processing a sufficiently deeply nested data structure.

Using the visitor is good; I just don't see how it could offer the particular benefit which you purport that it does. Was there a different visitor implementation which you were thinking of instead?

Collaborator


@thomasspriggs apologies. There used to be a visitor that used a set to handle sharing. I would consider this a bug in exprt::visit.

Contributor


No problem. I just wanted to get to the bottom of what best practice would actually be.

Changing exprt::visit to skip shared nodes like that could break existing usages of it, for example if it has been used for printing code, where we actually want to print the full tree. The exprt::unique_depth_(begin/end/cbegin/cend) iterators look to offer the functionality you were expecting, but they also seem to be unused.

@TGWDB TGWDB force-pushed the optimize-expr-size branch from 370dd9e to 6339a5b Compare March 24, 2021 11:18
@TGWDB TGWDB changed the title Refactor size to have approximate bound Refactor and optimise size to have approximate bound Mar 24, 2021
Contributor

@thomasspriggs thomasspriggs left a comment


Looks fine to me. @peterschrammel Can you re-review?

Member

@peterschrammel peterschrammel left a comment


Certainly an improvement (although I don't know any user of this functionality).

TGWDB added 3 commits March 25, 2021 09:50
This optimises the size function that was only used when
checking for a bound. The new bounded_size takes an
explicit bound argument and reduces the search when
this bound is (approximately) reached.
By definition exprts have at least size (complexity) 1, so
check for complexity == 1 not complexity == 0.
This adds unit tests for the bounded_size function for
exprs. Note that the pathological case is designed to break
incorrect implementations of bounded size and is intentionally
a malformed exprt.
@TGWDB TGWDB force-pushed the optimize-expr-size branch from 6339a5b to ef2f3e1 Compare March 25, 2021 09:51
@TGWDB TGWDB merged commit 551ac6e into diffblue:develop Mar 25, 2021
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Mar 31, 2021
complexity_limitert is really the only user of this method, and may be
fine with its implementation details (cf. discussion in diffblue#5914). For
other potential users such assumptions might not be safe.
jezhiggins pushed a commit to jezhiggins/cbmc that referenced this pull request May 3, 2021
complexity_limitert is really the only user of this method, and may be
fine with its implementation details (cf. discussion in diffblue#5914). For
other potential users such assumptions might not be safe.
4 participants