Refactor and optimise size to have approximate bound #5914
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #5914 +/- ##
===========================================
+ Coverage 73.55% 75.02% +1.46%
===========================================
Files 1431 1431
Lines 155248 155633 +385
===========================================
+ Hits 114189 116756 +2567
+ Misses 41059 38877 -2182
Continue to review full report at Codecov.
@peterschrammel @kroening Any chance of a code owner review/approval?
Are there tests for this?
Force-pushed from ccea841 to 370dd9e
Added unit tests now.
-  for(const auto &op : operands())
+  auto ops = operands();
+  count += ops.size();
+  for(const auto &op : ops)
In the case of structural sharing this is (unnecessarily) exponential. Maybe use visit?
Wouldn't visit always do a full traversal, thus removing this optimisation of only doing a partial traversal?
@TGWDB Side note, please split this commit into the correctly titled optimisation and bug fix commits. I think the current title may already have misled Martin. So let's get it sorted.
@martin-cs I'm afraid I don't fully understand your comment. In particular I don't understand the terminology "structural sharing" here. There are a few things that have come up as possible concerns, and some comments addressing them below.

- There is a concern that ops is potentially copied. I've pushed a commit to make it clear that ops is never changed, and so there should be no issues related to copying of ops.
- There is some question about the exponential nature of the function? The function is depth-first and linear in the size of the exprt up to the bound. Note that the size() call is constant time for this datatype, and then the (depth-first) walk of the exprt is linear.
- There may be a concern that the depth-first walk could potentially overflow the stack with many function calls. This was also a (potential) problem in the old version. The version here should reduce this risk in two ways. Firstly, at each call the entire size of the operands is added, thus capturing more of the true size faster and so reducing function calls. Secondly, the bounded nature means this should (particularly when called on large structures) return without traversing the entire exprt.
- If the concern is writing a custom traversal rather than using a visitor, then I guess this could be considered, assuming there is suitable motivation?
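The bounded traversal described in the list above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `nodet` and `bounded_size` are hypothetical stand-ins for CBMC's `exprt` and the new method, but the structure mirrors the description — all immediate operands are counted up front, and recursion stops once the running count reaches the bound.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for exprt: a node whose children may be shared.
struct nodet
{
  std::vector<const nodet *> operands;
};

// Sketch of a bounded, depth-first size: approximate once the bound is hit.
std::size_t bounded_size(const nodet &expr, std::size_t bound)
{
  // Count this node plus all immediate children up front, so the bound
  // is approached faster and fewer recursive calls are made.
  std::size_t count = 1 + expr.operands.size();
  for(const nodet *op : expr.operands)
  {
    if(count >= bound)
      break; // bound reached: return early with an approximate count
    // each child was already counted above, so add only its subtree
    count += bounded_size(*op, bound - count) - 1;
  }
  return count;
}
```

Note that the returned value can exceed the bound slightly (hence "approximately reached" in the commit message), since a whole operand list is added before the check.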
@TGWDB CPROVER shares irepts where possible. This can result in exponential compression. Consider the following expression:

let b = a + a
let c = b * b
let d = c + c
let e = d * d

CPROVER will store a total of 5 irepts. Four of them will have the same irept appearing twice in the sub.

A naive traversal of the form that you had written will visit a 16 times rather than just once. As the sharing gets larger, the time taken becomes exponential in the size of the data structure rather than linear. This happens. It is A Problem. In the best cases the sharing is linked enough that the pass grinds to a halt and it is obvious that the code needs to be fixed. Worse is when it is an issue but no-one notices the 8× slow-down on the average case of one pass. Then at some point we feed in code and ... it just times out somewhere random, and without careful and detailed profiling (which you almost certainly can't do with customer code) you will not find the problem.
I say this from bitter personal experience. @tautschnig and @peterschrammel have both done great work on making things sharing aware and sharing conscious.
So, if you want to traverse an expression of unlimited size, PLEASE do not write for loops and recursion. Use a work queue and a set for what you have seen. Or use the visitor; that is what it is there for.
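The "work queue and a set for what you have seen" pattern can be sketched as below. This is an illustrative example under assumed names (`nodet`, `unique_size` are not CBMC API): each physically shared subtree is visited exactly once, so the a/b/c/d/e example above costs 5 visits instead of 31.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a shared irept/exprt node.
struct nodet
{
  std::vector<const nodet *> operands;
};

// Sharing-aware traversal: a work queue plus a set of already-seen nodes.
std::size_t unique_size(const nodet &root)
{
  std::stack<const nodet *> queue;
  std::unordered_set<const nodet *> seen;
  queue.push(&root);
  seen.insert(&root);
  std::size_t count = 0;
  while(!queue.empty())
  {
    const nodet *current = queue.top();
    queue.pop();
    ++count;
    for(const nodet *op : current->operands)
      if(seen.insert(op).second) // insert() returns false if already seen
        queue.push(op);
  }
  return count;
}
```

Because sharing is identified by node address, this counts unique nodes, matching the "number of unique exprts" suggestion below.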
Also, for the specific use-case you have, I think you probably want the number of unique exprts; you don't want to count duplicates.
Thank you for this explanation. The details of this particular sharing issue are useful. However, I have just been looking at the implementation details of exprt::visit and I do not see how it could solve the particular issue you describe. The implementation of this function can be found in visit_pre_template. It uses a std::stack as its work queue (it also uses a for loop, which you object to). std::stack is a simple LIFO data structure. If the same irept is pushed onto the stack twice, then it will be popped twice and visited twice. It does not offer the duplicate removal of std::set or std::unordered_set. The use of the std::stack is still worthwhile here, because storing the working data set on the heap avoids storing it in the call stack, as a recursive implementation would. Recursion could overflow the call stack when processing a sufficiently deeply nested data structure. Using the visitor is good; I just don't see how it could offer the particular benefit which you suggest it does. Was there a different visitor implementation which you were thinking of instead?
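For concreteness, the behaviour described above can be reproduced in miniature. This is a hypothetical sketch (again using a `nodet` stand-in, not the real visit_pre_template): a plain std::stack work queue with no "seen" set, so a shared node pushed twice is popped and visited twice.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <vector>

// Hypothetical stand-in node type.
struct nodet
{
  std::vector<const nodet *> operands;
};

// Pre-order traversal in the style described above: heap-allocated work
// queue (no call-stack recursion), but no deduplication of shared nodes.
std::size_t visit_count(const nodet &root)
{
  std::stack<const nodet *> work;
  work.push(&root);
  std::size_t visits = 0;
  while(!work.empty())
  {
    const nodet *current = work.top();
    work.pop();
    ++visits;
    for(const nodet *op : current->operands)
      work.push(op); // no "seen" set: shared children are pushed again
  }
  return visits;
}
```

With the let-style sharing example from earlier in the thread, the visit count grows as the full tree size, not the number of unique nodes.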
@thomasspriggs apologies. There used to be a visitor that used a set to handle sharing. I would consider this a bug in exprt::visit.
No problem. I just wanted to get to the bottom of what best practice would actually be.

Changing exprt::visit to not visit shared nodes like that could break existing usages of it, say for example if it has been used for printing code, where we actually want to print the full tree. The exprt::unique_depth_(begin/end/cbegin/cend) iterators look to offer the functionality you were expecting. But they also seem to be unused.
Force-pushed from 370dd9e to 6339a5b
Looks fine to me. @peterschrammel Can you re-review?
Certainly an improvement (although I don't know any user of this functionality).
This optimises the size function that was only used when checking for a bound. The new bounded_size takes an explicit bound argument and reduces the search when this bound is (approximately) reached.
By definition exprts have at least size (complexity) 1, so check for complexity == 1 not complexity == 0.
This adds unit tests for the bounded_size function for exprs. Note that the pathological case is designed to break incorrect implementations of bounded size and is intentionally a malformed exprt.
Force-pushed from 6339a5b to ef2f3e1
complexity_limitert is really the only user of this method, and may be fine with its implementation details (cf. discussion in diffblue#5914). For other potential users such assumptions might not be safe.
This optimises the size function that was only used when checking for a bound. The new bounded_size takes an explicit bound argument and reduces the search when this bound is (approximately) reached.

Also, this fixes a bug in the complexity computation: since the complexity (size) can never be zero, the check is against one instead.