-
Notifications
You must be signed in to change notification settings - Fork 274
Value-set dereference: use cond_exprt to avoid quadratic guards #4555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Value-set dereference: use cond_exprt to avoid quadratic guards #4555
Conversation
This indicates the conditionals are mutually exclusive -- i.e. it is more like a switch expression than an if-elseif-else expression. This is useful as the implied guards are linear in the number of cases, not quadratic.
Of course I'll add tests and benchmarking figures if people are happy with the overall approach. Anecdata so far: improves run time for an alias set of size 30 from around a minute to less than a second. |
6475964
to
887a60b
Compare
@tautschnig @kroening this now passes tests, and is ready for review |
887a60b
to
e001dd3
Compare
|
||
for(std::size_t i = 0; i < lhs.get_n_cases(); ++i) | ||
{ | ||
exprt renamed_guard = state.rename(lhs.condition(i), ns).get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 It's confusing that the variable names don't distinguish between the incoming guard and the extra bit coming from the current case we're considering. Maybe renamed_case_guard
?
@@ -74,6 +75,9 @@ bool has_subtype(const typet &, const irep_idt &id, const namespacet &); | |||
/// lift up an if_exprt one level | |||
if_exprt lift_if(const exprt &, std::size_t operand_number); | |||
|
|||
/// lift up an cond_exprt one level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"an cond_exprt" -> "a cond_exprt"
Value-set dereferencing always produces mutually exclusive conditions -- if p points to a, else if it points to b... and so on. By using an exclusive cond_exprt we can avoid creating quadratic series of guards like "p == &a", "p != &a && p == &b", "p != &a && p != &b && p == &c"...
This is analogous to its existing handling of if_exprt on the left, except that for an exclusive cond_exprt there is no need to accumulate increasingly large guards.
This is just like lift_if, but for cond_exprt
This handles all the cases in the simplifier that special-cased if_exprt
There are too many downstream components, especially the string solver, that don't understand them yet.
e001dd3
to
2e48edd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR failed Diffblue compatibility checks (cbmc commit: 2e48edd).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/109308178
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a good idea, but can we also have some tests that, e.g., look at the output of --program-only
or --show-vcc
?
@@ -107,12 +107,12 @@ exprt value_set_dereferencet::dereference(const exprt &pointer) | |||
may_fail=true; | |||
} | |||
|
|||
exprt failure_value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it too non-trivial to check that failure_value
is never used uninitialised. See comment below.
// two however, so purely by historical accident, the failed object takes | ||
// precedence: | ||
|
||
if(may_fail || value_without_condition.has_value()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just combine value_without_condition
and failure_value
into a single expression?
} | ||
|
||
// Restore the guard to its state before entering this function: | ||
INVARIANT(guard.size() >= old_guard_size, "must not shrink the guard!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/shrink/grow/?
if_exprt(cond_expr.condition(i), cond_expr.value(i), new_expr); | ||
} | ||
if(i == 0) | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use for(std::size_t i = cond_expr.get_n_cases(); i > 0; --i)
and then if(i == cond_expr.get_n_cases())
as well as cond_expr.value(i - 1)
and cond_expr.condition(i - 1)`.
I am a bit worried that the |
I re-tested this on security now that it passes tests and found that all the previously observed performance benefits have evaporated, so presumably I was benefiting from bugs (probably false pointer guards leading to whole assignments getting simplified away), not the actual improvement I thought I was. I observe some performance benefit when Notes going forward:
The aliasing problem on the RHS is much worse than the LHS (192 aliases vs. 20), but significant savings could still probably be made by simplifying and renaming the RHS once prior to |
Further investigation of point (1) above: the strings of negations result from simplifying an expression of the form |
The problem
When we have a moderately complicated pointer expression,
p
, with a large-ish alias set (sizen
) namedo1
,o2
...on
, thenvalue_set_dereferencet
would create an expression of the form:p == &o1 ? o1 : p == &o2 ? o2 : p == &o3 : ...
So far so good -- the expression
p
has been duplicatedn
times, but irep sharing means that's not actually very costly.However, if the deref occurs on the LHS, then
symex_assign_if
expands this into a series of assignments:There are now n^2 copies of
p
, and thanks to renaming they are no longer shared. The simplifier then proceeds to spend quite a long time on this assignment sequence if n is large-ish and so n^2 is very large.The solution
The conditions
p != &o1 && p != &o2 && p == &o3
are actually redundant, but our nestedif_exprt
chain doesn't actually make that clear. Therefore I add a flagis_exclusive
to the already-existingcond_exprt
, which expresses an if-elseif-elseif-... chain in one instruction. This means that we now getBack to linear complexity on the number of aliases. Much better. 90% of the diff in the PR is teaching various passes how to deal with
cond_exprt
, which presumably we'd want to do sooner or later; review commit-by-commit.Note this doesn't implement cond-expr everywhere; particularly the string solver doesn't understand it yet. For now the expression is turned back into nested-ifs before it gets to that module.
Rejected solutions
Use a let-expression: Rather than try to avoid quadratic conditionals we could just surround the whole thing in
let p2 = p in ...
. I think this would work:symex_assign
would need to learn to evaluate let-around-if, presumably leading tolet p2 = p in p2 != &o1 && p2 != &o2 && p2 == &o3 ...
, and the simplifier would have a linear number ofp
instances to attack and a quadratic number of shallow==
nodes. I thought it was better to avoid the quadratic syntax tree node problem at its root.Use caching in the simplifier: While a quadratic number of comparisons would still occur, the simplifier can check before simplifying if it has seen an identical node before and avoid the repeated work. However this doesn't really deal with the case where
p
is complicated and doesn't simplify much or at all -- in this case renaming and subsequent passes still have a huge expression to deal with.Avoid quadratic guards by analysis: when symex comes to add
p == &o3
to its stack of guards, it could also remove any that are incompatible with it. This feels brittle to me -- the tests generated byvalue_set_dereferencet
are not always that simple, so we'd have to at least cover the various expressions it creates, and would rely on the guard elimination remaining clever enough if/when value-set-deref produces different expressions. We could try doing this as well, but I'd prefer if value-set-deref was explicit that it intends to produce a disjoint switch.