Value-set dereference: use cond_exprt to avoid quadratic guards #4555

smowton · 2019-04-18T15:26:46Z

The problem

When we have a moderately complicated pointer expression, p, with a large-ish alias set (size n) named o1, o2 ... on, then value_set_dereferencet would create an expression of the form:
p == &o1 ? o1 : p == &o2 ? o2 : p == &o3 ? o3 : ...

So far so good -- the expression p has been duplicated n times, but irep sharing means that's not actually very costly.

However, if the deref occurs on the LHS, then symex_assign_if expands this into a series of assignments:

o1#2 = p == &o1 ? new_value : o1#1
o2#2 = p != &o1 && p == &o2 ? new_value : o2#1
o3#2 = p != &o1 && p != &o2 && p == &o3 ? new_value : o3#1
...

There are now n^2 copies of p, and thanks to renaming they are no longer shared. The simplifier then proceeds to spend quite a long time on this assignment sequence if n is large-ish and so n^2 is very large.
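To make the growth concrete, here is a minimal sketch that builds the guards shown above as plain strings and counts how often p appears in them. It does not use any CBMC types; `nested_if_guard` and `total_pointer_copies` are hypothetical helper names chosen for illustration. The exact count is n(n+1)/2, which is quadratic in n.

```cpp
#include <cstddef>
#include <string>

// Guard for the i-th alias (1-based) when a nested if-chain on the LHS is
// expanded: every earlier comparison is negated and conjoined.
std::string nested_if_guard(std::size_t i)
{
  std::string guard;
  for(std::size_t j = 1; j < i; ++j)
    guard += "p != &o" + std::to_string(j) + " && ";
  return guard + "p == &o" + std::to_string(i);
}

// Total number of times p appears across the guards of all n assignments.
std::size_t total_pointer_copies(std::size_t n)
{
  std::size_t copies = 0;
  for(std::size_t i = 1; i <= n; ++i)
    copies += i; // guard i mentions p exactly i times
  return copies;
}
```

For an alias set of size 30 this model gives 465 copies of p, each of which is renamed and simplified separately.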

The solution

The negated conjuncts in a guard such as p != &o1 && p != &o2 && p == &o3 are redundant, because p == &o3 already implies them, but our nested if_exprt chain doesn't make that clear. Therefore I add a flag is_exclusive to the already-existing cond_exprt, which expresses an if-elseif-elseif-... chain in a single expression. This means that we now get

o1#2 = p == &o1 ? new_value : o1#1
o2#2 = p == &o2 ? new_value : o2#1
o3#2 = p == &o3 ? new_value : o3#1

Back to linear complexity on the number of aliases. Much better. 90% of the diff in the PR is teaching various passes how to deal with cond_exprt, which presumably we'd want to do sooner or later; review commit-by-commit.
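The effect of the flag can be sketched with a toy model. The struct below is a hypothetical stand-in, not CBMC's cond_exprt, and `assignment_guards` is an illustrative name; conditions are plain strings. When the cases are declared exclusive, each guard is just the case's own condition; otherwise the negations of all earlier conditions have to be carried along.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a multi-way conditional; not CBMC's cond_exprt.
struct toy_condt
{
  std::vector<std::string> conditions; // one per case, in order
  bool is_exclusive;                   // true: at most one condition holds
};

// Guards under which each case's object is assigned.
std::vector<std::string> assignment_guards(const toy_condt &cond)
{
  std::vector<std::string> guards;
  std::string negated_prefix; // "!(c1) && !(c2) && ..." for earlier cases
  for(const std::string &c : cond.conditions)
  {
    guards.push_back(cond.is_exclusive ? c : negated_prefix + c);
    negated_prefix += "!(" + c + ") && ";
  }
  return guards;
}
```

With `is_exclusive` set, the total guard size is linear in the number of cases, which is the property the PR relies on.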

Note this doesn't implement cond-expr everywhere; particularly the string solver doesn't understand it yet. For now the expression is turned back into nested-ifs before it gets to that module.

Rejected solutions

Use a let-expression: Rather than try to avoid quadratic conditionals we could just surround the whole thing in let p2 = p in .... I think this would work: symex_assign would need to learn to evaluate let-around-if, presumably leading to let p2 = p in p2 != &o1 && p2 != &o2 && p2 == &o3 ..., and the simplifier would have a linear number of p instances to attack and a quadratic number of shallow == nodes. I thought it was better to avoid the quadratic syntax tree node problem at its root.

Use caching in the simplifier: While a quadratic number of comparisons would still occur, the simplifier can check before simplifying if it has seen an identical node before and avoid the repeated work. However this doesn't really deal with the case where p is complicated and doesn't simplify much or at all -- in this case renaming and subsequent passes still have a huge expression to deal with.

Avoid quadratic guards by analysis: when symex comes to add p == &o3 to its stack of guards, it could also remove any that are incompatible with it. This feels brittle to me -- the tests generated by value_set_dereferencet are not always that simple, so we'd have to at least cover the various expressions it creates, and would rely on the guard elimination remaining clever enough if/when value-set-deref produces different expressions. We could try doing this as well, but I'd prefer if value-set-deref was explicit that it intends to produce a disjoint switch.

smowton · 2019-04-18T15:28:57Z

Of course I'll add tests and benchmarking figures if people are happy with the overall approach. Anecdata so far: improves run time for an alias set of size 30 from around a minute to less than a second.

smowton · 2019-04-23T09:41:07Z

@tautschnig @kroening this now passes tests, and is ready for review

owen-mc-diffblue · 2019-04-18T16:20:41Z

src/goto-symex/symex_assign.cpp

+
+  for(std::size_t i = 0; i < lhs.get_n_cases(); ++i)
+  {
+    exprt renamed_guard = state.rename(lhs.condition(i), ns).get();


💡 It's confusing that the variable names don't distinguish between the incoming guard and the extra bit coming from the current case we're considering. Maybe renamed_case_guard?

owen-mc-diffblue · 2019-04-18T16:24:57Z

src/util/expr_util.h

@@ -74,6 +75,9 @@ bool has_subtype(const typet &, const irep_idt &id, const namespacet &);
 /// lift up an if_exprt one level
 if_exprt lift_if(const exprt &, std::size_t operand_number);

+/// lift up an cond_exprt one level


"an cond_exprt" -> "a cond_exprt"

Value-set dereferencing always produces mutually exclusive conditions -- if p points to a, else if it points to b... and so on. By using an exclusive cond_exprt we can avoid creating quadratic series of guards like "p == &a", "p != &a && p == &b", "p != &a && p != &b && p == &c"...

allredj

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: 2e48edd).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/109308178
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

tautschnig

Seems like a good idea, but can we also have some tests that, e.g., look at the output of --program-only or --show-vcc?

tautschnig · 2019-04-23T20:34:26Z

src/pointer-analysis/value_set_dereference.cpp

@@ -107,12 +107,12 @@ exprt value_set_dereferencet::dereference(const exprt &pointer)
        may_fail=true;
  }

+  exprt failure_value;


I find it too non-trivial to check that failure_value is never used uninitialised. See comment below.

tautschnig · 2019-04-23T20:36:00Z

src/pointer-analysis/value_set_dereference.cpp

+  // two however, so purely by historical accident, the failed object takes
+  // precedence:
+
+  if(may_fail || value_without_condition.has_value())


Can't we just combine value_without_condition and failure_value into a single expression?

tautschnig · 2019-04-23T20:38:39Z

src/goto-symex/symex_assign.cpp

+  }
+
+  // Restore the guard to its state before entering this function:
+  INVARIANT(guard.size() >= old_guard_size, "must not shrink the guard!");


s/shrink/grow/?

tautschnig · 2019-04-23T20:47:01Z

src/goto-symex/symex_clean_expr.cpp

+            if_exprt(cond_expr.condition(i), cond_expr.value(i), new_expr);
+        }
+        if(i == 0)
+          break;


I'd use for(std::size_t i = cond_expr.get_n_cases(); i > 0; --i) and then if(i == cond_expr.get_n_cases()) as well as cond_expr.value(i - 1) and cond_expr.condition(i - 1).
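A minimal sketch of the suggested loop shape, using strings in place of expressions. `casest` and `lower_to_nested_if` are hypothetical names, and treating the final case as the else-branch is an assumption inferred from the snippet under review rather than something the diff excerpt confirms.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// (condition, value) pairs; a hypothetical stand-in for cond_exprt's cases.
using casest = std::vector<std::pair<std::string, std::string>>;

// Build "c1 ? v1 : (c2 ? v2 : (v3))" from the last case backwards.
// Assumption: the final case acts as the else-branch, so its condition is
// dropped.
std::string lower_to_nested_if(const casest &cases)
{
  std::string result;
  // i runs from n down to 1, so the unsigned index never wraps below zero
  // and no early break is needed.
  for(std::size_t i = cases.size(); i > 0; --i)
  {
    const auto &c = cases[i - 1];
    if(i == cases.size())
      result = c.second;
    else
      result = c.first + " ? " + c.second + " : (" + result + ")";
  }
  return result;
}
```

Counting down from the size and indexing with i - 1 removes the need for the `if(i == 0) break;` guard that an unsigned loop variable otherwise requires.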

kroening · 2019-04-24T09:19:05Z

I am a bit worried that the is_exclusive extension of cond_exprt may be a bit surprising/difficult to spot. How about a switch_exprt, inspired by the switch expression in Java 13?

smowton · 2019-04-24T15:11:59Z

I re-tested this on security now that it passes tests and found that all the previously observed performance benefits have evaporated, so presumably I was benefiting from bugs (probably false pointer guards leading to whole assignments getting simplified away), not from the improvement I thought I had made. I observe some performance benefit when cond_exprt is used on the RHS as well as the LHS (but much less than the benefit I originally thought I was seeing), but that crashes the string solver later and would presumably require many more cases to handle cond_exprt, including in the string solver and all analyses it relies on.

Notes going forward:

1. Trying to debug this I am still occasionally seeing guards like p != &o1 && p != &o2 && p == &o3 on the right-hand side, despite goto_symext::assign_if not being used, which I thought was the culprit for introducing these.
2. Inspecting the case I was looking at more closely, the culprit is an array-invert function, of the form

for(...) { 
  result[i][j] = input[j][i];
}

The aliasing problem on the RHS is much worse than the LHS (192 aliases vs. 20), but significant savings could still probably be made by simplifying and renaming the RHS once prior to symex_assign_if / symex_assign_cond, rather than duplicating it and then renaming/simplifying every time we get down to symex_assign_symbol.

smowton · 2019-04-25T10:59:22Z

Further investigation of point (1) above: the strings of negations result from simplifying an expression of the form x ? false : z into !x && z. Ultimately the simplifier is reflecting that the nested-if tower does mean dependent (non-exclusive) conditions. A moderate performance win can indeed be achieved by using an exclusive switch_exprt, but it must be used on the RHS too. I'll incrementally add support for this over the coming days/weeks.
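The rewrite mentioned here can be checked exhaustively over booleans. The sketch below is only an illustration of the identity x ? false : z == !x && z; the function names are hypothetical and nothing here calls the CBMC simplifier.

```cpp
#include <initializer_list>

// The conditional x ? false : z, written out explicitly.
constexpr bool if_then_false_else(bool x, bool z)
{
  return x ? false : z;
}

// The form the simplifier is described as producing.
constexpr bool not_and(bool x, bool z)
{
  return !x && z;
}

// True when the two forms agree on every combination of inputs.
bool forms_agree()
{
  for(bool x : {false, true})
    for(bool z : {false, true})
      if(if_then_false_else(x, z) != not_and(x, z))
        return false;
  return true;
}
```

Applied repeatedly to a nested-if tower whose earlier branches yield false, this identity produces exactly the chains of negated comparisons seen in the guards.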

tautschnig · 2019-05-13T12:35:25Z

@smowton With #4576 merged, is this one still relevant?

smowton added 2 commits April 18, 2019 15:28

Add exclusive flag to cond_exprt

93e4622

This indicates the conditionals are mutually exclusive -- i.e. it is more like a switch expression than an if-elseif-else expression. This is useful as the implied guards are linear in the number of cases, not quadratic.

Add accessors to cond_exprt

1a553e0

smowton requested review from chrisr-diffblue, kroening, martin-cs, owen-mc-diffblue, peterschrammel, pkesseli and tautschnig as code owners April 18, 2019 15:26

smowton changed the title ~~Smowton/feature/value set deref cond expr~~ Value-set dereference: use cond_exprt to avoid quadratic guards Apr 18, 2019

smowton requested review from JohnDumbell and Degiorgio April 18, 2019 15:29

smowton force-pushed the smowton/feature/value-set-deref-cond-expr branch from 6475964 to 887a60b Compare April 23, 2019 09:36

smowton force-pushed the smowton/feature/value-set-deref-cond-expr branch from 887a60b to e001dd3 Compare April 23, 2019 09:57

owen-mc-diffblue approved these changes Apr 23, 2019

smowton added 10 commits April 23, 2019 16:01

goto-symex: handle cond_exprt on the LHS

ae870f1

This is analogous to its existing handling of if_exprt on the left, except that for an exclusive cond_exprt there is no need to accumulate increasingly large guards.

Use lift-if more consistently

f57ebf2

Add lift_cond

739e11d

This is just like lift_if, but for cond_exprt

Simplify cond_exprt

3629f2a

This handles all the cases in the simplifier that special-cased if_exprt

Add missing case to bv_pointerst

3645437

Value-set: handle cond_exprt on the RHS

59145f1

Lower cond_expr on the RHS

b56063b

There are too many downstream components, especially the string solver, that don't understand them yet.

process_array_expr: handle ID_cond

2fc837c

Add cases handling cond_exprt wherever symex or value-set currently h…

2e48edd

…andles if_exprt

smowton force-pushed the smowton/feature/value-set-deref-cond-expr branch from e001dd3 to 2e48edd Compare April 23, 2019 15:03

allredj reviewed Apr 23, 2019

smowton assigned kroening Apr 23, 2019

smowton assigned tautschnig Apr 23, 2019

tautschnig approved these changes Apr 23, 2019

tautschnig assigned smowton and unassigned tautschnig Apr 23, 2019

smowton mentioned this pull request Apr 26, 2019

Use let-expression for dereferenced pointers #4576

Merged

smowton closed this May 13, 2019
