Skip to content

Value-set dereference: use cond_exprt to avoid quadratic guards #4555

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

smowton
Copy link
Contributor

@smowton smowton commented Apr 18, 2019

The problem

When we have a moderately complicated pointer expression, p, with a large-ish alias set (size n) named o1, o2 ... on, then value_set_dereferencet would create an expression of the form:
p == &o1 ? o1 : p == &o2 ? o2 : p == &o3 : ...

So far so good -- the expression p has been duplicated n times, but irep sharing means that's not actually very costly.

However, if the deref occurs on the LHS, then symex_assign_if expands this into a series of assignments:

o1#2 = p == &o1 ? new_value : o1#1
o2#2 = p != &o1 && p == &o2 ? new_value : o2#1
o3#2 = p != &o1 && p != &o2 && p == &o3 ? new_value : o3#1
...

There are now n^2 copies of p, and thanks to renaming they are no longer shared. The simplifier then proceeds to spend quite a long time on this assignment sequence if n is large-ish and so n^2 is very large.

The solution

The conditions p != &o1 && p != &o2 && p == &o3 are actually redundant, but our nested if_exprt chain doesn't actually make that clear. Therefore I add a flag is_exclusive to the already-existing cond_exprt, which expresses an if-elseif-elseif-... chain in one instruction. This means that we now get

o1#2 = p == &o1 ? new_value : o1#1
o2#2 = p == &o2 ? new_value : o2#1
o3#2 = p == &o3 ? new_value : o3#1

Back to linear complexity on the number of aliases. Much better. 90% of the diff in the PR is teaching various passes how to deal with cond_exprt, which presumably we'd want to do sooner or later; review commit-by-commit.

Note this doesn't implement cond-expr everywhere; particularly the string solver doesn't understand it yet. For now the expression is turned back into nested-ifs before it gets to that module.

Rejected solutions

Use a let-expression: Rather than try to avoid quadratic conditionals we could just surround the whole thing in let p2 = p in .... I think this would work: symex_assign would need to learn to evaluate let-around-if, presumably leading to let p2 = p in p2 != &o1 && p2 != &o2 && p2 == &o3 ..., and the simplifier would have a linear number of p instances to attack and a quadratic number of shallow == nodes. I thought it was better to avoid the quadratic syntax tree node problem at its root.

Use caching in the simplifier: While a quadratic number of comparisons would still occur, the simplifier can check before simplifying if it has seen an identical node before and avoid the repeated work. However this doesn't really deal with the case where p is complicated and doesn't simplify much or at all -- in this case renaming and subsequent passes still have a huge expression to deal with.

Avoid quadratic guards by analysis: when symex comes to add p == &o3 to its stack of guards, it could also remove any that are incompatible with it. This feels brittle to me -- the tests generated by value_set_dereferencet are not always that simple, so we'd have to at least cover the various expressions it creates, and would rely on the guard elimination remaining clever enough if/when value-set-deref produces different expressions. We could try doing this as well, but I'd prefer if value-set-deref was explicit that it intends to produce a disjoint switch.

smowton added 2 commits April 18, 2019 15:28
This indicates the conditionals are mutually exclusive -- i.e. it is more like a switch
expression than an if-elseif-else expression. This is useful as the implied guards are
linear in the number of cases, not quadratic.
@smowton smowton changed the title Smowton/feature/value set deref cond expr Value-set dereference: use cond_exprt to avoid quadratic guards Apr 18, 2019
@smowton
Copy link
Contributor Author

smowton commented Apr 18, 2019

Of course I'll add tests and benchmarking figures if people are happy with the overall approach. Anecdata so far: improves run time for an alias set of size 30 from around a minute to less than a second.

@smowton smowton force-pushed the smowton/feature/value-set-deref-cond-expr branch from 6475964 to 887a60b Compare April 23, 2019 09:36
@smowton
Copy link
Contributor Author

smowton commented Apr 23, 2019

@tautschnig @kroening this now passes tests, and is ready for review

@smowton smowton force-pushed the smowton/feature/value-set-deref-cond-expr branch from 887a60b to e001dd3 Compare April 23, 2019 09:57

for(std::size_t i = 0; i < lhs.get_n_cases(); ++i)
{
exprt renamed_guard = state.rename(lhs.condition(i), ns).get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 It's confusing that the variable names don't distinguish between the incoming guard and the extra bit coming from the current case we're considering. Maybe renamed_case_guard?

@@ -74,6 +75,9 @@ bool has_subtype(const typet &, const irep_idt &id, const namespacet &);
/// lift up an if_exprt one level
if_exprt lift_if(const exprt &, std::size_t operand_number);

/// lift up an cond_exprt one level
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"an cond_exprt" -> "a cond_exprt"

smowton added 10 commits April 23, 2019 16:01
Value-set dereferencing always produces mutually exclusive conditions -- if p points to a,
else if it points to b... and so on. By using an exclusive cond_exprt we can avoid creating
quadratic series of guards like "p == &a", "p != &a && p == &b", "p != &a && p != &b && p == &c"...
This is analogous to its existing handling of if_exprt on the left, except that
for an exclusive cond_exprt there is no need to accumulate increasingly large
guards.
This is just like lift_if, but for cond_exprt
This handles all the cases in the simplifier that special-cased if_exprt
There are too many downstream components, especially the string solver, that don't understand
them yet.
@smowton smowton force-pushed the smowton/feature/value-set-deref-cond-expr branch from e001dd3 to 2e48edd Compare April 23, 2019 15:03
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: 2e48edd).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/109308178
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good idea, but can we also have some tests that, e.g., look at the output of --program-only or --show-vcc?

@@ -107,12 +107,12 @@ exprt value_set_dereferencet::dereference(const exprt &pointer)
may_fail=true;
}

exprt failure_value;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it too non-trivial to check that failure_value is never used uninitialised. See comment below.

// two however, so purely by historical accident, the failed object takes
// precedence:

if(may_fail || value_without_condition.has_value())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just combine value_without_condition and failure_value into a single expression?

}

// Restore the guard to its state before entering this function:
INVARIANT(guard.size() >= old_guard_size, "must not shrink the guard!");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/shrink/grow/?

if_exprt(cond_expr.condition(i), cond_expr.value(i), new_expr);
}
if(i == 0)
break;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use for(std::size_t i = cond_expr.get_n_cases(); i > 0; --i) and then if(i == cond_expr.get_n_cases()) as well as cond_expr.value(i - 1) and cond_expr.condition(i - 1)`.

@tautschnig tautschnig assigned smowton and unassigned tautschnig Apr 23, 2019
@kroening
Copy link
Member

I am a bit worried that the is_exclusive extension of cond_exprt may be a bit surprising/difficult to spot. How about a switch_exprt, inspired by the switch expression in Java 13?

@smowton
Copy link
Contributor Author

smowton commented Apr 24, 2019

I re-tested this on security now that it passes tests and found that all the previously observed performance benefits have evaporated, so presumably I was benefiting from bugs (probably false pointer guards leading to whole assignments getting simplified away), not the actual improvement I thought I was. I observe some performance benefit when cond_exprt is used on the RHS as well as the LHS (but much less than the benefit I originally thought I was seeing), but that crashes the string solver later and would presumably require much more cases to handle cond_exprt, including the string solver and all analyses it relies on.

Notes going forward:

  1. Trying to debug this I am still occasionally seeing guards like p != &o1 && p != &o2 && p == &o3 on the right-hand side, despite goto_symext::assign_if not being used, which I thought was the culprit for introducing these.
  2. Inspecting the case I was looking at more closely, the culprit is an array-invert function, of the form
for(...) { 
  result[i][j] = input[j][i];
}

The aliasing problem on the RHS is much worse than the LHS (192 aliases vs. 20), but significant savings could still probably be made by simplifying and renaming the RHS once prior to symex_assign_if / symex_assign_cond, rather than duplicating it and then renaming/simplifying every time we get down to symex_assign_symbol.

@smowton
Copy link
Contributor Author

smowton commented Apr 25, 2019

Further investigation of point (1) above: the strings of negations result from simplifying an expression of the form x ? false : z into !x && z. Ultimately the simplifier is reflecting that the nested-if tower does mean dependent (non-exclusive) conditions. A moderate performance win can indeed be achieved by using an exclusive switch_exprt, but it must be used on the RHS too. I'll incrementally add support for this over the coming days/weeks.

@tautschnig
Copy link
Collaborator

@smowton With #4576 merged, is this one still relevant?

@smowton smowton closed this May 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants