Skip to content

Filter value sets in symex #4288

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

owen-mc-diffblue
Copy link
Contributor

When we process a goto or an assume instruction, we will sometimes be able to remove some entries from value-sets. This should make symex more efficient.

value_sett::erase_value_from_entry is inelegant - alternatives suggestions are appreciated.

I have not tested whether this improves performance, but I strongly suspect that it will in some cases.

I'll make separate PRs for the first five commits, as they are just clean ups.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@owen-mc-diffblue owen-mc-diffblue force-pushed the owen/symex-filter-value-sets branch 2 times, most recently from ebcbed9 to 588a338 Compare February 27, 2019 17:16
@owen-mc-diffblue owen-mc-diffblue marked this pull request as ready for review February 27, 2019 17:19
@smowton
Copy link
Contributor

smowton commented Feb 27, 2019

Performance result so far, for just the symex phase: around 1.5x performance penalty, so we should probably look to reduce cost where possible. Of course this is for Java, where the key ->@class_identifier optimisation is not yet possible.

perf_out

@@ -224,6 +225,18 @@ void goto_symext::symex_goto(statet &state)

symex_transition(state, state_pc, backward);

if(!new_guard.is_true())
{
// try to update the value sets using information contained in which branch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little more specific: update our value-set when particular objects are only compatible with one or other branch

static void
collect_pointer_typed_symbols(const exprt &expr, std::set<symbol_exprt> &output)
{
const symbol_exprt *symbol_expr = expr_try_dynamic_cast<symbol_exprt>(expr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the only user is only interested in a single pointer, could avoid allocating a pointless set by making it find_single_pointer_typed_symbol

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a nice way of doing that which also checks that there's only one.


// If symbol_expr has already been L2-renamed then we need to strip it back
// to L1-renaming for the purposes of getting its value set
const symbol_exprt symbol_expr_l1 =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a ref

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does that work with get_l1_object(), which returns const ssa_exprt (i.e. not a reference)? Is it kept alive by lifetime extension because I take a reference to it?

continue;
}

object_descriptor_exprt o = to_object_descriptor_expr(possible_value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split this block out as object_descriptor_to_expr or the like

replace_symbol.insert(symbol_expr, value_for_symbol_expr);
replace_symbol(modified_condition);

do_simplify(modified_condition);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think simplify-before-rename will gain us anything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I added this for a reason. When the condition is *p = a, replace_symbol makes it *&a = a, then rename makes it a = 2 (because rename doesn't do constant propagation inside addresses), and then the final simplify didn't do anything and we couldn't tell if it was true or false. Perhaps there is a better way to deal with this case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a comment to the code to this effect? It does make me wonder whether this isn't just re-implementing dereferencing (i.e., dereference_rec) with some extra constraints.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was how I first attempted to implement it. I had a class which took the place of symex_dereference_statet, which returned a given value for a given key and "not found" for everything else. But it was quite a pain to pass it around everywhere, and it seemed simple enough to just replace the symbol directly. Maybe it would have been better to use dereference_rec instead.


if(value_set_for_true_case && modified_condition.is_false())
{
value_sett::entryt *entry = const_cast<value_sett::entryt *>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No point repeating the get-entry every time round the loop (worth avoiding doing it at all if we never need it though)

value_sett::entryt *entry = const_cast<value_sett::entryt *>(
value_set_for_true_case->get_entry_for_symbol(
symbol_expr_l1.get_identifier(), ns.follow(symbol_type), ""));
value_set_for_true_case->erase_value_from_entry(*entry, possible_value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential optimisation: since these are in step with possible_values we could maintain parallel iterators -- for each value we'd either do step-erase, erase-step or step-step.

@@ -1611,3 +1611,22 @@ exprt value_sett::make_member(

return member_expr;
}

void value_sett::erase_value_from_entry(entryt &entry, const exprt &value_to_erase)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should only be possible to find one value

void value_sett::erase_value_from_entry(entryt &entry, const exprt &value_to_erase)
{
std::vector<object_map_dt::key_type> keys_to_erase;
for(const auto &key_value : entry.object_map.read())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ugliness suggests using a lower-level interface -- something like iterating over the set, translating the iterator into an exprt (i.e. to_expr), but then using the iterator to erase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with that is that then we have to loop over entry.object_map.write(), which breaks sharing even if we aren't going to erase anything

int c1 = *p1;

int *p2 = ptr_to_a_or_b;
__CPROVER_assume(*p2 != a);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to use != 2 (here and below) to make clear it's the value, not the symbol, which is important.

Copy link
Contributor

@romainbrenguier romainbrenguier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 💡 are only suggestions and not requests. I think try_filter_value_sets is too complex and needs to be split. This shouldn't be merged until there is some indication that it improves performances.

insert(dest, exprt(ID_unknown, original_type));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated change

@@ -447,6 +447,11 @@ class value_sett
exprt &expr,
const namespacet &ns) const;

const value_sett::entryt *get_entry_for_symbol(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document this function

@@ -225,6 +225,25 @@ class goto_symext
virtual void symex_dead(statet &);
virtual void symex_other(statet &);

/// Try to filter value sets based on whether possible values of a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a description to the commit message ("filter value sets at gotos and assumes")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the commit message needs a bit of proof reading, and ideally it should also answer the "Why?" question. It says what the commit does, but not why this is beneficial.

@@ -133,9 +133,15 @@ void goto_symext::symex_assume(statet &state, const exprt &cond)
else
state.guard.add(simplified_cond);

if(state.atomic_section_id!=0 &&
state.guard.is_false())
if(state.atomic_section_id != 0 && state.guard.is_false())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated change?


for(const exprt &op : expr.operands())
{
collect_pointer_typed_symbols(op, output);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 you could consider using expr iterators instead of a recursive function, that would avoid the passing around of the set reference.

}
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If a method is bigger than 50 lines, break it into parts."

// false, and vice versa.
for(const auto &possible_value : possible_values)
{
if(!can_cast_expr<object_descriptor_exprt>(possible_value))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 could consider iterating over make_range(possible_values).filter([&](..){ return can_cast_expr<object_descriptor_exprt>(...)) which may be a bit clearer

{
keys_to_erase.emplace_back(key_value.first);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 This loop could be replaced by a make_range(...).filter(...).map(...)

@owen-mc-diffblue owen-mc-diffblue force-pushed the owen/symex-filter-value-sets branch 5 times, most recently from 1f655b9 to a1cc657 Compare March 4, 2019 17:16
if(
expr_type.id() != ID_pointer && expr_type.id() != ID_signedbv &&
expr_type.id() != ID_unsignedbv && expr_type.id() != ID_struct &&
expr_type.id() != ID_union && expr_type.id() != ID_array)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that this is factored out you will need to consider ID_struct_tag and ID_union_tag as well, and then use ns.follow_tag.

@@ -455,6 +455,16 @@ class value_sett
exprt &expr,
const namespacet &ns) const;

/// Get the entry for the symbol and suffix
/// \param identifier: The identifier for the symbol
/// \param expr_type: The type of the symbol (should not be a symbol itself)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you should or have to make this restriction. Tag types should be ok, you just need to resolve them.

@@ -225,6 +225,25 @@ class goto_symext
virtual void symex_dead(statet &);
virtual void symex_other(statet &);

/// Try to filter value sets based on whether possible values of a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the commit message needs a bit of proof reading, and ideally it should also answer the "Why?" question. It says what the commit does, but not why this is beneficial.

@@ -159,7 +167,7 @@ void goto_symext::symex_assume_l2(statet &state, const exprt &cond)
else
state.guard.add(rewritten_cond);

if(state.atomic_section_id!=0 &&
if(state.atomic_section_id != 0 &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary change.

/// \return If only one unique symbol occurs in \p expr then return it;
/// otherwise return an empty optionalt
static optionalt<symbol_exprt>
find_unique_pointer_typed_symbol(const exprt &expr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you just use find_symbols(const exprt &, std::set<symbol_exprt> &) and then filter the result for pointer-typed ones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know about it. I originally wrote my own recursive function, but several review comments said I should do something more efficient.

return {};
}

object_descriptor_exprt o = to_object_descriptor_expr(value_set_element);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Const reference?

state.value_set.get_value_set(*symbol_expr, value_set_elements, ns);

// Try evaluating the condition with the symbol replaced by a pointer to each
// one of its possible values in turn. If that leads to a true from o1 then we
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "o1?"

replace_symbol.insert(*symbol_expr, *possible_value);
replace_symbol(modified_condition);

do_simplify(modified_condition);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this one necessary? You'll do it again right after the rename call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main::1::c15!0@1#. = 1
main::1::c16!0@1#. = 1
--

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead the the blank line you might want to explain why all the above patterns are the right things to test for? Even more so as this test is failing...

Copy link
Contributor Author

@owen-mc-diffblue owen-mc-diffblue Mar 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were some comments in main.c, which you might have found too sparse. I've expanded them, and corrected the last few, which were wrong.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be sufficient to say "See comments in main.c for explanations on why each of the assignments is (not) expected to appear in the output."

@tautschnig
Copy link
Collaborator

On SV-COMP's ReachSafety-ECA I get 3377.76 symex steps per second on develop, and 3179.05 steps per second with this PR. These benchmarks do not make use of pointers in any meaningful way, so this is really just measuring the overhead of the proposed changes. So we'd now need Java benchmarks that demonstrate the better/best case.

replace_symbol(modified_condition);

do_simplify(modified_condition);
modified_condition = state.rename(std::move(modified_condition), ns);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this renaming is only done to perform constant propagation? If so, make sure you set state.record_events to false beforehand and set it back to its previous value afterwards. See, e.g., symex_assign.cpp line 227.

@owen-mc-diffblue owen-mc-diffblue force-pushed the owen/symex-filter-value-sets branch 6 times, most recently from b8569f5 to 6627bc7 Compare March 5, 2019 17:28
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: 6627bc7).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/103249332
Status will be re-evaluated on next push.
Common spurious failures:

  • the cbmc commit has disappeared in the mean time (e.g. in a force-push)

  • the author is not in the list of contributors (e.g. first-time contributors).

  • the compatibility was already broken by an earlier merge.

@owen-mc-diffblue owen-mc-diffblue force-pushed the owen/symex-filter-value-sets branch from 6627bc7 to b4f6bd7 Compare March 6, 2019 15:26
@owen-mc-diffblue
Copy link
Contributor Author

owen-mc-diffblue commented Mar 6, 2019

Here is an example of code which is greatly sped up by this PR. You have to run it with --unwind 10000. On my computer it takes 1.9s with this PR and 51.0s with develop (and it balloons as you increase the number of iterations).

#include <assert.h>

typedef struct {
  int a;
  char b;
} my_struct;

int main(int argc, char **argv)
{
  my_struct s0 = {-1, 'n'};
  my_struct s1 = {1, 'y'};

  __CPROVER_assume(argc == 1);
  my_struct *ptr = argv[0][0] == '0' ? &s0 : &s1;

  int sum = 0;
  int n_iters = 2500;

  for(int i = 0; i < n_iters; ++i)
  {
    if(ptr->b == 'y')
    {
      sum += ptr->a;
    }
  }

  assert(sum == n_iters || sum == 0);
}

@smowton
Copy link
Contributor

smowton commented Mar 25, 2019

Pathological case:

public class Test {

    public Test next;
    
    public static void main(boolean nondet) {
	Test test;
	if(nondet) {
	    test = new Test();
	    test.next = test;
	}
	else {
	    test = new Child();
	    test.next = test;
	}

	test.recurse(10);	
    }

    public void recurse(int depth) {
	if(depth == 0)
	    return;
	else
	    next.recurse(depth - 1);
    }

}

class Child extends Test {

    public void recurse(int depth) {
	if(depth == 0)
	    return;
	else
	    next.recurse(depth - 1);
    }

}

Develop runtime: 11.36s / VCCs: 4101 (2047 after simplification)
Branch runtime: 0.03s / VCCs: 49 (3 after simplification)

@owen-mc-diffblue owen-mc-diffblue force-pushed the owen/symex-filter-value-sets branch 2 times, most recently from eb51d47 to 1639d95 Compare March 26, 2019 08:08
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 1639d95).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/105807067

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 35088bf).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/105824041

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 40831f5).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/105838954

Owen and others added 18 commits March 26, 2019 14:06
This is in preparation for putting try_filter_value_sets after it.
As noted in the comment, it is important that it comes after rename.
This is in preparation for calling try_filter_value_sets
in apply_goto_condition
This is in preparation for calling try_filter_value_sets inside
apply_goto_condition
Make this a separate function so it can be used from other places.
No change in behaviour.
When a conditional change of control flow in goto-symex
(i.e. gotos and assumes), if there is a pointer-typed symbol
in the condition then try to filter its value set separately
for each branch based on which values are actually possible
on that branch. We only do this when there is only one
pointer-typed symbol in the condition.

The intention is to make the value-sets more accurate. In
particular, this lays the groundwork for dealing with
virtual method dispatch much more efficiently. See diffblue#2122
for an approach that was rejected. For example in java,
code like `x.equals(y)`, where `x` could point to an Object
or a String, gets turned into code like
```
if (x.@class_identifier == 'java.lang.Object')
    x . java.lang.Object.equals(y)
else if (x.@class_identifier == 'java.lang.String')
    x . java.lang.String.equals(y)
```
When symex goes into java.lang.String.equals it doesn't
filter the value-set so that `this` (which is `x`) only
points to the String, not the Object. This causes symex to
add complicated expressions to the formula for field accesses
to fields that x won't have if it points to an Object.
Store a vector of expressions to erase from each entry, so we only
have to get each entry once (and only do that if we actually have
anything to erase).
Value-set filtering means some of the tests pass that didn't before.
Also add tests that only pass because of local safe pointer analysis,
not value-set filtering, so we can tell if that stops working in future.
Test which rely on convergence will not work with path-symex
The aim is to make value_set_dereferencet::build_reference_to accessible
from the outside, by making it public and static. This required making
several other functions public and static as well.

The motivation is to use in value_set_element_to_expr, which is a helper
function for goto_symext::try_filter_value_sets.
Move it from pointer_analysis to util. It is not currently used in the
cbmc codebase. The only changes were to the names of the guards in the
header file and the includes.
The two operands to the plus_exprt don't need to have the same type
as each other.
We should try to filter them out
The test without a deref in the this argument works with
value-set filtering on its own. The test with a deref in the
this argument also requires the use temporary variable for
the this argument of virtual method calls (though we only
need to do this when a dispatch table is made and when the
this argument contains a dereference).
@owen-mc-diffblue owen-mc-diffblue force-pushed the owen/symex-filter-value-sets branch 2 times, most recently from 932324f to 0544470 Compare March 26, 2019 14:17
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: 932324f).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/105856175
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 0544470).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/105859547

@owen-mc-diffblue owen-mc-diffblue merged commit 08dfd8d into diffblue:develop Mar 26, 2019
@owen-mc-diffblue owen-mc-diffblue deleted the owen/symex-filter-value-sets branch March 26, 2019 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants