-
Notifications
You must be signed in to change notification settings - Fork 274
Filter value sets in symex #4288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filter value sets in symex #4288
Conversation
ebcbed9
to
588a338
Compare
src/goto-symex/symex_goto.cpp
Outdated
@@ -224,6 +225,18 @@ void goto_symext::symex_goto(statet &state) | |||
|
|||
symex_transition(state, state_pc, backward); | |||
|
|||
if(!new_guard.is_true()) | |||
{ | |||
// try to update the value sets using information contained in which branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little more specific: update our value-set when particular objects are only compatible with one or other branch
src/goto-symex/symex_main.cpp
Outdated
static void | ||
collect_pointer_typed_symbols(const exprt &expr, std::set<symbol_exprt> &output) | ||
{ | ||
const symbol_exprt *symbol_expr = expr_try_dynamic_cast<symbol_exprt>(expr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering the only user is only interested in a single pointer, could avoid allocating a pointless set by making it find_single_pointer_typed_symbol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a nice way of doing that which also checks that there's only one.
src/goto-symex/symex_main.cpp
Outdated
|
||
// If symbol_expr has already been L2-renamed then we need to strip it back | ||
// to L1-renaming for the purposes of getting its value set | ||
const symbol_exprt symbol_expr_l1 = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be a ref
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does that work with get_l1_object()
, which returns const ssa_exprt
(i.e. not a reference)? Is it kept alive by lifetime extension because I take a reference to it?
src/goto-symex/symex_main.cpp
Outdated
continue; | ||
} | ||
|
||
object_descriptor_exprt o = to_object_descriptor_expr(possible_value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split this block out as object_descriptor_to_expr
or the like
src/goto-symex/symex_main.cpp
Outdated
replace_symbol.insert(symbol_expr, value_for_symbol_expr); | ||
replace_symbol(modified_condition); | ||
|
||
do_simplify(modified_condition); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think simplify-before-rename will gain us anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I added this for a reason. When the condition is *p = a
, replace_symbol
makes it *&a = a
, then rename
makes it a = 2
(because rename
doesn't do constant propagation inside addresses), and then the final simplify didn't do anything and we couldn't tell if it was true or false. Perhaps there is a better way to deal with this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind adding a comment to the code to this effect? It does make me wonder whether this isn't just re-implementing dereferencing (i.e., dereference_rec
) with some extra constraints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was how I first attempted to implement it. I had a class which took the place of symex_dereference_statet
, which returned a given value for a given key and "not found" for everything else. But it was quite a pain to pass it around everywhere, and it seemed simple enough to just replace the symbol directly. Maybe it would have been better to use dereference_rec
instead.
src/goto-symex/symex_main.cpp
Outdated
|
||
if(value_set_for_true_case && modified_condition.is_false()) | ||
{ | ||
value_sett::entryt *entry = const_cast<value_sett::entryt *>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No point repeating the get-entry every time round the loop (worth avoiding doing it at all if we never need it though)
src/goto-symex/symex_main.cpp
Outdated
value_sett::entryt *entry = const_cast<value_sett::entryt *>( | ||
value_set_for_true_case->get_entry_for_symbol( | ||
symbol_expr_l1.get_identifier(), ns.follow(symbol_type), "")); | ||
value_set_for_true_case->erase_value_from_entry(*entry, possible_value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential optimisation: since these are in step with possible_values
we could maintain parallel iterators -- for each value we'd either do step-erase, erase-step or step-step.
src/pointer-analysis/value_set.cpp
Outdated
@@ -1611,3 +1611,22 @@ exprt value_sett::make_member( | |||
|
|||
return member_expr; | |||
} | |||
|
|||
void value_sett::erase_value_from_entry(entryt &entry, const exprt &value_to_erase) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should only be possible to find one value
src/pointer-analysis/value_set.cpp
Outdated
void value_sett::erase_value_from_entry(entryt &entry, const exprt &value_to_erase) | ||
{ | ||
std::vector<object_map_dt::key_type> keys_to_erase; | ||
for(const auto &key_value : entry.object_map.read()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ugliness suggests using a lower-level interface -- something like iterating over the set, translating the iterator into an exprt (i.e. to_expr), but then using the iterator to erase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with that is that then we have to loop over entry.object_map.write()
, which breaks sharing even if we aren't going to erase anything
int c1 = *p1; | ||
|
||
int *p2 = ptr_to_a_or_b; | ||
__CPROVER_assume(*p2 != a); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be better to use != 2
(here and below) to make clear it's the value, not the symbol, which is important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 💡 are only suggestions and not requests. I think try_filter_value_sets
is too complex and needs to be split. This shouldn't be merged until there is some indication that it improves performances.
src/pointer-analysis/value_set.cpp
Outdated
insert(dest, exprt(ID_unknown, original_type)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated change
@@ -447,6 +447,11 @@ class value_sett | |||
exprt &expr, | |||
const namespacet &ns) const; | |||
|
|||
const value_sett::entryt *get_entry_for_symbol( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
document this function
@@ -225,6 +225,25 @@ class goto_symext | |||
virtual void symex_dead(statet &); | |||
virtual void symex_other(statet &); | |||
|
|||
/// Try to filter value sets based on whether possible values of a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a description to the commit message ("filter value sets at gotos and assumes")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the commit message needs a bit of proof reading, and ideally it should also answer the "Why?" question. It says what the commit does, but not why this is beneficial.
src/goto-symex/symex_main.cpp
Outdated
@@ -133,9 +133,15 @@ void goto_symext::symex_assume(statet &state, const exprt &cond) | |||
else | |||
state.guard.add(simplified_cond); | |||
|
|||
if(state.atomic_section_id!=0 && | |||
state.guard.is_false()) | |||
if(state.atomic_section_id != 0 && state.guard.is_false()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated change?
src/goto-symex/symex_main.cpp
Outdated
|
||
for(const exprt &op : expr.operands()) | ||
{ | ||
collect_pointer_typed_symbols(op, output); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 you could consider using expr iterators instead of a recursive function, that would avoid the passing around of the set reference.
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"If a method is bigger than 50 lines, break it into parts."
src/goto-symex/symex_main.cpp
Outdated
// false, and vice versa. | ||
for(const auto &possible_value : possible_values) | ||
{ | ||
if(!can_cast_expr<object_descriptor_exprt>(possible_value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 could consider iterating over make_range(possible_values).filter([&](..){ return can_cast_expr<object_descriptor_exprt>(...))
which may be a bit clearer
{ | ||
keys_to_erase.emplace_back(key_value.first); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 This loop could be replaced by a make_range(...).filter(...).map(...)
1f655b9
to
a1cc657
Compare
src/pointer-analysis/value_set.cpp
Outdated
if( | ||
expr_type.id() != ID_pointer && expr_type.id() != ID_signedbv && | ||
expr_type.id() != ID_unsignedbv && expr_type.id() != ID_struct && | ||
expr_type.id() != ID_union && expr_type.id() != ID_array) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that this is factored out you will need to consider ID_struct_tag
and ID_union_tag
as well, and then use ns.follow_tag
.
src/pointer-analysis/value_set.h
Outdated
@@ -455,6 +455,16 @@ class value_sett | |||
exprt &expr, | |||
const namespacet &ns) const; | |||
|
|||
/// Get the entry for the symbol and suffix | |||
/// \param identifier: The identifier for the symbol | |||
/// \param expr_type: The type of the symbol (should not be a symbol itself) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you should or have to make this restriction. Tag types should be ok, you just need to resolve them.
@@ -225,6 +225,25 @@ class goto_symext | |||
virtual void symex_dead(statet &); | |||
virtual void symex_other(statet &); | |||
|
|||
/// Try to filter value sets based on whether possible values of a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the commit message needs a bit of proof reading, and ideally it should also answer the "Why?" question. It says what the commit does, but not why this is beneficial.
src/goto-symex/symex_main.cpp
Outdated
@@ -159,7 +167,7 @@ void goto_symext::symex_assume_l2(statet &state, const exprt &cond) | |||
else | |||
state.guard.add(rewritten_cond); | |||
|
|||
if(state.atomic_section_id!=0 && | |||
if(state.atomic_section_id != 0 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary change.
/// \return If only one unique symbol occurs in \p expr then return it; | ||
/// otherwise return an empty optionalt | ||
static optionalt<symbol_exprt> | ||
find_unique_pointer_typed_symbol(const exprt &expr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you just use find_symbols(const exprt &, std::set<symbol_exprt> &)
and then filter the result for pointer-typed ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know about it. I originally wrote my own recursive function, but several review comments said I should do something more efficient.
src/goto-symex/symex_main.cpp
Outdated
return {}; | ||
} | ||
|
||
object_descriptor_exprt o = to_object_descriptor_expr(value_set_element); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Const reference?
src/goto-symex/symex_main.cpp
Outdated
state.value_set.get_value_set(*symbol_expr, value_set_elements, ns); | ||
|
||
// Try evaluating the condition with the symbol replaced by a pointer to each | ||
// one of its possible values in turn. If that leads to a true from o1 then we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is "o1?"
replace_symbol.insert(*symbol_expr, *possible_value); | ||
replace_symbol(modified_condition); | ||
|
||
do_simplify(modified_condition); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this one necessary? You'll do it again right after the rename call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main::1::c15!0@1#. = 1 | ||
main::1::c16!0@1#. = 1 | ||
-- | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead the the blank line you might want to explain why all the above patterns are the right things to test for? Even more so as this test is failing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were some comments in main.c
, which you might have found too sparse. I've expanded them, and corrected the last few, which were wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be sufficient to say "See comments in main.c for explanations on why each of the assignments is (not) expected to appear in the output."
On SV-COMP's ReachSafety-ECA I get 3377.76 symex steps per second on develop, and 3179.05 steps per second with this PR. These benchmarks do not make use of pointers in any meaningful way, so this is really just measuring the overhead of the proposed changes. So we'd now need Java benchmarks that demonstrate the better/best case. |
src/goto-symex/symex_main.cpp
Outdated
replace_symbol(modified_condition); | ||
|
||
do_simplify(modified_condition); | ||
modified_condition = state.rename(std::move(modified_condition), ns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this renaming is only done to perform constant propagation? If so, make sure you set state.record_events
to false
beforehand and set it back to its previous value afterwards. See, e.g., symex_assign.cpp
line 227.
b8569f5
to
6627bc7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR failed Diffblue compatibility checks (cbmc commit: 6627bc7).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/103249332
Status will be re-evaluated on next push.
Common spurious failures:
-
the cbmc commit has disappeared in the mean time (e.g. in a force-push)
-
the author is not in the list of contributors (e.g. first-time contributors).
-
the compatibility was already broken by an earlier merge.
6627bc7
to
b4f6bd7
Compare
Here is an example of code which is greatly sped up by this PR. You have to run it with
|
Pathological case: public class Test {
public Test next;
public static void main(boolean nondet) {
Test test;
if(nondet) {
test = new Test();
test.next = test;
}
else {
test = new Child();
test.next = test;
}
test.recurse(10);
}
public void recurse(int depth) {
if(depth == 0)
return;
else
next.recurse(depth - 1);
}
}
class Child extends Test {
public void recurse(int depth) {
if(depth == 0)
return;
else
next.recurse(depth - 1);
}
} Develop runtime: 11.36s / VCCs: 4101 (2047 after simplification) |
eb51d47
to
1639d95
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: 1639d95).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/105807067
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: 35088bf).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/105824041
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: 40831f5).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/105838954
This is in preparation for putting try_filter_value_sets after it. As noted in the comment, it is important that it comes after rename.
This is in preparation for calling try_filter_value_sets in apply_goto_condition
This is in preparation for calling try_filter_value_sets inside apply_goto_condition
Make this a separate function so it can be used from other places. No change in behaviour.
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
Store a vector of expressions to erase from each entry, so we only have to get each entry once (and only do that if we actually have anything to erase).
Value-set filtering means some of the tests pass that didn't before. Also add tests that only pass because of local safe pointer analysis, not value-set filtering, so we can tell if that stops working in future.
Test which rely on convergence will not work with path-symex
The aim is to make value_set_dereferencet::build_reference_to accessible from the outside, by making it public and static. This required making several other functions public and static as well. The motivation is to use in value_set_element_to_expr, which is a helper function for goto_symext::try_filter_value_sets.
Move it from pointer_analysis to util. It is not currently used in the cbmc codebase. The only changes were to the names of the guards in the header file and the includes.
The two operands to the plus_exprt don't need to have the same type as each other.
We should try to filter them out
The test without a deref in the this argument works with value-set filtering on its own. The test with a deref in the this argument also requires the use temporary variable for the this argument of virtual method calls (though we only need to do this when a dispatch table is made and when the this argument contains a dereference).
932324f
to
0544470
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR failed Diffblue compatibility checks (cbmc commit: 932324f).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/105856175
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: 0544470).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/105859547
When we process a goto or an assume instruction, we will sometimes be able to remove some entries from value-sets. This should make symex more efficient.
value_sett::erase_value_from_entry
is inelegant - alternatives suggestions are appreciated.I have not tested whether this improves performance, but I strongly suspect that it will in some cases.
I'll make separate PRs for the first five commits, as they are just clean ups.