Skip to content

Commit 8efba21

Browse files
author
Owen
committed
filter value sets at gotos and assumes
When a conditional change of control flow in goto-symex (i.e. gotos and assumes), if there is a pointer-typed symbol in the condition then try to filter its value set separately for each branch based on which values are actually possible on that branch. We only do this when there is only one pointer-typed symbol in the condition. The intention is to make the value-sets more accurate. In particular, this lays the groundwork for dealing with virtual method dispatch much more efficiently. See diffblue#2122 for an approach that was rejected. For example in java, code like `x.equals(y)`, where `x` could point to an Object or a String, gets turned into code like ``` if (x.@class_identifier == 'java.lang.Object') x . java.lang.Object.equals(y) else if (x.@class_identifier == 'java.lang.String') x . java.lang.String.equals(y) ``` When symex goes into java.lang.String.equals it doesn't filter the value-set so that `this` (which is `x`) only points to the String, not the Object. This causes symex to add complicated expressions to the formula for field accesses to fields that x won't have if it points to an Object.
1 parent f57bbd8 commit 8efba21

File tree

5 files changed

+239
-0
lines changed

5 files changed

+239
-0
lines changed

src/goto-symex/goto_symex.h

+23
Original file line numberDiff line numberDiff line change
@@ -333,6 +333,29 @@ class goto_symext
333333
const exprt &new_guard,
334334
const namespacet &ns);
335335

336+
/// Try to filter value sets based on whether possible values of a
337+
/// pointer-typed symbol make the condition true or false. We only do this
338+
/// when there is only one pointer-typed symbol in \p condition.
339+
/// \param state: The current state
340+
/// \param condition: The condition which is being evaluated, which it expects
341+
/// will not have been cleaned or renamed. In practice, it's fine if it has
342+
/// been cleaned and renamed up to level L1.
343+
/// \param original_value_set: The value set we will read from
344+
/// \param jump_taken_value_set: The value set that will be used when the
345+
/// condition is true, so we remove any elements which we can tell will
346+
/// make the condition false, or nullptr if this shouldn't be done
347+
/// \param jump_not_taken_value_set: The value set that will be used when the
348+
/// condition is false, so we remove any elements which we can tell will
349+
/// make the condition true, or nullptr if this shouldn't be done
350+
/// \param ns: A namespace
351+
void try_filter_value_sets(
352+
goto_symex_statet &state,
353+
exprt condition,
354+
const value_sett &original_value_set,
355+
value_sett *jump_taken_value_set,
356+
value_sett *jump_not_taken_value_set,
357+
const namespacet &ns);
358+
336359
virtual void vcc(
337360
const exprt &,
338361
const std::string &msg,

src/goto-symex/symex_goto.cpp

+21
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,27 @@ void goto_symext::apply_goto_condition(
3030
const exprt &new_guard,
3131
const namespacet &ns)
3232
{
33+
// It would be better to call try_filter_value_sets after apply_condition,
34+
// and pass nullptr for value sets which apply_condition successfully updated
35+
// already. However, try_filter_value_sets calls rename to do constant
36+
// propagation, and apply_condition can update the constant propagator. Since
37+
// apply condition will never succeed on both jump_taken_state and
38+
// jump_not_taken_state, it should be possible to pass a reference to an
39+
// unmodified goto_statet to use for renaming. But renaming needs a
40+
// goto_symex_statet, not just a goto_statet, and we only have access to one
41+
// of those. A future improvement would be to split rename into two parts:
42+
// one part on goto_symex_statet which is non-const and deals with
43+
// l2 thread reads, and one part on goto_statet which is const and could be
44+
// used in try_filter_value_sets.
45+
46+
try_filter_value_sets(
47+
current_state,
48+
original_guard,
49+
jump_taken_state.value_set,
50+
&jump_taken_state.value_set,
51+
&jump_not_taken_state.value_set,
52+
ns);
53+
3354
jump_taken_state.apply_condition(new_guard, current_state, ns);
3455

3556
// Try to find a negative condition that implies an equality constraint on

src/goto-symex/symex_main.cpp

+171
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@ Author: Daniel Kroening, [email protected]
1515
#include <memory>
1616

1717
#include <util/exception_utils.h>
18+
#include <util/expr_iterator.h>
1819
#include <util/expr_util.h>
1920
#include <util/make_unique.h>
2021
#include <util/mathematical_expr.h>
@@ -130,6 +131,22 @@ void goto_symext::symex_assume(statet &state, const exprt &cond)
130131
simplified_cond = state.rename(std::move(simplified_cond), ns).get();
131132
do_simplify(simplified_cond);
132133

134+
// It would be better to call try_filter_value_sets after apply_condition,
135+
// and pass nullptr for value sets which apply_condition successfully updated
136+
// already. However, try_filter_value_sets calls rename to do constant
137+
// propagation, and apply_condition can update the constant propagator. Since
138+
// apply condition will never succeed on both jump_taken_state and
139+
// jump_not_taken_state, it should be possible to pass a reference to an
140+
// unmodified goto_statet to use for renaming. But renaming needs a
141+
// goto_symex_statet, not just a goto_statet, and we only have access to one
142+
// of those. A future improvement would be to split rename into two parts:
143+
// one part on goto_symex_statet which is non-const and deals with
144+
// l2 thread reads, and one part on goto_statet which is const and could be
145+
// used in try_filter_value_sets.
146+
147+
try_filter_value_sets(
148+
state, cond, state.value_set, &state.value_set, nullptr, ns);
149+
133150
// apply_condition must come after rename because it might change the
134151
// constant propagator and the value-set and we read from those in rename
135152
state.apply_condition(simplified_cond, state, ns);
@@ -543,3 +560,157 @@ void goto_symext::symex_step(
543560
UNREACHABLE;
544561
}
545562
}
563+
564+
/// Check if an expression only contains one unique symbol (possibly repeated
565+
/// multiple times)
566+
/// \param expr: The expression to examine
567+
/// \return If only one unique symbol occurs in \p expr then return it;
568+
/// otherwise return an empty optionalt
569+
static optionalt<symbol_exprt>
570+
find_unique_pointer_typed_symbol(const exprt &expr)
571+
{
572+
optionalt<symbol_exprt> return_value;
573+
for(auto it = expr.depth_cbegin(); it != expr.depth_cend(); ++it)
574+
{
575+
const symbol_exprt *symbol_expr = expr_try_dynamic_cast<symbol_exprt>(*it);
576+
if(symbol_expr && can_cast_type<pointer_typet>(symbol_expr->type()))
577+
{
578+
// If we already have a potential return value, check if it is the same
579+
// symbol, and return an empty optionalt if not
580+
if(return_value && *symbol_expr != *return_value)
581+
{
582+
return {};
583+
}
584+
return_value = *symbol_expr;
585+
}
586+
}
587+
588+
// Either expr contains no pointer-typed symbols or it contains one unique
589+
// pointer-typed symbol, possibly repeated multiple times
590+
return return_value;
591+
}
592+
593+
/// This is a simplified version of value_set_dereferencet::build_reference_to.
594+
/// It ignores the ID_dynamic_object case (which doesn't occur in goto-symex)
595+
/// and gives up for integer addresses and non-trivial symbols
596+
/// \param value_set_element: An element of a value-set
597+
/// \param type: the type of the expression that might point to
598+
/// \p value_set_element
599+
/// \return An expression for the value of the pointer indicated by \p
600+
/// value_set_element if it is easy to determine, or an empty optionalt
601+
/// otherwise
602+
static optionalt<exprt>
603+
value_set_element_to_expr(exprt value_set_element, pointer_typet type)
604+
{
605+
const object_descriptor_exprt *object_descriptor =
606+
expr_try_dynamic_cast<object_descriptor_exprt>(value_set_element);
607+
if(!object_descriptor)
608+
{
609+
return {};
610+
}
611+
612+
const exprt &root_object = object_descriptor->root_object();
613+
const exprt &object = object_descriptor->object();
614+
615+
if(root_object.id() == ID_null_object)
616+
{
617+
return null_pointer_exprt{type};
618+
}
619+
else if(root_object.id() == ID_integer_address)
620+
{
621+
return {};
622+
}
623+
else
624+
{
625+
// We should do something like
626+
// value_set_dereference::dereference_type_compare, which deals with
627+
// arrays having types containing void
628+
if(object_descriptor->offset().is_zero() && object.type() == type.subtype())
629+
{
630+
return address_of_exprt(object);
631+
}
632+
else
633+
{
634+
return {};
635+
}
636+
}
637+
}
638+
639+
void goto_symext::try_filter_value_sets(
640+
goto_symex_statet &state,
641+
exprt condition,
642+
const value_sett &original_value_set,
643+
value_sett *jump_taken_value_set,
644+
value_sett *jump_not_taken_value_set,
645+
const namespacet &ns)
646+
{
647+
condition = state.rename<L1>(std::move(condition), ns).get();
648+
649+
optionalt<symbol_exprt> symbol_expr =
650+
find_unique_pointer_typed_symbol(condition);
651+
652+
if(!symbol_expr)
653+
{
654+
return;
655+
}
656+
657+
const pointer_typet &symbol_type = to_pointer_type(symbol_expr->type());
658+
659+
value_setst::valuest value_set_elements;
660+
original_value_set.get_value_set(*symbol_expr, value_set_elements, ns);
661+
662+
// Try evaluating the condition with the symbol replaced by a pointer to each
663+
// one of its possible values in turn. If that leads to a true for some
664+
// value_set_element then we can delete it from the value set that will be
665+
// used if the condition is false, and vice versa.
666+
for(const auto &value_set_element : value_set_elements)
667+
{
668+
optionalt<exprt> possible_value =
669+
value_set_element_to_expr(value_set_element, symbol_type);
670+
671+
if(!possible_value)
672+
{
673+
continue;
674+
}
675+
676+
exprt modified_condition(condition);
677+
678+
address_of_aware_replace_symbolt replace_symbol{};
679+
replace_symbol.insert(*symbol_expr, *possible_value);
680+
replace_symbol(modified_condition);
681+
682+
// This do_simplify() is needed for the following reason: if condition is
683+
// `*p == a` and we replace `p` with `&a` then we get `*&a == a`. Suppose
684+
// our constant propagation knows that `a` is `. Without this call to
685+
// do_simplify(), state.rename() turns this into `*&a == 1` (because
686+
// rename() doesn't do constant propagation inside addresses), which
687+
// do_simplify() turns into `a == 1`, which cannot be evaluated as true
688+
// without another round of constant propagation.
689+
// It would be sufficient to replace this call to do_simplify() with
690+
// something that just replaces `*&x` with `x` whenever it finds it.
691+
do_simplify(modified_condition);
692+
693+
const bool record_events = state.record_events;
694+
state.record_events = false;
695+
modified_condition = state.rename(std::move(modified_condition), ns).get();
696+
state.record_events = record_events;
697+
698+
do_simplify(modified_condition);
699+
700+
if(jump_taken_value_set && modified_condition.is_false())
701+
{
702+
value_sett::entryt *entry = const_cast<value_sett::entryt *>(
703+
jump_taken_value_set->get_entry_for_symbol(
704+
symbol_expr->get_identifier(), symbol_type, "", ns));
705+
jump_taken_value_set->erase_value_from_entry(*entry, value_set_element);
706+
}
707+
else if(jump_not_taken_value_set && modified_condition.is_true())
708+
{
709+
value_sett::entryt *entry = const_cast<value_sett::entryt *>(
710+
jump_not_taken_value_set->get_entry_for_symbol(
711+
symbol_expr->get_identifier(), symbol_type, "", ns));
712+
jump_not_taken_value_set->erase_value_from_entry(
713+
*entry, value_set_element);
714+
}
715+
}
716+
}

src/pointer-analysis/value_set.cpp

+22
Original file line numberDiff line numberDiff line change
@@ -1615,3 +1615,25 @@ exprt value_sett::make_member(
16151615

16161616
return member_expr;
16171617
}
1618+
1619+
void value_sett::erase_value_from_entry(
1620+
entryt &entry,
1621+
const exprt &value_to_erase)
1622+
{
1623+
std::vector<object_map_dt::key_type> keys_to_erase;
1624+
1625+
for(const auto &key_value : entry.object_map.read())
1626+
{
1627+
const auto &rhs_object = to_expr(key_value);
1628+
if(rhs_object == value_to_erase)
1629+
{
1630+
keys_to_erase.emplace_back(key_value.first);
1631+
}
1632+
}
1633+
1634+
DATA_INVARIANT(
1635+
keys_to_erase.size() == 1,
1636+
"value_sett::erase_value_from_entry() should erase exactly one value");
1637+
1638+
entry.object_map.write().erase(keys_to_erase[0]);
1639+
}

src/pointer-analysis/value_set.h

+2
Original file line numberDiff line numberDiff line change
@@ -473,6 +473,8 @@ class value_sett
473473
const std::string &suffix,
474474
const namespacet &ns) const;
475475

476+
void erase_value_from_entry(entryt &entry, const exprt &value_to_erase);
477+
476478
protected:
477479
/// Reads the set of objects pointed to by `expr`, including making
478480
/// recursive lookups for dereference operations etc.

0 commit comments

Comments
 (0)