Skip to content

Symex: resolve pointer comparisons using the value-set #4444

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

owen-mc-diffblue
Copy link
Contributor

@owen-mc-diffblue owen-mc-diffblue commented Mar 27, 2019

This allows us to evaluate boolean expressions (including GOTO conditions) that test pointer equality or inequality (particularly "ptr == null" or "!= null") where the value-set contains sufficient information. We can already handle simple cases where ptr has a single known alias via constant propagation and aliasing, but not cases where ptr is non-constant, for example because it is drawn from an array.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@@ -287,6 +287,10 @@ class goto_symext
statet &,
bool keep_array);

exprt try_evaluate_pointer_comparisons(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestions for a better name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this one!

@@ -333,6 +337,11 @@ class goto_symext
const exprt &new_guard,
const namespacet &ns);

optionalt<exprt> build_reference_to_value_set_element(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function might fit better somewhere else - perhaps in value_set_dereferencet, since it's mainly a wrapper around value_set_dereferencet::build_reference_to?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proooobably. No strong opinion.

op = try_evaluate_pointer_comparisons(std::move(op), value_set);
}
}
else if(condition.id() == ID_not)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anything else I should be recursing over, other than AND, OR and NOT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I think that's enough

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, how about the arguments of a boolean-typed if_exprt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(e.g. x == y ? z == w : a == b)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or indeed the first arg of an if_exprt in any context. Total recursion looking for == and != nodes might be the simplest thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I 100% think these comments through before beginning to write FYI

{
symbol_exprt *symbol_expr =
expr_try_dynamic_cast<symbol_exprt>(condition.op0());
exprt &constant = condition.op1();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it assumes the symbol is on the lhs and the constant is on the rhs. I will fix that up.

@@ -583,6 +583,38 @@ find_unique_pointer_typed_symbol(const exprt &expr)
return return_value;
}

optionalt<exprt> goto_symext::build_reference_to_value_set_element(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was in goto_symext::try_filter_value_sets and I've pulled it out into a function to reuse it. Perhaps I'll make that a separate commit.

Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good except I believe the null case is broken, and at least needs tests.

possibly_reachable_12 = 7;

int unconditionally_reachable_13;
if((ptr_to_a == &a && c == 3) || c == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bear in mind these constant comparisons with c may well be resolved very early as c is only assigned once and never address-taken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed them to other pointer comparisons. And I now take the address of all the integers at the beginning, if that's a thing that symex cares about.

expr_try_dynamic_cast<symbol_exprt>(condition.op0());
exprt &constant = condition.op1();

if(symbol_expr && goto_symex_is_constantt()(constant))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle the reverse case too (constant OP symbol)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and test this)

if(symbol_expr && goto_symex_is_constantt()(constant))
{
value_setst::valuest value_set_elements;
value_set.get_value_set(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert that symbol_expr is already L1 at least

to_ssa_expr(*symbol_expr).get_l1_object(), value_set_elements, ns);

INVARIANT(
constant.id() == ID_address_of || constant.id() == ID_NULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprising! Null is usually represented as an ID_constant with value = NULL -- suspect you need to add tests for this


if(!unknown_or_invalid_found)
{
if(!constant_found)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give some brief comments setting out the logic here, as !constant_found and value_set_elements.size() == 1 don't immediately chime with "x in { a, b, c }" and "x in { x }" respectively.

@owen-mc-diffblue owen-mc-diffblue force-pushed the value-set-filtering-infeasible-branch branch from 618b074 to 1081b00 Compare March 28, 2019 16:23
@owen-mc-diffblue owen-mc-diffblue marked this pull request as ready for review March 28, 2019 16:25
@owen-mc-diffblue
Copy link
Contributor Author

owen-mc-diffblue commented Mar 28, 2019

I've addressed all of @smowton's review comments on the draft PR and got CI working (touch wood).

@owen-mc-diffblue owen-mc-diffblue force-pushed the value-set-filtering-infeasible-branch branch from 1081b00 to eb55978 Compare March 28, 2019 16:28

// Equality "=="

// A non-null pointer and the correct value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct -> matching and incorrect -> non-matching everywhere


// Disequality "!="

// A non-null pointer and the correct value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the -> a

test::1::unreachable_12[^\s]+ = 7$
^warning: ignoring
--
Pointer comparisons will be resolved in symex by a mixture of constant
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest adding conditions that check you check for everything you should, by making sure unreachable_1[3-9]) and unreachable[2-9][0-9] for example don't occur anywhere (i.e., if someone added a test but not a cross-check this regex would complain)


INVARIANT(
other_operand.id() == ID_address_of ||
(typecast_expr && typecast_expr->op().id() == ID_address_of) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're not sure this is simplified you might get redundant stacked typecast ops

expr_try_dynamic_cast<symbol_exprt>(symbol_expr);

if(!symbol_expr_lhs)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for braces with one-liner ifs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to use them in all cases

/// \param value_set: The value-set for looking up what the symbol can point to
/// \param language_mode: The language mode
/// \param ns: A namespace
static bool try_evaluate_pointer_comparisons_base_case_with_check(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// \param value_set: The value-set for looking up what the symbol can point to
/// \param language_mode: The language mode
/// \param ns: A namespace
static void try_evaluate_pointer_comparisons_base_case(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about try_evaluate_pointer_comparison, singular?

/// \param value_set: The value-set for looking up what the symbol can point to
/// \param language_mode: The language mode
/// \param ns: A namespace
static bool try_evaluate_pointer_comparisons_base_case_with_check(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's much gained by the two functions (with_check vs. without) since they're both only called in one place

@owen-mc-diffblue
Copy link
Contributor Author

Note also that the last two commits break sharing everywhere, so we'll have to rethink how to do that.

@smowton
Copy link
Contributor

smowton commented Mar 28, 2019

Suggested change: smowton@1daa2c6

That removes renamed_vectort and instead uses depth_begin/depth_end, which avoid sharing breaks when no changes are made, to selectively alter the renamed expression, only permitting changes where the renaming level is consistent. @romainbrenguier are you happy with this?

@owen-jones-diffblue note that also applies a couple of my suggestions above since it made it easier to code :)

@romainbrenguier
Copy link
Contributor

@smowton

Suggested change: smowton/cbmc@1daa2c6
That removes renamed_vectort and instead uses depth_begin/depth_end, which avoid sharing breaks when no changes are made, to selectively alter the renamed expression, only permitting changes where the renaming level is consistent. @romainbrenguier are you happy with this?

Yes that looks better. What I'm not sure about is the fact that the mutator function takes an exprt as argument rather that a renamedt. That way you avoid calling constructors on all intermediary expressions, but I don't think it's nice for the interface. Maybe we could make renamedt privately inherit from exprt so that we can use static_cast instead of a constructor, so that conversion would be cheap and the interface nicer?
Here is the change I propose (if you want to use it): #4454

@owen-mc-diffblue owen-mc-diffblue force-pushed the value-set-filtering-infeasible-branch branch from eb55978 to b20167c Compare March 29, 2019 07:31
@owen-mc-diffblue
Copy link
Contributor Author

Thanks @smowton, that's much better. I've cherry-picked it and squashed it in, along with fixes for all the things you mentioned. I did add an extra commit which makes selectively_mutate a free function (which is a friend of renamedt), since it doesn't make sense to be using depth iterators for anything when underlyingt is derived from exprt. I'm not sure how this interacts with @romainbrenguier's suggestion.

There are three failing tests. The first is exception_cleanup, which is failing because symex can now tell that @inflight_exception is not null, so the test needs to be changed. I'm not totally sure what it should now be. The failing lines are:

1 remaining after simplification$

in test.desc and

file Test\.java line 6 function java::Test.main:\(I\)V
no uncaught exception

in vccs.desc.

The other two are nondet_elements_longer_lists and nondet_elements_longer_lists_global, which are failing in the same way. I'm looking into that at the moment.

@owen-mc-diffblue
Copy link
Contributor Author

For nondet_elements_longer_lists and nondet_elements_longer_lists_global, the test is failing to find these two lines: ```
[test_function.unwind.\d+] line \d+ unwinding assertion loop 0: SUCCESS
[test_function.unwind.\d+] line \d+ unwinding assertion loop 1: SUCCESS

Is there a good reason that we would not generate these unwinding assertions? If symex can determine that they are true then does it just not output them in the formula?

@smowton
Copy link
Contributor

smowton commented Mar 29, 2019

@owen-jones-diffblue unsure about the unwinding assertions (yet) but the exception one is entirely desirable -- that 1 VCC remaining was a shortcoming which we have now fixed (symex can tell no exception escaped from the whole program). Just amend the test to check all conditions are symex'd away.

@smowton
Copy link
Contributor

smowton commented Mar 29, 2019

@romainbrenguier I'm not sure I follow, what would your ideal interface look like?

@tautschnig
Copy link
Collaborator

@owen-jones-diffblue Presumably your changes just result in more constant propagation, and thus those unwinding assertions are resolved by the simplifier already. #2574 does the same, i.e., removes those lines from the test spec.

@romainbrenguier
Copy link
Contributor

@smowton

@romainbrenguier I'm not sure I follow, what would your ideal interface look like?

I would think having mutator_functiont take renamedt arguments would be more useful (though in your case it doesn't really matter). For instance imagine we want to selectively apply simplify to operands of a renamedt, you could do it with a renamedt -> optinalt<renamedt> function. I'm not insisting on it, since it is not necessary here, but could be nice in the future.

@smowton
Copy link
Contributor

smowton commented Mar 29, 2019

Ah sure, ok. On the one hand the tag is useful if you want to call a function that demands a renamedt, but I certainly like not having to make a dedicated object for every subexpression, so your privately-inherit trick looks like a good solution.

@owen-mc-diffblue owen-mc-diffblue force-pushed the value-set-filtering-infeasible-branch branch from b20167c to 44c25df Compare March 29, 2019 11:01
@owen-mc-diffblue
Copy link
Contributor Author

I've marked this as do not merge until we have some benchmarking data for it

@owen-mc-diffblue
Copy link
Contributor Author

I've cherry-picked ##4454 and added a commit to use it. Before merging this should be cleaned up.

@smowton
Copy link
Contributor

smowton commented May 13, 2019

cbmc CORE unit tests times: unchanged

@smowton smowton closed this May 13, 2019
@smowton smowton reopened this May 13, 2019
@smowton
Copy link
Contributor

smowton commented May 13, 2019

Large Webgoat example: 55 seconds to reach a particular checkpoint (down 5 seconds)

@smowton smowton force-pushed the value-set-filtering-infeasible-branch branch from 96d2f30 to 13b6290 Compare May 13, 2019 10:51
@smowton
Copy link
Contributor

smowton commented May 13, 2019

Romain's Sakai benchmark: One benchmark reduced around 15 seconds (from 145 seconds), the remainder tested showed negligible difference.

@smowton
Copy link
Contributor

smowton commented May 13, 2019

Therefore I'm pretty confident this has no noticeable cost in context of the other fairly-expensive ops we perform in symex.

@smowton smowton force-pushed the value-set-filtering-infeasible-branch branch from 13b6290 to ff6002b Compare May 13, 2019 16:08
const symbol_exprt &symbol_expr,
const exprt &other_operand,
const value_sett &value_set,
const irep_idt language_mode,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const &

/// \return If we were able to evaluate the condition as true or false then we
/// return that, otherwise we return an empty optionalt
static optionalt<renamedt<exprt, L2>> try_evaluate_pointer_comparison(
irep_idt operation,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const &

static optionalt<renamedt<exprt, L2>> try_evaluate_pointer_comparison(
const renamedt<exprt, L2> &renamed_expr,
const value_sett &value_set,
const irep_idt language_mode,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const &

static renamedt<exprt, L2> try_evaluate_pointer_comparisons(
renamedt<exprt, L2> condition,
const value_sett &value_set,
const irep_idt language_mode,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const &

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: ff6002b).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/111618837

Owen added 4 commits May 14, 2019 13:40
value_set_dereferencet::valuet is only used as the return value of
value_set_dereferencet::build_reference_to. valuet::value is an
expression for the object pointed to, which is what's needed for
dereferencing. In other contexts, e.g. filtering value-sets, what
is needed is an expression for the pointer. This logic already
existed in try_filter_value_set, but putting it directly in
build_reference_to makes it easier to use it in other places and
makes it clear that it didn't cope with typecasts properly.
Use the value-set to evaluate pointer comparisons as true or false
where possible
At time of writing, develop fails on the following lines of the test:
test::1::unconditionally_reachable_1[^\s]+ = \(goto_symex::\\guard[^\s]+ \? 7 : test::1::unconditionally_reachable_1[^\s]+\)$ [FAILED]
test::1::unconditionally_reachable_6[^\s]+ = \(goto_symex::\\guard[^\s]+ \? 7 : test::1::unconditionally_reachable_6[^\s]+\)$ [FAILED]
test::1::unconditionally_reachable_7[^\s]+ = \(goto_symex::\\guard[^\s]+ \? 7 : test::1::unconditionally_reachable_7[^\s]+\)$ [FAILED]
test::1::unconditionally_reachable_8[^\s]+ = \(goto_symex::\\guard[^\s]+ \? 7 : test::1::unconditionally_reachable_8[^\s]+\)$ [FAILED]
test::1::unconditionally_reachable_9[^\s]+ = \(goto_symex::\\guard[^\s]+ \? 7 : test::1::unconditionally_reachable_9[^\s]+\)$ [FAILED]
test::1::unconditionally_reachable_10[^\s]+ = \(goto_symex::\\guard[^\s]+ \? 7 : test::1::unconditionally_reachable_10[^\s]+\)$ [FAILED]
test::1::unconditionally_reachable_11[^\s]+ = \(goto_symex::\\guard[^\s]+ \? 7 : test::1::unconditionally_reachable_11[^\s]+\)$ [FAILED]
test::1::unconditionally_reachable_12[^\s]+ = \(goto_symex::\\guard[^\s]+ \? 7 : test::1::unconditionally_reachable_12[^\s]+\)$ [FAILED]
test::1::unreachable_1[^\s]+ = 7$ [FAILED]
test::1::unreachable_6[^\s]+ = 7$ [FAILED]
test::1::unreachable_7[^\s]+ = 7$ [FAILED]
test::1::unreachable_8[^\s]+ = 7$ [FAILED]
test::1::unreachable_9[^\s]+ = 7$ [FAILED]
test::1::unreachable_10[^\s]+ = 7$ [FAILED]
test::1::unreachable_11[^\s]+ = 7$ [FAILED]
test::1::unreachable_12[^\s]+ = 7$ [FAILED]
@smowton smowton force-pushed the value-set-filtering-infeasible-branch branch from ff6002b to 213a1b2 Compare May 14, 2019 12:40
@smowton smowton merged commit 0aef428 into diffblue:develop May 14, 2019
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 213a1b2).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/111750931

@owen-mc-diffblue owen-mc-diffblue deleted the value-set-filtering-infeasible-branch branch May 15, 2019 10:05
@tautschnig
Copy link
Collaborator

I'm afraid this actually causes spurious counterexamples being reported. I'll try to build a small regression test.

@tautschnig
Copy link
Collaborator

Actually it's pretty simple:

int main()
{
  void *ptr;
  if(ptr == 0)
  {
    return 0;
  }
  __CPROVER_assert(ptr != 0, "null");
}

I believe the problem is that ptr has never been assigned, and thus the value set has no information about it.

@smowton
Copy link
Contributor

smowton commented May 15, 2019

Discussion on Slack summary: will revert and re-submit with a fix

tautschnig added a commit to tautschnig/cbmc that referenced this pull request May 16, 2019
This is to guard against the regression earlier introduced in diffblue#4444 (and
reverted in diffblue#4656).
kroening pushed a commit that referenced this pull request May 16, 2019
This is to guard against the regression earlier introduced in #4444 (and
reverted in #4656).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants