Skip to content

Symex: propagate assumptions and conditions #4386

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

smowton
Copy link
Contributor

@smowton smowton commented Mar 14, 2019

This propagates simple tests and assumptions (e.g. ASSUME(x == 5), or IF y == null GOTO 1) similar to how assignments are already propagated.

My particular use case is to note that non-exceptional return from a Java function (dominated by IF @inflight_exception != NULL) and a catch block (terminated by @inflight_exception = NULL;) mean that we can be certain that there is no exception still in flight when executing normal code. However as the C tests accompanying this PR indicate, the concept is more general and is likely to have other benefits by eliminating e.g. redundant return-value or error checks.

This depends on #4199 (for easier L2 generation number management) and #4379 (to fix byte-update / SMT2 handling exposed by this PR), so please review only the last 5 commits.

if(
previous_state.threads.size() > 1 &&
previous_state.write_is_shared(ssa_lhs, ns) ==
goto_symex_statet::write_is_shared_resultt::SHARED)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a proper review just yet, only poking at what I guessed would be key bits:

  1. Why can't the goal of this PR be addressed completely by putting the RHS value in the propagation map? (And really just this, nothing else.)
  2. I would have assumed the one case where we cannot do this is when we're dealing with shared variables. But now the above condition suggests that it's exactly that case that we do put in the propagation map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. According to our current merge logic, equal L2 generation numbers --> merge is a no-op. We must either introduce the possibility that despite equal L2 generation numbers the constant prop might disagree (moderately costly), or allocate an "administrative" L2 generation, as we do for constant assignment.
  2. I accidentally flipped the condition while refactoring, fixed now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But doesn't this then introduce spurious merge assignments? And is it actually a problem if constant propagation disagrees?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(the value-set assignment... just seems like we might as well; this way @owen-jones-diffblue's value-set-filtering will have less work to do as we have established a straightforward constant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra merge assignments: yes, harmless but some noise, indeed. We must do something about the constant propagator, or else the knowledge will outlive the code dominated by the branch or test.

Basically would you rather handle this through the L2 mechanism, or pay cost prop. to the size of the propagator map at each branch to merge them?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's worth adding an extra set of irep_idt that lists the constants inferred that way, and is then consulted at join points?

Copy link
Contributor Author

@smowton smowton Mar 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to do that -- will appeal to @kroening for an opinion before doing it though

@smowton smowton force-pushed the smowton/feature/symex-condition-driving branch from 864a3e0 to 2fca1b8 Compare March 14, 2019 15:34
@smowton
Copy link
Contributor Author

smowton commented Mar 14, 2019

The win32 CI failure is inherited from #4199

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 2fca1b8).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/104441777

@tautschnig tautschnig changed the title Symex: propagate assumptions and conditions Symex: propagate assumptions and conditions [depends-on: #4199, #4379] Mar 14, 2019
@@ -82,6 +82,11 @@ class goto_statet
value_set = std::move(other_state.value_set);
}

void apply_condition(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please briefly document this function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented at definition site

@@ -22,6 +22,28 @@ Author: Daniel Kroening, [email protected]
#include <analyses/dirty.h>
#include <util/simplify_expr.h>

static void apply_goto_condition(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please briefly document this function

if(!symex_config.doing_path_exploration)
{
// In --paths mode we would require the GOTO condition as a constraint
// as well, effectively treating the branch like an assumption.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment. Possibly the condition is inverted by mistake.

@smowton smowton force-pushed the smowton/feature/symex-condition-driving branch from 2fca1b8 to 31451a4 Compare March 15, 2019 10:39
@smowton
Copy link
Contributor Author

smowton commented Mar 15, 2019

@owen-jones-diffblue all comments applied

/// assumption or a GOTO condition that we just passed through, we can propagate
/// the constant '5' for future 'x' references. If the condition is (y == &o1)
/// then we can use that to populate the value set.
/// \param condition: condition that must hold on this path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you mention that this should be L2? There's a short end-of-line comment to that effect in the declaration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@smowton smowton changed the title Symex: propagate assumptions and conditions [depends-on: #4199, #4379] Symex: propagate assumptions and conditions [depends-on: #4199] Mar 15, 2019
smowton added 3 commits March 15, 2019 16:44
This enables us to more conveniently allocate a new generation on a non-current
state, such as a state copy that will be used on a pending branch.

No behavioural change.
l2_thread_write_encoding both makes a quite complicated test regarding whether a write is to be
considered shared state, and also enacts some side-effects for both shared and non-shared state.
This splits part of its testing into write_is_shared.

No behavioural change.
@smowton smowton force-pushed the smowton/feature/symex-condition-driving branch 2 times, most recently from 241bae7 to 83ddfc2 Compare March 15, 2019 16:58
@smowton
Copy link
Contributor Author

smowton commented Mar 15, 2019

This is now rebased and ready to go.

@smowton smowton changed the title Symex: propagate assumptions and conditions [depends-on: #4199] Symex: propagate assumptions and conditions Mar 15, 2019
Copy link
Member

@peterschrammel peterschrammel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@peterschrammel peterschrammel removed their assignment Mar 15, 2019
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks sensible, but if the purpose is performance, it would be nice to post some performance results to at least check that there is no slowdown.

}

// Try it once more, to check we also know after an unreachable catch that
// there is still no inflight exception
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 A test with nested try-catch would be useful to prove that point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such a case actually does not work yet, because symex can't tell that all exceptions were caught (i.e. the catch dispatch table is exhaustive). The reason is it isn't yet smart enough to notice that upon entering the catch block the incoming pointer cannot be null, and therefore e instanceof Exception certainly succeeds (null instanceof A returns false for any A).

Fixing that is my next task!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add a KNOWNBUG test for it now and turn it on as part of your next task.

The unreachable function should be successfully marked unreachable by symex,
but the final uncaught exception is not yet, as it requires symex to note that
within a catch block @inflight_exception is definitely *not* null, which it
can't just yet.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand what that comment is here for. The test is already marked as CORE, so what difference should we see when the stated limitation is resolved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we'll resolve all conditions in symex. I'll amend to make that clearer.

Copy link
Contributor

@owen-mc-diffblue owen-mc-diffblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message for "Symex: propagate constants implied by assumptions and conditions" says the constant propagator might be populated with new information. Is it also possible that the value-set gets updated? If so, please correct the commit message.

^warning: ignoring
--
The unreachable function should be successfully marked unreachable by symex,
but the final uncaught exception is not yet, as it requires symex to note that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what "the final uncaught exception" means. All the exceptions appear to be caught to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It refers to the "did any exception escape at top level?" test that's always generated in __CPROVER_start, and yes they are all caught, but symex isn't quite smart enough to determine that yet. This PR enables it to determine there's no exception in flight when we return to normal code (preventing exploration of catch blocks after future no-throw function calls), but doesn't yet enable it to determine catch block does not rethrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'll improve the wording here)

smowton added 2 commits March 18, 2019 16:15
When passing `assume(symbol == constant)` or `if symbol == constant then GOTO`, we can populate the
constant propagator and value-set accordingly and use that information until the next merge point without
that constraint. We implement this by allocating and defining a fresh L2 generation on this path, which
will be merged as "real", assignment-derived generations are. Symbols are subject to propagation under
the same conditions as they are on assignment (e.g. requiring that they are not subject to concurrent
modification by other threads).
@smowton smowton force-pushed the smowton/feature/symex-condition-driving branch from 83ddfc2 to e0d4d78 Compare March 18, 2019 16:16
@smowton
Copy link
Contributor Author

smowton commented Mar 18, 2019

@allredj I ran this on some Tika functions and observed that a small batch of 6 functions that took a total of 248s (symex-only) took 231s with this patch. Of course that's a small test, but it gives me some confidence this will be overall beneficial. I'll run a longer benchmark tomorrow to be sure and then merge if the results are also positive.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@peterschrammel peterschrammel assigned smowton and unassigned tautschnig and kroening Mar 18, 2019
@smowton
Copy link
Contributor Author

smowton commented Mar 19, 2019

perf_out

Extended benchmark results show a slight improvement, so merging this as-is and we can implement @tautschnig's suggestion re: using different data structures to record a path-local constant if and when we want to.

@smowton smowton merged commit 17fa596 into diffblue:develop Mar 19, 2019
@owen-mc-diffblue owen-mc-diffblue mentioned this pull request Mar 21, 2019
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants