Skip to content

Reachability slicer: mark reachable code more precisely #2195

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

smowton
Copy link
Contributor

@smowton smowton commented May 17, 2018

The reachability slicer currently uses a very simple graph walk, and in particular walks out of a function to all possible callers, regardless of whether we know the actual caller. This commit fixes that shortcoming by adding the callsite successor at the callsite, and tracking the fact that the callee's successor has already been taken care of in the graph search stack, thus allowing it to ignore the END_FUNCTION ->
allsite edges which are less precise.

Functions whose caller is genuinely unknown, such as the root function containing a reachability target
e.g. assert instruction) are treated as before, considering all possible callees. The backwards search is improved similarly to the forwards.

This needs some tests of course, but is ready to review in the meantime.

@smowton
Copy link
Contributor Author

smowton commented May 17, 2018

Tagging @thk123 for interest, as I believe deeptest has been using the reachability slicer to some degree?

@martin-cs
Copy link
Collaborator

Interesting... I ran into a similar effect when working on the history awareness for the abstract interpreter. Useful and worth doing.

std::vector<cfgt::node_indext> reachable = cfg.get_reachable(src, false);
for(const auto index : reachable)
cfg[index].reaches_assertion = true;
std::vector<std::pair<cfgt::node_indext, bool>> stack;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than having to introduce this comment I'd suggest you define a struct with members with meaningful names. That should make the code a lot easier to understand.

std::vector<cfgt::node_indext> src = get_sources(is_threaded, criterion);
// Stack entries are pairs of node indices and a boolean flag that indicates
// whether the function's callsite is known, in which case it has already
// been placed on the stack and return sites are a no-op.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy&paste emphasizes my above point...


while(!stack.empty())
{
auto index = stack.back().first;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const or you might as well spare us this one and right away do auto &node = cfg[stack.back().first];

{
stack.emplace_back(edge.first, true);
stack.emplace_back(
cfg.entry_map[std::prev(node.PC)], callsite_is_known);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of std::prev is neither safe (as the Travis failure shows) nor sound: an instruction may have multiple incoming edges on the CFG. I believe you will need to go via the incoming edges on the CFG rather than just taking the predecessor in the list of goto program instructions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may, but that imprecision is exactly what I want to eliminate. If we entered this function by way of a particular callsite (indicated by callsite_is_known) then we should not further explore all possible callers, only the one that's actually reachable.

The problem was simply taking std::prev of the head of a function -- I've fixed this using std::next(pred_it) instead.

Copy link
Contributor

@marek-trtik marek-trtik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of prev at line 85 might indeed be an issue. Otherwise, looks good to me.

Question: In what situations a Boolean flag (associated with a CFG node index) differs from result of checking whether node.PC->function is in the set of functions of get_sources(is_threaded, criterion);?

{
// Queue the instruction's natural successor (function head, or next
// instruction if the function is bodyless)
INVARIANT(node.out.size() == 1, "Call sites should have one successor");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true also for GOTO obtained from a C program (e.g. a call via a pointer)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cfgt requires that remove_function_pointer has run (it ignores function calls that don't call an ID_symbol)

@smowton smowton force-pushed the smowton/feature/smarter-reachability-slicer branch from 9622910 to 7512c70 Compare May 18, 2018 10:47
@smowton
Copy link
Contributor Author

smowton commented May 18, 2018

@tautschnig changes applied, please re-review

@smowton
Copy link
Contributor Author

smowton commented May 18, 2018

@marek-trtik you could have !caller_is_known if you have something like:

unreachable() { h(); }
f2() { g(); }
f1() { g(); }
g() { h(); i(); }
h() {}
i() { assert(false); }

Here, if we start at the assert and walk backwards, caller_is_known is false stepping out of i, true
stepping into h() (so we do not walk out of h into unreachable), but then false again stepping out of g (we do step out into both f1 and f2)

@marek-trtik
Copy link
Contributor

@smowton Thank you!

{
// This is an end-of-function -> successor-of-callsite edge.
// Thus node.PC must have a predecessor instruction which is the calling
// instruction. Queue both the caller and the end of the callee.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an invariant checking that indeed this predecessor is a function call. Essentially the middle sentence should not be part of a comment, but instead be spelled out as an invariant.

{
// Skip this predecessor, unless this is a bodyless function, or we
// don't know who our callee was:
if(!caller_is_known || std::next(pred_node.PC) == node.PC)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a comparison of iterators possible across different lists of instructions. Use std::next(pred_node.PC)->function == node.PC->function instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I need to test both (recursion), but I now check ->function first for this reason.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but do you actually need to care about recursion? The function will have been marked anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, true. I'm inclined to keep the "exact" check anyhow since it's clearer what is intended however.

while(!stack.empty())
{
auto &node = cfg[stack.back().node_index];
auto caller_is_known = stack.back().caller_is_known;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const

while(!stack.empty())
{
auto &node = cfg[stack.back().node_index];
auto caller_is_known = stack.back().caller_is_known;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const

// Queue the instruction's natural successor (function head, or next
// instruction if the function is bodyless)
INVARIANT(node.out.size() == 1, "Call sites should have one successor");
auto successor_index = node.out.begin()->first;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const

// have already taken care of the return site.
const auto &callee_head_node = cfg[successor_index];
auto callee_it = callee_head_node.PC;
if(callee_it != std::next(node.PC))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be across different lists, use callee_it->function != std::next(node.PC)->function instead.

while(!callee_it->is_end_function())
++callee_it;

if(cfg[cfg.entry_map[callee_it]].out.size() != 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use .empty()

{
for(const auto &edge : node.out)
stack.emplace_back(edge.first, caller_is_known);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd merge those two cases:

else if(!node.PC->is_end_function() || !caller_is_known)
{
  for(const auto &edge : node.out)
    stack.emplace_back(edge.first, caller_is_known);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've merged them, though not quite like that.

@smowton smowton force-pushed the smowton/feature/smarter-reachability-slicer branch from 7512c70 to ebe0d79 Compare May 18, 2018 12:41
@smowton
Copy link
Contributor Author

smowton commented May 18, 2018

@tautschnig changes applied.

@tautschnig
Copy link
Collaborator

Approved, but a regression test would be very nice to have.

@tautschnig tautschnig assigned smowton and unassigned tautschnig May 19, 2018
The reachability slicer currently uses a very simple graph walk, and in particular walks out of a function
to all possible callers, regardless of whether we know the actual caller. This commit fixes that shortcoming
by adding the callsite successor *at the callsite*, and tracking the fact that the callee's successor has
already been taken care of in the graph search stack, thus allowing it to ignore the END_FUNCTION -> callsite
edges which are less precise.

Functions whose caller is genuinely unknown, such as the root function containing a reachability target (e.g.
assert instruction) are treated as before, considering all possible callees. The backwards search is improved
similarly to the forwards.
@smowton smowton force-pushed the smowton/feature/smarter-reachability-slicer branch from ebe0d79 to 3487bf7 Compare May 21, 2018 09:32
@smowton
Copy link
Contributor Author

smowton commented May 21, 2018

@tautschnig added, and indeed found a bug by doing it.

@tautschnig
Copy link
Collaborator

added, and indeed found a bug by doing it.

Thanks! Now I'm curious what that bug was, because I clearly missed it while doing the reviews!

@smowton
Copy link
Contributor Author

smowton commented May 21, 2018

if(!cfg[cfg.entry_map[callee_it]].out.empty()) was incorrectly written as if(cfg[!cfg.entry_map[callee_it]].out.empty()), which happens to typecheck but is nonsense.

@smowton smowton merged commit 0e84d3e into diffblue:develop May 21, 2018
NathanJPhillips pushed a commit to NathanJPhillips/cbmc that referenced this pull request Aug 22, 2018
31ef182 Merge pull request diffblue#2208 from diffblue/cleanout-goto-programs
0e84d3e Merge pull request diffblue#2195 from smowton/smowton/feature/smarter-reachability-slicer
c801126 Merge pull request diffblue#2209 from diffblue/travis-no-osx-g++
3487bf7 Reachability slicer: mark reachable code more precisely
85e8451 Merge pull request diffblue#2066 from tautschnig/bv-endianness
5e5eafb Merge pull request diffblue#2191 from smowton/smowton/feature/java-fatal-assertions
8a68ab8 remove g++ build on OS X; this is identical to the clang++ build
48e5b25 Amend faulty long-string test
3f718ba Java: make null-pointer and similar checks fatal
821eb1d remove basic_blocks and goto_program_irep
d87c2db Merge pull request diffblue#2203 from diffblue/typed-lookup
d77dea3 strongly type the lookup() methods in namespacet
2382f27 Merge pull request diffblue#2193 from martin-cs/feature/correct-instruction-documentation
c314272 Merge pull request diffblue#2181 from diffblue/format-expr-constants
514a0a5 Merge pull request diffblue#2167 from diffblue/parse_unwindset_options
0102452 move escape(s) to string_utils
f1b6174 use unwindsett in goto-instrument
dcc907a introduce unwindsett for unwinding limits
10aeae8 format_expr now does index, c_bool constants, string constants
92b92d6 Correct the documentation of ASSERT : it does not alter control flow.
a11add8 Expand the documentation of the goto-program instructions.
a06503b Merge pull request diffblue#2197 from tautschnig/fix-help
05e4bc3 Remove stray whitespace previously demanded by clang-format
3261f4d Fix help output of --generate-function-body-options
7c67b23 Merge pull request diffblue#2110 from tautschnig/type-mismatch-exception
18fb262 Merge pull request diffblue#2025 from tautschnig/stack-depth-fix
9191b6a Merge pull request diffblue#2199 from tautschnig/simplifier-typing
f99e631 Merge pull request diffblue#2198 from tautschnig/fix-test-desc
1a79a11 stack depth instrumentation: __CPROVER_initialize may be empty
a7690ba Merge pull request diffblue#2185 from smowton/smowton/fix/tolerate-ts18661-types
fc02e8f Restore returns before writing the simplified binary
49333eb Make restore_returns handle simplified programs
46f7987 Fix perl regular expressioons in regression test descriptions
9fe432b Merge pull request diffblue#1899 from danpoe/sharing-map-catch-unit-tests
9cc6aae Merge pull request diffblue#2081 from hannes-steffenhagen-diffblue/floating_point_simplificiation
da19abe Tolerate TS 18661 (_Float32 et al) types
a055454 Try all rounding modes when domain is unknown
5f7bc15 Add float support to constant propagator domain
3dc2244 Move adjust_float_expressions to goto-programs
06d8bea Merge pull request diffblue#2187 from diffblue/move-convert-nondet
6d8c3fa Merge pull request diffblue#2189 from thk123/bugfix/json-parser-restart
2ad157f Merge pull request diffblue#2186 from smowton/smowton/fix/call-graph-uninit-field
cd54ad7 Corrected state persistence in the json parser
4447be0 Fix uninitialised collect_callsites field in call_grapht
ead0aa3 Merge pull request diffblue#2188 from smowton/smowton/fix/init-string-types-without-refine-strings
57988fc Fix String type initialisation when --refine-strings is not active
6a76aff Move convert_nondet to java_bytecode
ac6eb21 Merge pull request diffblue#1858 from danpoe/dependence-graph-fix
10b8b09 Merge pull request diffblue#2011 from thomasspriggs/final_classes
a154593 Merge pull request diffblue#2087 from danpoe/feature/small-map
7002909 More dependence graph tests
66263ea Make dependence graph merge() return true when abstract state changes
3aa6dca Control dependency computation fix
a408423 Simplified state merging in the dependence graph
0315816 Merge pull request diffblue#2180 from thomasspriggs/json_id2string
8941567 Merge pull request diffblue#2124 from smowton/smowton/feature/fallback-function-provider
cd04e70 JBMC: use simple method stubbing pass
a6b2cda Add Java simple method stubbing pass
ec1127c Lazy goto model: hook for driver program to replace or stub functions
b6a4382 Remove `id2string` from inside calls to the `json_stringt` constructor.
b673ebb Add storage of final modifier status of java classes in `java_class_typet`.
a2ad909 Small map
808dade Provide suitable diagnostics for equality-without-matching-types
89cf6fe Throw appropriate exceptions when converting constraints
2ae66c2 Produce a proper error message when catching a std::runtime_error at top level
e7b3ade Workaround for Visual Studio loss of CV qualifier bug
1f661f5 Move sharing map and sharing node unit tests to util
47463a3 Use std::size_t instead of unsigned in the sharing map
3e22217 Sharing map documentation
e54f740 Fix sharing map compiler warnings
bcc489c Move sharing map unit tests to catch
34114b8 Use a specialised endianness map for flattening

git-subtree-dir: cbmc
git-subtree-split: 31ef182
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants