-
Notifications
You must be signed in to change notification settings - Fork 273
Reachability slicer: mark reachable code more precisely #2195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reachability slicer: mark reachable code more precisely #2195
Conversation
Tagging @thk123 for interest, as I believe deeptest has been using the reachability slicer to some degree? |
Interesting... I ran into a similar effect when working on the history awareness for the abstract interpreter. Useful and worth doing. |
std::vector<cfgt::node_indext> reachable = cfg.get_reachable(src, false); | ||
for(const auto index : reachable) | ||
cfg[index].reaches_assertion = true; | ||
std::vector<std::pair<cfgt::node_indext, bool>> stack; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than having to introduce this comment I'd suggest you define a struct
with members with meaningful names. That should make the code a lot easier to understand.
std::vector<cfgt::node_indext> src = get_sources(is_threaded, criterion); | ||
// Stack entries are pairs of node indices and a boolean flag that indicates | ||
// whether the function's callsite is known, in which case it has already | ||
// been placed on the stack and return sites are a no-op. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy&paste emphasizes my above point...
|
||
while(!stack.empty()) | ||
{ | ||
auto index = stack.back().first; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
or you might as well spare us this one and right away do auto &node = cfg[stack.back().first];
{ | ||
stack.emplace_back(edge.first, true); | ||
stack.emplace_back( | ||
cfg.entry_map[std::prev(node.PC)], callsite_is_known); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of std::prev
is neither safe (as the Travis failure shows) nor sound: an instruction may have multiple incoming edges on the CFG. I believe you will need to go via the incoming edges on the CFG rather than just taking the predecessor in the list of goto program instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may, but that imprecision is exactly what I want to eliminate. If we entered this function by way of a particular callsite (indicated by callsite_is_known
) then we should not further explore all possible callers, only the one that's actually reachable.
The problem was simply taking std::prev
of the head of a function -- I've fixed this using std::next(pred_it)
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of prev
at line 85 might indeed be an issue. Otherwise, looks good to me.
Question: In what situations a Boolean flag (associated with a CFG node index) differs from result of checking whether node.PC->function
is in the set of functions of get_sources(is_threaded, criterion);
?
{ | ||
// Queue the instruction's natural successor (function head, or next | ||
// instruction if the function is bodyless) | ||
INVARIANT(node.out.size() == 1, "Call sites should have one successor"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true also for GOTO obtained from a C
program (e.g. a call via a pointer)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cfgt
requires that remove_function_pointer
has run (it ignores function calls that don't call an ID_symbol
)
9622910
to
7512c70
Compare
@tautschnig changes applied, please re-review |
@marek-trtik you could have !caller_is_known if you have something like: unreachable() { h(); }
f2() { g(); }
f1() { g(); }
g() { h(); i(); }
h() {}
i() { assert(false); } Here, if we start at the assert and walk backwards, |
@smowton Thank you! |
{ | ||
// This is an end-of-function -> successor-of-callsite edge. | ||
// Thus node.PC must have a predecessor instruction which is the calling | ||
// instruction. Queue both the caller and the end of the callee. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an invariant checking that indeed this predecessor is a function call. Essentially the middle sentence should not be part of a comment, but instead be spelled out as an invariant.
{ | ||
// Skip this predecessor, unless this is a bodyless function, or we | ||
// don't know who our callee was: | ||
if(!caller_is_known || std::next(pred_node.PC) == node.PC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a comparison of iterators possible across different lists of instructions. Use std::next(pred_node.PC)->function == node.PC->function
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I need to test both (recursion), but I now check ->function first for this reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but do you actually need to care about recursion? The function will have been marked anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, true. I'm inclined to keep the "exact" check anyhow since it's clearer what is intended however.
while(!stack.empty()) | ||
{ | ||
auto &node = cfg[stack.back().node_index]; | ||
auto caller_is_known = stack.back().caller_is_known; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
while(!stack.empty()) | ||
{ | ||
auto &node = cfg[stack.back().node_index]; | ||
auto caller_is_known = stack.back().caller_is_known; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
// Queue the instruction's natural successor (function head, or next | ||
// instruction if the function is bodyless) | ||
INVARIANT(node.out.size() == 1, "Call sites should have one successor"); | ||
auto successor_index = node.out.begin()->first; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
// have already taken care of the return site. | ||
const auto &callee_head_node = cfg[successor_index]; | ||
auto callee_it = callee_head_node.PC; | ||
if(callee_it != std::next(node.PC)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be across different lists, use callee_it->function != std::next(node.PC)->function
instead.
while(!callee_it->is_end_function()) | ||
++callee_it; | ||
|
||
if(cfg[cfg.entry_map[callee_it]].out.size() != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use .empty()
{ | ||
for(const auto &edge : node.out) | ||
stack.emplace_back(edge.first, caller_is_known); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd merge those two cases:
else if(!node.PC->is_end_function() || !caller_is_known)
{
for(const auto &edge : node.out)
stack.emplace_back(edge.first, caller_is_known);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've merged them, though not quite like that.
7512c70
to
ebe0d79
Compare
@tautschnig changes applied. |
Approved, but a regression test would be very nice to have. |
The reachability slicer currently uses a very simple graph walk, and in particular walks out of a function to all possible callers, regardless of whether we know the actual caller. This commit fixes that shortcoming by adding the callsite successor *at the callsite*, and tracking the fact that the callee's successor has already been taken care of in the graph search stack, thus allowing it to ignore the END_FUNCTION -> callsite edges which are less precise. Functions whose caller is genuinely unknown, such as the root function containing a reachability target (e.g. assert instruction) are treated as before, considering all possible callees. The backwards search is improved similarly to the forwards.
ebe0d79
to
3487bf7
Compare
@tautschnig added, and indeed found a bug by doing it. |
Thanks! Now I'm curious what that bug was, because I clearly missed it while doing the reviews! |
|
31ef182 Merge pull request diffblue#2208 from diffblue/cleanout-goto-programs 0e84d3e Merge pull request diffblue#2195 from smowton/smowton/feature/smarter-reachability-slicer c801126 Merge pull request diffblue#2209 from diffblue/travis-no-osx-g++ 3487bf7 Reachability slicer: mark reachable code more precisely 85e8451 Merge pull request diffblue#2066 from tautschnig/bv-endianness 5e5eafb Merge pull request diffblue#2191 from smowton/smowton/feature/java-fatal-assertions 8a68ab8 remove g++ build on OS X; this is identical to the clang++ build 48e5b25 Amend faulty long-string test 3f718ba Java: make null-pointer and similar checks fatal 821eb1d remove basic_blocks and goto_program_irep d87c2db Merge pull request diffblue#2203 from diffblue/typed-lookup d77dea3 strongly type the lookup() methods in namespacet 2382f27 Merge pull request diffblue#2193 from martin-cs/feature/correct-instruction-documentation c314272 Merge pull request diffblue#2181 from diffblue/format-expr-constants 514a0a5 Merge pull request diffblue#2167 from diffblue/parse_unwindset_options 0102452 move escape(s) to string_utils f1b6174 use unwindsett in goto-instrument dcc907a introduce unwindsett for unwinding limits 10aeae8 format_expr now does index, c_bool constants, string constants 92b92d6 Correct the documentation of ASSERT : it does not alter control flow. a11add8 Expand the documentation of the goto-program instructions. a06503b Merge pull request diffblue#2197 from tautschnig/fix-help 05e4bc3 Remove stray whitespace previously demanded by clang-format 3261f4d Fix help output of --generate-function-body-options 7c67b23 Merge pull request diffblue#2110 from tautschnig/type-mismatch-exception 18fb262 Merge pull request diffblue#2025 from tautschnig/stack-depth-fix 9191b6a Merge pull request diffblue#2199 from tautschnig/simplifier-typing f99e631 Merge pull request diffblue#2198 from tautschnig/fix-test-desc 1a79a11 stack depth instrumentation: __CPROVER_initialize may be empty a7690ba Merge pull request diffblue#2185 from smowton/smowton/fix/tolerate-ts18661-types fc02e8f Restore returns before writing the simplified binary 49333eb Make restore_returns handle simplified programs 46f7987 Fix perl regular expressioons in regression test descriptions 9fe432b Merge pull request diffblue#1899 from danpoe/sharing-map-catch-unit-tests 9cc6aae Merge pull request diffblue#2081 from hannes-steffenhagen-diffblue/floating_point_simplificiation da19abe Tolerate TS 18661 (_Float32 et al) types a055454 Try all rounding modes when domain is unknown 5f7bc15 Add float support to constant propagator domain 3dc2244 Move adjust_float_expressions to goto-programs 06d8bea Merge pull request diffblue#2187 from diffblue/move-convert-nondet 6d8c3fa Merge pull request diffblue#2189 from thk123/bugfix/json-parser-restart 2ad157f Merge pull request diffblue#2186 from smowton/smowton/fix/call-graph-uninit-field cd54ad7 Corrected state persistence in the json parser 4447be0 Fix uninitialised collect_callsites field in call_grapht ead0aa3 Merge pull request diffblue#2188 from smowton/smowton/fix/init-string-types-without-refine-strings 57988fc Fix String type initialisation when --refine-strings is not active 6a76aff Move convert_nondet to java_bytecode ac6eb21 Merge pull request diffblue#1858 from danpoe/dependence-graph-fix 10b8b09 Merge pull request diffblue#2011 from thomasspriggs/final_classes a154593 Merge pull request diffblue#2087 from danpoe/feature/small-map 7002909 More dependence graph tests 66263ea Make dependence graph merge() return true when abstract state changes 3aa6dca Control dependency computation fix a408423 Simplified state merging in the dependence graph 0315816 Merge pull request diffblue#2180 from thomasspriggs/json_id2string 8941567 Merge pull request diffblue#2124 from smowton/smowton/feature/fallback-function-provider cd04e70 JBMC: use simple method stubbing pass a6b2cda Add Java simple method stubbing pass ec1127c Lazy goto model: hook for driver program to replace or stub functions b6a4382 Remove `id2string` from inside calls to the `json_stringt` constructor. b673ebb Add storage of final modifier status of java classes in `java_class_typet`. a2ad909 Small map 808dade Provide suitable diagnostics for equality-without-matching-types 89cf6fe Throw appropriate exceptions when converting constraints 2ae66c2 Produce a proper error message when catching a std::runtime_error at top level e7b3ade Workaround for Visual Studio loss of CV qualifier bug 1f661f5 Move sharing map and sharing node unit tests to util 47463a3 Use std::size_t instead of unsigned in the sharing map 3e22217 Sharing map documentation e54f740 Fix sharing map compiler warnings bcc489c Move sharing map unit tests to catch 34114b8 Use a specialised endianness map for flattening git-subtree-dir: cbmc git-subtree-split: 31ef182
The reachability slicer currently uses a very simple graph walk, and in particular walks out of a function to all possible callers, regardless of whether we know the actual caller. This commit fixes that shortcoming by adding the callsite successor at the callsite, and tracking the fact that the callee's successor has already been taken care of in the graph search stack, thus allowing it to ignore the END_FUNCTION ->
allsite edges which are less precise.
Functions whose caller is genuinely unknown, such as the root function containing a reachability target
e.g. assert instruction) are treated as before, considering all possible callees. The backwards search is improved similarly to the forwards.
This needs some tests of course, but is ready to review in the meantime.