|
7 | 7 |
|
8 | 8 | This directory contains a symbolic evaluation system for goto-programs.
|
9 | 9 | This takes a goto-program and translates it to an equation system by
|
10 |
| -traversing the program, branching and merging and unwinding loops as |
11 |
| -needed. Each reverse goto has a separate counter (the actual counting is |
12 |
| -handled by `cbmc`, see the `–unwind` and `–unwind-set` options). When a |
13 |
| -counter limit is reach, an assertion can be added to explicitly show |
14 |
| -when analysis is incomplete. The symbolic execution includes constant |
15 |
| -folding so loops that have a constant number of iterations will be |
16 |
| -handled completely (assuming the unwinding limit is sufficient). |
| 10 | +traversing the program, branching and merging and unwinding loops and recursion |
| 11 | +as needed. |
17 | 12 |
|
18 | 13 | The output of the symbolic execution is a system of equations; an object
|
19 |
| -containing a list of `symex_target_elements`, each of which are |
20 |
| -equalities between `expr` expressions. See `symex_target_equation.h`. |
| 14 | +containing a list of \ref symex_target_equationt::SSA_stept structures, each of |
| 15 | +which are equalities between \ref exprt expressions. |
21 | 16 | The output is in static, single assignment (SSA) form, which is *not*
|
22 | 17 | the case for goto-programs.
|
23 | 18 |
|
@@ -50,44 +45,89 @@ The \ref bmct creates a goto_symext to symbolically execute the
|
50 | 45 | goto-program, thereby filling the equation, which can then be passed
|
51 | 46 | along to the SAT solver.
|
52 | 47 |
|
53 |
| -The symbolic execution state is maintained by goto_symex_statet. This is |
54 |
| -a memory-heavy data structure, so goto_symext creates it on-demand and |
55 |
| -lets it go out-of-scope as soon as possible. However, the process of |
56 |
| -symbolic execution fills out an additional \ref symbol_tablet |
| 48 | +The class \ref goto_symext holds the global state of the symbol executor, while |
| 49 | +\ref goto_symex_statet holds the program state at a particular point in |
| 50 | +symbolic execution. In straight-line code a single \ref goto_symex_statet is |
| 51 | +maintained, being transformed by each instruction in turn, but these state |
| 52 | +objects are cloned and merged at control-flow forks and joins respectively. |
| 53 | +\ref goto_symex_statet contains an instance of \ref value_sett, used to track |
| 54 | +what objects pointers may refer to, and a constant propagator domain used to |
| 55 | +try to simplify assignments and (more usefully) resolve branch instructions. |
| 56 | + |
| 57 | +This is a memory-heavy data structure, so goto_symext creates it on-demand and |
| 58 | +lets it go out-of-scope as soon as possible. |
| 59 | + |
| 60 | +The process of |
| 61 | +symbolic execution generates an additional \ref symbol_tablet |
57 | 62 | "symbol table" of dynamically-created objects; this symbol table is
|
58 | 63 | needed when solving the equation. This symbol table must thus be
|
59 | 64 | exported out of the state before it is torn down; this is done through
|
60 | 65 | the parameter "`new_symbol_table`" passed as a reference to the various
|
61 | 66 | functions in goto_symext.
|
62 | 67 |
|
63 |
| -The main symbolic execution loop code is goto_symext::symex_step(). This |
| 68 | +The main symbolic execution loop code is \ref goto_symext::symex_step. This |
64 | 69 | function case-switches over the type of the instruction that we're
|
65 | 70 | currently executing, and calls various other functions appropriate to
|
66 | 71 | the instruction type, i.e. goto_symext::symex_function_call() if the
|
67 | 72 | current instruction is a function call, goto_symext::symex_goto() if the
|
68 | 73 | current instruction is a goto, etc.
|
69 | 74 |
|
| 75 | +\subsection Loop and recursion unwinding |
| 76 | + |
| 77 | +Each backwards goto and recursive call has a separate counter |
| 78 | +(handled by \ref cbmc or another driver program, see the `--unwind` and |
| 79 | +`--unwind-set` options). The symbolic execution includes constant |
| 80 | +folding so loops that have a constant / bounded number of iterations will often |
| 81 | +be handled completely (assuming the unwinding limit is sufficient). |
| 82 | +When an unwind or recursion limit is reached, an assertion can be added to |
| 83 | +explicitly show when analysis is incomplete. |
| 84 | + |
| 85 | +\subsection \ref goto_symext::clean_expr |
| 86 | + |
| 87 | +Before any expression is incorporated into the output equation set it is passed |
| 88 | +through \ref goto_symext::clean_expr and thus \ref goto_symext::dereference, |
| 89 | +whose primary purpose is to remove dereference operations. It achieves this |
| 90 | +using the \ref value_sett stored in \ref goto_symex_statet, replacing `*x` with |
| 91 | +a construct such as |
| 92 | +`x == &candidate1 ? candidate1 : x == &candidate2 ? candidate2 : x$object` |
| 93 | + |
| 94 | +Note the `x$object` fallback candidate, which is known as a *failed symbol* or |
| 95 | +*failed object*, and represents the unknown object pointed to by `x` when |
| 96 | +neither of the candidates (`candidate1` and `candidate2` here) matched as |
| 97 | +expected. This is of course unsound, since `x$object` and `y$object` are always |
| 98 | +distinct, even if `x` and `y` might actually alias, but at least it ensures |
| 99 | +writes to and subsequent reads from `x` are related. |
| 100 | + |
| 101 | +\ref goto_symext::dereference function also passes its argument to |
| 102 | +\ref goto_symex_statet::rename, which is responsible for both SSA |
| 103 | +renaming symbols and for applying constant propagation where possible. Renaming |
| 104 | +is also performed elsewhere without calling \ref goto_symext::dereference when |
| 105 | +an expression is already known to be pointer-free. |
| 106 | + |
70 | 107 | \subsection symex-path Path exploration
|
71 | 108 |
|
72 | 109 | By default, CBMC symbolically executes the entire program, building up
|
73 | 110 | an equation representing all instructions, which it then passes to the
|
74 | 111 | solver. Notably, it _merges_ paths that diverge due to a goto
|
75 | 112 | instruction, forming a constraint that represents having taken either of
|
76 |
| -the two paths; the code for doing this is goto_symext::merge_goto(). |
77 |
| - |
78 |
| -CBMC can operate in a different mode when the `--paths` flag is passed |
79 |
| -on the command line. This disables path merging; instead, CBMC executes |
80 |
| -a single path at a time, calling the solver with the equation |
| 113 | +the two paths; the code for doing this is \ref goto_symext::merge_goto. |
| 114 | + |
| 115 | +Goto-symex can operate in a different mode when the `--paths` flag is passed |
| 116 | +in the \ref optionst object passed to its constructor (\ref cbmc passes this |
| 117 | +from the corresponding command-line option). |
| 118 | +This disables path merging; instead, symex executes |
| 119 | +a single path at a time, returning each one to its caller, which in the case of |
| 120 | +cbmc then calls its solver with the equation |
81 | 121 | representing that path, then continues to execute other paths.
|
82 | 122 | The 'other paths' that would have been taken when the program branches
|
83 |
| -are saved onto a workqueue so that CBMC can continue to execute the |
84 |
| -current path, and then later retrieve saved paths from the workqueue. |
| 123 | +are saved onto a workqueue so that the driver program can continue to execute |
| 124 | +the current path, and then later retrieve saved paths from the workqueue. |
85 | 125 | Implementation-wise, \ref bmct passes a worklist to goto_symext in
|
86 |
| -bmct::do_language_agnostic_bmc(). If path exploration is enabled, |
| 126 | +\ref bmct::do_language_agnostic_bmc. If path exploration is enabled, |
87 | 127 | goto_symext will fill up the worklist whenever it encounters a branch,
|
88 | 128 | instead of merging the paths on the branch. After the initial symbolic
|
89 |
| -execution (i.e. the first call to bmct::run() in |
90 |
| -bmct::do_language_agnostic_bmc()), \ref bmct continues popping the |
| 129 | +execution (i.e. the first call to \ref bmct::run in |
| 130 | +\ref bmct::do_language_agnostic_bmc), \ref bmct continues popping the |
91 | 131 | worklist and executing untaken paths until the worklist empties. Note
|
92 | 132 | that this means that the default model-checking behaviour is a special
|
93 | 133 | case of path exploration: when model-checking, the initial symbolic
|
@@ -155,6 +195,8 @@ time we access x as we walk through the function.
|
155 | 195 |
|
156 | 196 | In the \ref goto-symex directory.
|
157 | 197 |
|
| 198 | + |
| 199 | + |
158 | 200 | **Key classes:**
|
159 | 201 | * symex_target_equationt
|
160 | 202 | * prop_convt
|
|
0 commit comments