3
3
4
4
\author Kareem Khazem
5
5
6
+ \section symbolic-execution Symbolic Execution
7
+
8
+ In the \ref goto-symex directory.
9
+
6
10
** Key classes:**
7
11
* goto_symex_statet
8
12
* goto_symext
@@ -19,6 +23,115 @@ digraph G {
19
23
}
20
24
\enddot
21
25
26
+ \subsection symex-overview Overview
27
+
28
+ The \ref bmct class gets a reference to an \ref symex_target_equationt
29
+ "equation" (initially, an empty list of \ref symex_target_equationt::SSA_stept
30
+ "single-static assignment steps") and a goto-program from the frontend.
31
+ The \ref bmct creates a goto_symext to symbolically execute the
32
+ goto-program, thereby filling the equation, which can then be passed
33
+ along to the SAT solver.
34
+
35
+ The symbolic execution state is maintained by goto_symex_statet. This is
36
+ a memory-heavy data structure, so goto_symext creates it on-demand and
37
+ lets it go out-of-scope as soon as possible. However, the process of
38
+ symbolic execution fills out an additional \ref symbol_tablet
39
+ "symbol table" of dynamically-created objects; this symbol table is
40
+ needed when solving the equation. This symbol table must thus be
41
+ exported out of the state before it is torn down; this is done through
42
+ the parameter "` new_symbol_table ` " passed as a reference to the various
43
+ functions in goto_symext.
44
+
45
+ The main symbolic execution loop code is goto_symext::symex_step(). This
46
+ function case-switches over the type of the instruction that we're
47
+ currently executing, and calls various other functions appropriate to
48
+ the instruction type, i.e. goto_symext::symex_function_call() if the
49
+ current instruction is a function call, goto_symext::symex_goto() if the
50
+ current instruction is a goto, etc.
51
+
52
+ \subsection symex-path Path exploration
53
+
54
+ By default, CBMC symbolically executes the entire program, building up
55
+ an equation representing all instructions, which it then passes to the
56
+ solver. Notably, it _ merges_ paths that diverge due to a goto
57
+ instruction, forming a constraint that represents having taken either of
58
+ the two paths; the code for doing this is goto_symext::merge_goto().
59
+
60
+ CBMC can operate in a different mode when the ` --paths ` flag is passed
61
+ on the command line. This disables path merging; instead, CBMC executes
62
+ a single path at a time, calling the solver with the equation
63
+ representing that path, then continues to execute other paths.
64
+ The 'other paths' that would have been taken when the program branches
65
+ are saved onto a workqueue so that CBMC can continue to execute the
66
+ current path, and then later retrieve saved paths from the workqueue.
67
+ Implementation-wise, \ref bmct passes a worklist to goto_symext in
68
+ bmct::do_language_agnostic_bmc(). If path exploration is enabled,
69
+ goto_symext will fill up the worklist whenever it encounters a branch,
70
+ instead of merging the paths on the branch. After the initial symbolic
71
+ execution (i.e. the first call to bmct::run() in
72
+ bmct::do_language_agnostic_bmc()), \ref bmct continues popping the
73
+ worklist and executing untaken paths until the worklist empties. Note
74
+ that this means that the default model-checking behaviour is a special
75
+ case of path exploration: when model-checking, the initial symbolic
76
+ execution run does not add any paths to the workqueue but rather merges
77
+ all the paths together, so the additional path-exploration loop is
78
+ skipped over.
79
+
80
+ \subsection ssa-renaming SSA renaming levels
81
+
82
+ In goto-programs, variable names get a prefix to indicate their scope
83
+ (like ` main::1::%foo ` or whatever). At symbolic execution level, variables
84
+ also get a _ suffix_ because we’re doing single-static assignment. There
85
+ are three “levels” of renaming. At Level 2, variables are renamed every
86
+ time they are encountered in a function. At Level 1, variables are
87
+ renamed every time the functions that contain those variables are
88
+ called. At Level 0, variables are renamed every time a new thread
89
+ containing those variables are spawned. We can inspect the renamed
90
+ variable names with the –show-vcc flag, which prints a string with the
91
+ following format: ` %%s!%%d@%%d#%%d ` . The string field is the variable name,
92
+ and the numbers after the !, @, and # are the L0, L1, and L2 suffixes
93
+ respectively. The following examples illustrate Level 1 and 2 renaming:
94
+
95
+ $ cat l1.c
96
+ int main() {
97
+ int x=7;
98
+ x=8;
99
+ assert(0);
100
+ }
101
+ $ cbmc --show-vcc l1.c
102
+ ...
103
+ {-12} x!0@1#2 == 7
104
+ {-13} x!0@1#3 == 8
105
+
106
+ That is, the L0 names for both xs are 0; the L1 names for both xs are 1;
107
+ but each occurrence of x within main() gets a fresh L2 suffix (2 and 3,
108
+ respectively).
109
+
110
+ $ cat l2.c
111
+ void foo(){
112
+ int x=7;
113
+ x=8;
114
+ x=9;
115
+ }
116
+ int main(){
117
+ foo();
118
+ foo();
119
+ assert(0);
120
+ }
121
+ $ cbmc --show-vcc l2.c
122
+ ...
123
+ {-12} x!0@1#2 == 7
124
+ {-13} x!0@1#3 == 8
125
+ {-14} x!0@1#4 == 9
126
+ {-15} x!0@2#2 == 7
127
+ {-16} x!0@2#3 == 8
128
+ {-17} x!0@2#4 == 9
129
+
130
+ That is, every time we enter function foo, the L1 counter of x is
131
+ incremented (from 1 to 2) and the L0 counter is reset (back to 2, after
132
+ having been incremented up to 4). The L0 counter then increases every
133
+ time we access x as we walk through the function.
134
+
22
135
---
23
136
\section counter-example-production Counter Example Production
24
137
0 commit comments