Skip to content

Commit a812ba2

Browse files
committed
Improve goto-symex documentation
1 parent dbd6988 commit a812ba2

File tree

1 file changed

+66
-24
lines changed

1 file changed

+66
-24
lines changed

src/goto-symex/README.md

Lines changed: 66 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -7,17 +7,12 @@
77

88
This directory contains a symbolic evaluation system for goto-programs.
99
This takes a goto-program and translates it to an equation system by
10-
traversing the program, branching and merging and unwinding loops as
11-
needed. Each reverse goto has a separate counter (the actual counting is
12-
handled by `cbmc`, see the `–unwind` and `–unwind-set` options). When a
13-
counter limit is reach, an assertion can be added to explicitly show
14-
when analysis is incomplete. The symbolic execution includes constant
15-
folding so loops that have a constant number of iterations will be
16-
handled completely (assuming the unwinding limit is sufficient).
10+
traversing the program, branching and merging and unwinding loops and recursion
11+
as needed.
1712

1813
The output of the symbolic execution is a system of equations; an object
19-
containing a list of `symex_target_elements`, each of which are
20-
equalities between `expr` expressions. See `symex_target_equation.h`.
14+
containing a list of \ref symex_target_equationt::SSA_stept structures, each of
15+
which are equalities between \ref exprt expressions.
2116
The output is in static, single assignment (SSA) form, which is *not*
2217
the case for goto-programs.
2318

@@ -50,44 +45,89 @@ The \ref bmct creates a goto_symext to symbolically execute the
5045
goto-program, thereby filling the equation, which can then be passed
5146
along to the SAT solver.
5247

53-
The symbolic execution state is maintained by goto_symex_statet. This is
54-
a memory-heavy data structure, so goto_symext creates it on-demand and
55-
lets it go out-of-scope as soon as possible. However, the process of
56-
symbolic execution fills out an additional \ref symbol_tablet
48+
The class \ref goto_symext holds the global state of the symbol executor, while
49+
\ref goto_symex_statet holds the program state at a particular point in
50+
symbolic execution. In straight-line code a single \ref goto_symex_statet is
51+
maintained, being transformed by each instruction in turn, but these state
52+
objects are cloned and merged at control-flow forks and joins respectively.
53+
\ref goto_symex_statet contains an instance of \ref value_sett, used to track
54+
what objects pointers may refer to, and a constant propagator domain used to
55+
try to simplify assignments and (more usefully) resolve branch instructions.
56+
57+
This is a memory-heavy data structure, so goto_symext creates it on-demand and
58+
lets it go out-of-scope as soon as possible.
59+
60+
The process of
61+
symbolic execution generates an additional \ref symbol_tablet
5762
"symbol table" of dynamically-created objects; this symbol table is
5863
needed when solving the equation. This symbol table must thus be
5964
exported out of the state before it is torn down; this is done through
6065
the parameter "`new_symbol_table`" passed as a reference to the various
6166
functions in goto_symext.
6267

63-
The main symbolic execution loop code is goto_symext::symex_step(). This
68+
The main symbolic execution loop code is \ref goto_symext::symex_step. This
6469
function case-switches over the type of the instruction that we're
6570
currently executing, and calls various other functions appropriate to
6671
the instruction type, i.e. goto_symext::symex_function_call() if the
6772
current instruction is a function call, goto_symext::symex_goto() if the
6873
current instruction is a goto, etc.
6974

75+
\subsection Loop and recursion unwinding
76+
77+
Each backwards goto and recursive call has a separate counter
78+
(handled by \ref cbmc or another driver program, see the `--unwind` and
79+
`--unwind-set` options). The symbolic execution includes constant
80+
folding so loops that have a constant / bounded number of iterations will often
81+
be handled completely (assuming the unwinding limit is sufficient).
82+
When an unwind or recursion limit is reached, an assertion can be added to
83+
explicitly show when analysis is incomplete.
84+
85+
\subsection \ref goto_symext::clean_expr
86+
87+
Before any expression is incorporated into the output equation set it is passed
88+
through \ref goto_symext::clean_expr and thus \ref goto_symext::dereference,
89+
whose primary purpose is to remove dereference operations. It achieves this
90+
using the \ref value_sett stored in \ref goto_symex_statet, replacing `*x` with
91+
a construct such as
92+
`x == &candidate1 ? candidate1 : x == &candidate2 ? candidate2 : x$object`
93+
94+
Note the `x$object` fallback candidate, which is known as a *failed symbol* or
95+
*failed object*, and represents the unknown object pointed to by `x` when
96+
neither of the candidates (`candidate1` and `candidate2` here) matched as
97+
expected. This is of course unsound, since `x$object` and `y$object` are always
98+
distinct, even if `x` and `y` might actually alias, but at least it ensures
99+
writes to and subsequent reads from `x` are related.
100+
101+
\ref goto_symext::dereference function also passes its argument to
102+
\ref goto_symex_statet::rename, which is responsible for both SSA
103+
renaming symbols and for applying constant propagation where possible. Renaming
104+
is also performed elsewhere without calling \ref goto_symext::dereference when
105+
an expression is already known to be pointer-free.
106+
70107
\subsection symex-path Path exploration
71108

72109
By default, CBMC symbolically executes the entire program, building up
73110
an equation representing all instructions, which it then passes to the
74111
solver. Notably, it _merges_ paths that diverge due to a goto
75112
instruction, forming a constraint that represents having taken either of
76-
the two paths; the code for doing this is goto_symext::merge_goto().
77-
78-
CBMC can operate in a different mode when the `--paths` flag is passed
79-
on the command line. This disables path merging; instead, CBMC executes
80-
a single path at a time, calling the solver with the equation
113+
the two paths; the code for doing this is \ref goto_symext::merge_goto.
114+
115+
Goto-symex can operate in a different mode when the `--paths` flag is passed
116+
in the \ref optionst object passed to its constructor (\ref cbmc passes this
117+
from the corresponding command-line option).
118+
This disables path merging; instead, symex executes
119+
a single path at a time, returning each one to its caller, which in the case of
120+
cbmc then calls its solver with the equation
81121
representing that path, then continues to execute other paths.
82122
The 'other paths' that would have been taken when the program branches
83-
are saved onto a workqueue so that CBMC can continue to execute the
84-
current path, and then later retrieve saved paths from the workqueue.
123+
are saved onto a workqueue so that the driver program can continue to execute
124+
the current path, and then later retrieve saved paths from the workqueue.
85125
Implementation-wise, \ref bmct passes a worklist to goto_symext in
86-
bmct::do_language_agnostic_bmc(). If path exploration is enabled,
126+
\ref bmct::do_language_agnostic_bmc. If path exploration is enabled,
87127
goto_symext will fill up the worklist whenever it encounters a branch,
88128
instead of merging the paths on the branch. After the initial symbolic
89-
execution (i.e. the first call to bmct::run() in
90-
bmct::do_language_agnostic_bmc()), \ref bmct continues popping the
129+
execution (i.e. the first call to \ref bmct::run in
130+
\ref bmct::do_language_agnostic_bmc), \ref bmct continues popping the
91131
worklist and executing untaken paths until the worklist empties. Note
92132
that this means that the default model-checking behaviour is a special
93133
case of path exploration: when model-checking, the initial symbolic
@@ -155,6 +195,8 @@ time we access x as we walk through the function.
155195

156196
In the \ref goto-symex directory.
157197

198+
199+
158200
**Key classes:**
159201
* symex_target_equationt
160202
* prop_convt

0 commit comments

Comments
 (0)