Skip to content

Commit 3314ed3

Browse files
committed
Documentation: Address comments.
1 parent 3ec7002 commit 3314ed3

File tree

1 file changed

+98
-81
lines changed

1 file changed

+98
-81
lines changed

doc/architectural/background-concepts.md

Lines changed: 98 additions & 81 deletions
Original file line numberDiff line numberDiff line change
@@ -696,7 +696,7 @@ is to encode both the program and the set of states using an appropriate logic,
696696
mostly *propositional logic* and (fragments of) *first-order logic*.
697697

698698
In the following, we will quickly discuss propositional logic, in combination
699-
with SAT solving, and show how to build a simple bounded model checker
699+
with SAT solving, and show how to build a simple bounded model checker.
700700
Actual bounded model checking for software requires
701701
a number of additional steps and concepts, which will be introduced as required
702702
later on.
@@ -775,16 +775,16 @@ Remember the `factorial` function. It starts with the line
775775
```C
776776
unsigned long fac = 1;
777777
```
778-
Now, from the C standard and the most common C ABIs, we know that internally,
779-
`fac` will be represented as a binary number with 64 bits. So, if we wish to
780-
reason about the contents of the variable `fac`, we might as well represent it
781-
as a vector of 64 propositional variables, say `fac`<sub>0</sub> to
778+
Now, suppose that `fac` will be represented as a binary number with 64 bits (this
779+
is standard on, e.g., Linux). So,
780+
if we wish to reason about the contents of the variable `fac`, we might as well
781+
represent it as a vector of 64 propositional variables, say `fac`<sub>0</sub> to
782782
`fac`<sub>63</sub>, where
783783
`fac` = 2<sup>63</sup> `fac`<sub>63</sub> + ... + 2<sup>0</sup> `fac`<sub>0</sub>.
784784
We can then assert that `fac`=1 using
785785
the propositional formula
786786
`fac`<sub>63</sub> = 0 and ... and `fac`<sub>1</sub> = 0 and `fac`<sub>0</sub> = 1,
787-
where we define the formula A = B as ''(A and B) or ((not A) and (not B))''.
787+
where we define the formula A = B as ''(A or not B) and (B or not A)''.
788788

789789
We call this a *bit vector* representation. Compare the Wikipedia page on
790790
[Binary numbers](https://en.wikipedia.org/wiki/Binary_number).
@@ -823,7 +823,7 @@ numbers as
823823
> and S_2 = FA_S(A_2,B_2,C_1) and S_3 = FA_S(A_3,B_3,C_2)
824824
> and S_3 = FA_S(0,0,C_3).
825825
826-
Other arithmetic operations on binary number can be expressed using propositional
826+
Other arithmetic operations on binary numbers can be expressed using propositional
827827
logic as well; the details can be found in the linked articles, as well
828828
as [Two's complement](https://en.wikipedia.org/wiki/Two%27s_complement) for
829829
handling signed integers and [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754)
@@ -869,13 +869,13 @@ bit vector X, and for the return value, return. But we also have to deal
869869
with the (local) variable `y`, which gets two assignments. Furthermore, we
870870
now have a program with three instructions.
871871

872-
With a bit of reflection, we find that we cannot directly translate
872+
Thinking about the approach so far, we find that we cannot directly translate
873873
`y=y+x` into a propositional formula: On the left-hand side of the `=`, we
874874
have the ''new'' value of `y`, while the right-hand side uses the ''old'' value.
875875
But propositional formulas do not distinguish between ''old'' and ''new'' values,
876876
so we need to find a way to distinguish between the two. The easiest solution
877-
is to use the Static Single Assignment form described above. We transform
878-
the program into SSA (slightly simplifying the notation):
877+
is to use the Static Single Assignment form described in section \ref SSA_section.
878+
We transform the program into SSA (slightly simplifying the notation):
879879
```C
880880
int calculate(int x.1)
881881
{
@@ -891,13 +891,11 @@ stand for `y.1` and `y.2` (we map X to `x.1` and return to the return
891891
value). `int y.1 = x.1 * x.1` becomes Y1 = X * X, `y.2 = y.1 + x.1` becomes
892892
Y2 = Y1 + X and `return y.2` becomes return = Y2.
893893
894-
To tie the three formulas together into a description of the while program,
894+
To tie the three formulas together into a description of the whole program,
895895
we note that the three instructions form a single basic block, so we know they
896-
are always executed as a unit. In this case, it is sufficient to simple connect
897-
them with ''and'': Y1 = X * X and Y2 = Y1 + X and return = Y2. Note that this
898-
propositional formula does not actually describe the order of execution of the
899-
statements, but simply summarizes their outcomes! Once we have non-trivial
900-
control flow, we have to do some extra work in this model.
896+
are always executed as a unit. In this case, it is sufficient to simply connect
897+
them with ''and'': Y1 = X * X and Y2 = Y1 + X and return = Y2. Note that thanks
898+
to SSA, we do not need to pay attention to control flow here.
901899
902900
One example of non-trivial control flow are `if` statements. Consider the
903901
following example:
@@ -932,7 +930,7 @@ the correct value.
932930
933931
As a first step, we modify the SSA form slightly by introducing an additional
934932
propositional variable C that tracks which branch of the `if` was taken.
935-
We call this variabel the *code guard variable*, or *guard* for short.
933+
We call this variable the *code guard variable*, or *guard* for short.
936934
Additionally, we add C to the &Phi; node as a new first parameter, describing
937935
which input to use as a result.
938936
The corresponding program looks something like this:
@@ -942,14 +940,14 @@ int max(int a, int b)
942940
int result;
943941
bit C; /* Track which branch was taken */
944942
C = a < b;
945-
if (C)
943+
/* if (C) - not needed anymore thanks to SSA */
946944
result.1 = b;
947-
else
945+
/* else */
948946
result.2 = a;
949947
return Phi(C,result.1,result.2);
950948
}
951949
```
952-
For the encoding of the program, we introduce a new propositional junctor,
950+
For the encoding of the program, we introduce the implication junctor, written
953951
&rArr;, where ''A &rArr; B'' is equivalent to ''(not A) or B''.
954952
It can be understood as ''if A holds, B must hold as well''.
955953

@@ -958,27 +956,56 @@ the basic statements of the program:
958956
- `C = a<b` maps to C = A&lt;B, for an appropriate formula A&lt;B.
959957
- `result.1 = b` becomes R1 = B, and `result.2 = a` becomes R2 = A.
960958

961-
To handle the `if` statement, we simply make the execution of each branch
962-
conditional on C using the &rArr; junctor:
963-
```C
964-
if (C)
965-
result.1 = b;
966-
else
967-
result.2 = a;
968-
```
969-
becomes (C &rArr; R1 = B) and ((not C) &rArr; R2 = A), stating that
970-
the equation for the first assignment holds when C is true, and that for
971-
the second assignment holds when C is false.
972-
973959
Finally, the &Phi; node is again resolved using the &rArr; junctor: we
974960
can encode the `return` statement as
975961
(C &rArr; return = R1) and ((not C) &rArr; return = R2).
976962

977963
At this point, it remains to tie the statements together; we find that we can
978-
again simply connect them with ''and'', since the statements are always executed
979-
in sequence. We get:
980-
> C = a&lt;b and (C &rArr; R1 = B) and (C &rArr; return = R1) and
981-
> ((not C) &rArr; R2 = A) and ((not C) &rArr; return = R2).
964+
again simply connect them with ''and''. We get:
965+
> C = a&lt;b and R1 = B and R2 = A and
966+
> (C &rArr; return = R1) and ((not C) &rArr; return = R2).
967+
968+
So far, we have only discussed how to encode the behavior of programs
969+
as propositional formulas. To actually reason about programs, we also need to
970+
a way to describe the property we want to prove. To do this, we introduce a
971+
primitive `ASSERT`. Let `e` be some expression; then `ASSERT(e)` is supposed
972+
to do nothing if `e` evaluates to true, and to abort the program if `e`
973+
evaluates to false.
974+
975+
For instance, we can add prove that `max(a,b) <= a` by modifying the `max`
976+
function into
977+
```C
978+
int max(int a, int b)
979+
{
980+
int result;
981+
if (a < b)
982+
result = b;
983+
else
984+
result = a;
985+
ASSERT(result <= a);
986+
return result;
987+
}
988+
```
989+
990+
The corresponding SSA would be
991+
```C
992+
int max(int a, int b)
993+
{
994+
int result;
995+
bit C; /* Track which branch was taken */
996+
C = a < b;
997+
result.1 = b;
998+
result.2 = a;
999+
ASSERT(Phi(C,result.1,result.2) <= a);
1000+
return Phi(C,result.1,result.2);
1001+
}
1002+
```
1003+
We translate `ASSERT(Phi(C,result.1,result.2) <= a)` into
1004+
> &Phi;(C,result.1,result.2) &lt;= a
1005+
The resulting formula would be
1006+
> C = a&lt;b and R1 = B and R2 = A and
1007+
> (C &rArr; R1 &lt;= A) and ((not C) &rArr; R2 &lt;= A).
1008+
> (C &rArr; return = R1) and ((not C) &rArr; return = R2).
9821009
9831010
We can extend this approach quite straightforwardly to other constructs, but
9841011
one obvious problem remains: We have not described how to handle loops. This
@@ -991,7 +1018,7 @@ their behavior using a finite propositional formula in the way we have
9911018
done above.
9921019

9931020
There are multiple approaches to deal with this problem, all with different
994-
trade-offs. CPROVER chooses bounded model checking as the underlying approach.
1021+
trade-offs. CBMC chooses bounded model checking as the underlying approach.
9951022
The idea is that we only consider program
9961023
executions that are, in some measure, ''shorter'' than a given bound. This bound
9971024
then implies an upper bound on the size of the required formulas, which makes
@@ -1031,12 +1058,14 @@ if (e) {
10311058
}
10321059
}
10331060
```
1034-
This transformation is known as ''loop unrolling'': We can always replace a loop
1035-
by an `if` that checks if we should execute, a copy of the loop body and the
1061+
This transformation is known as [loop unrolling](https://en.wikipedia.org/wiki/Loop_unrolling)
1062+
or *unwinding*:
1063+
We can always replace a loopby an
1064+
`if` that checks if we should execute, a copy of the loop body and the
10361065
loop statement.
10371066

10381067
So, to reason about the `factorial` function, we unroll the loop three times
1039-
and then replace the loop with a special `IGNORE` statement. We get:
1068+
and then replace the loop with `ASSERT(!(condition))`. We get:
10401069
```C
10411070
unsigned long factorial(unsigned n) {
10421071
unsigned long fac = 1;
@@ -1050,9 +1079,7 @@ unsigned long factorial(unsigned n) {
10501079
if (i <= n) {
10511080
fac *= i;
10521081
i = i+1;
1053-
if (i <= n) {
1054-
IGNORE;
1055-
}
1082+
ASSERT(!(i <= n));
10561083
}
10571084
}
10581085
return fac;
@@ -1066,49 +1093,39 @@ unsigned long factorial(unsigned n) {
10661093
unsigned long fac.1 = 1;
10671094
unsigned int i.1 = 1;
10681095
bit C1 = i.1 <= n;
1069-
if (C1) {
1070-
fac.2 = fac.1 * i.1;
1071-
i.2 = i.1+1;
1072-
bit C2 = i.2 <= n;
1073-
if (C2) {
1074-
fac.3 = fac.2 * i.2;
1075-
i.3 = i.2+1;
1076-
bit C3 = i.3 <= n;
1077-
if (C3) {
1078-
fac.4 = fac.3 * i.3;
1079-
i.4 = i.3+1;
1080-
bit C4 = i.4 <= n;
1081-
if (C4) {
1082-
IGNORE;
1083-
}
1084-
}
1085-
}
1096+
/* if (C1) { */
1097+
fac.2 = fac.1 * i.1;
1098+
i.2 = i.1+1;
1099+
bit C2 = i.2 <= n;
1100+
/* if (C2) { */
1101+
fac.3 = fac.2 * i.2;
1102+
i.3 = i.2+1;
1103+
bit C3 = i.3 <= n;
1104+
/* if (C3) { */
1105+
fac.4 = fac.3 * i.3;
1106+
i.4 = i.3+1;
1107+
ASSERT(!(i.4 <= n));
1108+
/* }}} */
10861109
return Phi(C1, Phi(C2, Phi(C3, fac.4, fac.3), fac.2), fac.1);
10871110
}
10881111
```
1089-
We translate `IGNORE` into the formula **false** - this will later allow
1090-
us to rule out all paths that reach this point.
1112+
Note that we may be missing
1113+
possible executions of the program due to this translation; we come back to
1114+
this point later.
10911115

10921116
The corresponding propositional formula can then be written as (check
10931117
that this is equivalent to the formula you would be getting by following
10941118
the translation procedure described above):
10951119

10961120
> fac.1 = 1 and i.1 = 1 and C1 = i.1 &lt;= n and
1097-
> ((not C1) &rArr; return = fac.1) and
1098-
> C1 &rArr; (
1099-
>> fac.2 = fac.1 * i.1 and i.2 = i.1 + 1 and C2 = i.2 &lt;= n and
1100-
>> ((not C2) &rArr; return = fac.2) and
1101-
>> C2 &rArr; (
1102-
>>> fac.3 = fac.2 * i.2 and i.3 = i.2 + 1 and C3 = i.3 &lt;= n and
1103-
>>> ((not C3) &rArr; return = fac.3) and
1104-
>>> C3 &rArr; (
1105-
>>>> fac.4 = fac.3 * i.3 and i.4 = i.3 + 1 and C4 = i.4 &lt;= n and
1106-
>>>> ((not C4) &rArr; return = fac.4) and
1107-
>>>> (C4 &rArr; false)
1108-
>>> )
1109-
>>> )
1110-
>> )
1111-
1121+
> fac.2 = fac.1 * i.1 and i.2 = i.1 + 1 and C2 = i.2 &lt;= n and
1122+
> fac.3 = fac.2 * i.2 and i.3 = i.2 + 1 and C3 = i.3 &lt;= n and
1123+
> fac.4 = fac.3 * i.3 and i.4 = i.3 + 1 and C4 = i.4 &lt;= n and
1124+
> not (i.4 <= n) and
1125+
> ((C1 and C2 and C3) &rArr; result = fac.4) and
1126+
> ((C1 and C2 and not C3) &rArr; result = fac.3) and
1127+
> ((C1 and not C2) &rArr; result = fac.2) and
1128+
> ((not C1) &rArr; result = fac.1)
11121129
In the following, we reference this formula as FA(n, result).
11131130

11141131
At this point, we know how to encode programs as propositional formulas.
@@ -1120,15 +1137,15 @@ If this formula has a model (i.e., if we can find a satisfying assignment to
11201137
all variables, and in particular, to n), we can extract the required value
11211138
for the parameter `n` from that model. As we have discussed above, this can
11221139
be done using a SAT solver: If you run, say, MiniSAT on this formula, you will
1123-
get a model involving n=3.
1140+
get a model that translates to n=3.
11241141

11251142
Be aware that this method has very clear limits: We know that the factorial of
11261143
`5` is `120`, but with the formula above, evaluating
11271144
''FA(n, result) and result=120'' would yield ''unsatisfiable''! This is because
11281145
we limited the number of loop iterations, and to reach 120, we have to execute
1129-
the loop more than three times.
1130-
That being said, for typical CPROVER use cases, we can often make do with a
1131-
reasonable bound on loop iterations.
1146+
the loop more than three times. In particular, a ''VERIFICATION SUCCESSFUL''
1147+
message, as output by CBMC, must always be interpreted keeping the bounds
1148+
that were used in mind.
11321149

11331150
In the case that we found a model, we can get even more information: We can
11341151
even reconstruct the program execution that would lead to the requested result.

0 commit comments

Comments
 (0)