@@ -696,7 +696,7 @@ is to encode both the program and the set of states using an appropriate logic,
696
696
mostly * propositional logic* and (fragments of) * first-order logic* .
697
697
698
698
In the following, we will quickly discuss propositional logic, in combination
699
- with SAT solving, and show how to build a simple bounded model checker
699
+ with SAT solving, and show how to build a simple bounded model checker.
700
700
Actual bounded model checking for software requires
701
701
a number of additional steps and concepts, which will be introduced as required
702
702
later on.
@@ -775,16 +775,16 @@ Remember the `factorial` function. It starts with the line
775
775
``` C
776
776
unsigned long fac = 1 ;
777
777
```
778
- Now, from the C standard and the most common C ABIs, we know that internally,
779
- ` fac ` will be represented as a binary number with 64 bits . So, if we wish to
780
- reason about the contents of the variable ` fac ` , we might as well represent it
781
- as a vector of 64 propositional variables, say ` fac ` <sub >0</sub > to
778
+ Now, suppose that ` fac ` will be represented as a binary number with 64 bits (this
779
+ is standard on, e.g., Linux) . So,
780
+ if we wish to reason about the contents of the variable ` fac ` , we might as well
781
+ represent it as a vector of 64 propositional variables, say ` fac ` <sub >0</sub > to
782
782
` fac ` <sub >63</sub >, where
783
783
` fac ` = 2<sup >63</sup > ` fac ` <sub >63</sub > + ... + 2<sup >0</sup > ` fac ` <sub >0</sub >.
784
784
We can then assert that ` fac ` =1 using
785
785
the propositional formula
786
786
` fac ` <sub >63</sub > = 0 and ... and ` fac ` <sub >1</sub > = 0 and ` fac ` <sub >0</sub > = 1,
787
- where we define the formula A = B as ''(A and B) or (( not A ) and (not B) )''.
787
+ where we define the formula A = B as ''(A or not B ) and (B or not A )''.
788
788
789
789
We call this a * bit vector* representation. Compare the Wikipedia page on
790
790
[ Binary numbers] ( https://en.wikipedia.org/wiki/Binary_number ) .
@@ -823,7 +823,7 @@ numbers as
823
823
> and S_2 = FA_S(A_2,B_2,C_1) and S_3 = FA_S(A_3,B_3,C_2)
824
824
> and S_3 = FA_S(0,0,C_3).
825
825
826
- Other arithmetic operations on binary number can be expressed using propositional
826
+ Other arithmetic operations on binary numbers can be expressed using propositional
827
827
logic as well; the details can be found in the linked articles, as well
828
828
as [ Two's complement] ( https://en.wikipedia.org/wiki/Two%27s_complement ) for
829
829
handling signed integers and [ IEEE 754] ( https://en.wikipedia.org/wiki/IEEE_754 )
@@ -869,13 +869,13 @@ bit vector X, and for the return value, return. But we also have to deal
869
869
with the (local) variable ` y ` , which gets two assignments. Furthermore, we
870
870
now have a program with three instructions.
871
871
872
- With a bit of reflection , we find that we cannot directly translate
872
+ Thinking about the approach so far , we find that we cannot directly translate
873
873
` y=y+x ` into a propositional formula: On the left-hand side of the ` = ` , we
874
874
have the ''new'' value of ` y ` , while the right-hand side uses the ''old'' value.
875
875
But propositional formulas do not distinguish between ''old'' and ''new'' values,
876
876
so we need to find a way to distinguish between the two. The easiest solution
877
- is to use the Static Single Assignment form described above. We transform
878
- the program into SSA (slightly simplifying the notation):
877
+ is to use the Static Single Assignment form described in section \ref SSA_section.
878
+ We transform the program into SSA (slightly simplifying the notation):
879
879
``` C
880
880
int calculate (int x.1)
881
881
{
@@ -891,13 +891,11 @@ stand for `y.1` and `y.2` (we map X to `x.1` and return to the return
891
891
value). `int y.1 = x.1 * x.1` becomes Y1 = X * X, `y.2 = y.1 + x.1` becomes
892
892
Y2 = Y1 + X and `return y.2` becomes return = Y2.
893
893
894
- To tie the three formulas together into a description of the while program,
894
+ To tie the three formulas together into a description of the whole program,
895
895
we note that the three instructions form a single basic block, so we know they
896
- are always executed as a unit. In this case, it is sufficient to simple connect
897
- them with ''and'': Y1 = X * X and Y2 = Y1 + X and return = Y2. Note that this
898
- propositional formula does not actually describe the order of execution of the
899
- statements, but simply summarizes their outcomes! Once we have non-trivial
900
- control flow, we have to do some extra work in this model.
896
+ are always executed as a unit. In this case, it is sufficient to simply connect
897
+ them with ''and'': Y1 = X * X and Y2 = Y1 + X and return = Y2. Note that thanks
898
+ to SSA, we do not need to pay attention to control flow here.
901
899
902
900
One example of non-trivial control flow are `if` statements. Consider the
903
901
following example:
@@ -932,7 +930,7 @@ the correct value.
932
930
933
931
As a first step, we modify the SSA form slightly by introducing an additional
934
932
propositional variable C that tracks which branch of the `if` was taken.
935
- We call this variabel the *code guard variable*, or *guard* for short.
933
+ We call this variable the *code guard variable*, or *guard* for short.
936
934
Additionally, we add C to the Φ node as a new first parameter, describing
937
935
which input to use as a result.
938
936
The corresponding program looks something like this:
@@ -942,14 +940,14 @@ int max(int a, int b)
942
940
int result;
943
941
bit C; /* Track which branch was taken */
944
942
C = a < b;
945
- if (C)
943
+ /* if (C) - not needed anymore thanks to SSA */
946
944
result.1 = b;
947
- else
945
+ /* else */
948
946
result.2 = a;
949
947
return Phi(C,result.1,result.2);
950
948
}
951
949
```
952
- For the encoding of the program, we introduce a new propositional junctor,
950
+ For the encoding of the program, we introduce the implication junctor, written
953
951
&rArr ; , where ''A &rArr ; B'' is equivalent to ''(not A) or B''.
954
952
It can be understood as ''if A holds, B must hold as well''.
955
953
@@ -958,27 +956,56 @@ the basic statements of the program:
958
956
- ` C = a<b ` maps to C = A< ; B, for an appropriate formula A< ; B.
959
957
- ` result.1 = b ` becomes R1 = B, and ` result.2 = a ` becomes R2 = A.
960
958
961
- To handle the ` if ` statement, we simply make the execution of each branch
962
- conditional on C using the &rArr ; junctor:
963
- ``` C
964
- if (C)
965
- result.1 = b;
966
- else
967
- result.2 = a;
968
- ```
969
- becomes (C &rArr ; R1 = B) and ((not C) &rArr ; R2 = A), stating that
970
- the equation for the first assignment holds when C is true, and that for
971
- the second assignment holds when C is false.
972
-
973
959
Finally, the &Phi ; node is again resolved using the &rArr ; junctor: we
974
960
can encode the ` return ` statement as
975
961
(C &rArr ; return = R1) and ((not C) &rArr ; return = R2).
976
962
977
963
At this point, it remains to tie the statements together; we find that we can
978
- again simply connect them with ''and'', since the statements are always executed
979
- in sequence. We get:
980
- > C = a< ; b and (C &rArr ; R1 = B) and (C &rArr ; return = R1) and
981
- > ((not C) &rArr ; R2 = A) and ((not C) &rArr ; return = R2).
964
+ again simply connect them with ''and''. We get:
965
+ > C = a< ; b and R1 = B and R2 = A and
966
+ > (C &rArr ; return = R1) and ((not C) &rArr ; return = R2).
967
+
968
+ So far, we have only discussed how to encode the behavior of programs
969
+ as propositional formulas. To actually reason about programs, we also need to
970
+ a way to describe the property we want to prove. To do this, we introduce a
971
+ primitive ` ASSERT ` . Let ` e ` be some expression; then ` ASSERT(e) ` is supposed
972
+ to do nothing if ` e ` evaluates to true, and to abort the program if ` e `
973
+ evaluates to false.
974
+
975
+ For instance, we can add prove that ` max(a,b) <= a ` by modifying the ` max `
976
+ function into
977
+ ``` C
978
+ int max (int a, int b)
979
+ {
980
+ int result;
981
+ if (a < b)
982
+ result = b;
983
+ else
984
+ result = a;
985
+ ASSERT(result <= a);
986
+ return result;
987
+ }
988
+ ```
989
+
990
+ The corresponding SSA would be
991
+ ```C
992
+ int max(int a, int b)
993
+ {
994
+ int result;
995
+ bit C; /* Track which branch was taken */
996
+ C = a < b;
997
+ result.1 = b;
998
+ result.2 = a;
999
+ ASSERT(Phi(C,result.1,result.2) <= a);
1000
+ return Phi(C,result.1,result.2);
1001
+ }
1002
+ ```
1003
+ We translate ` ASSERT(Phi(C,result.1,result.2) <= a) ` into
1004
+ > &Phi ; (C,result.1,result.2) < ; = a
1005
+ The resulting formula would be
1006
+ > C = a< ; b and R1 = B and R2 = A and
1007
+ > (C &rArr ; R1 < ; = A) and ((not C) &rArr ; R2 < ; = A).
1008
+ > (C &rArr ; return = R1) and ((not C) &rArr ; return = R2).
982
1009
983
1010
We can extend this approach quite straightforwardly to other constructs, but
984
1011
one obvious problem remains: We have not described how to handle loops. This
@@ -991,7 +1018,7 @@ their behavior using a finite propositional formula in the way we have
991
1018
done above.
992
1019
993
1020
There are multiple approaches to deal with this problem, all with different
994
- trade-offs. CPROVER chooses bounded model checking as the underlying approach.
1021
+ trade-offs. CBMC chooses bounded model checking as the underlying approach.
995
1022
The idea is that we only consider program
996
1023
executions that are, in some measure, ''shorter'' than a given bound. This bound
997
1024
then implies an upper bound on the size of the required formulas, which makes
@@ -1031,12 +1058,14 @@ if (e) {
1031
1058
}
1032
1059
}
1033
1060
```
1034
- This transformation is known as ''loop unrolling'': We can always replace a loop
1035
- by an ` if ` that checks if we should execute, a copy of the loop body and the
1061
+ This transformation is known as [ loop unrolling] ( https://en.wikipedia.org/wiki/Loop_unrolling )
1062
+ or * unwinding* :
1063
+ We can always replace a loopby an
1064
+ ` if ` that checks if we should execute, a copy of the loop body and the
1036
1065
loop statement.
1037
1066
1038
1067
So, to reason about the ` factorial ` function, we unroll the loop three times
1039
- and then replace the loop with a special ` IGNORE ` statement . We get:
1068
+ and then replace the loop with ` ASSERT(!(condition)) ` . We get:
1040
1069
``` C
1041
1070
unsigned long factorial (unsigned n) {
1042
1071
unsigned long fac = 1;
@@ -1050,9 +1079,7 @@ unsigned long factorial(unsigned n) {
1050
1079
if (i <= n) {
1051
1080
fac * = i;
1052
1081
i = i+1;
1053
- if (i <= n) {
1054
- IGNORE;
1055
- }
1082
+ ASSERT(!(i <= n));
1056
1083
}
1057
1084
}
1058
1085
return fac;
@@ -1066,49 +1093,39 @@ unsigned long factorial(unsigned n) {
1066
1093
unsigned long fac.1 = 1;
1067
1094
unsigned int i.1 = 1;
1068
1095
bit C1 = i.1 <= n;
1069
- if (C1) {
1070
- fac.2 = fac.1 * i.1;
1071
- i.2 = i.1+1;
1072
- bit C2 = i.2 <= n;
1073
- if (C2) {
1074
- fac.3 = fac.2 * i.2;
1075
- i.3 = i.2+1;
1076
- bit C3 = i.3 <= n;
1077
- if (C3) {
1078
- fac.4 = fac.3 * i.3;
1079
- i.4 = i.3+1;
1080
- bit C4 = i.4 <= n;
1081
- if (C4) {
1082
- IGNORE;
1083
- }
1084
- }
1085
- }
1096
+ /* if (C1) { */
1097
+ fac.2 = fac.1 * i.1;
1098
+ i.2 = i.1+1;
1099
+ bit C2 = i.2 <= n;
1100
+ /* if (C2) { */
1101
+ fac.3 = fac.2 * i.2;
1102
+ i.3 = i.2+1;
1103
+ bit C3 = i.3 <= n;
1104
+ /* if (C3) { */
1105
+ fac.4 = fac.3 * i.3;
1106
+ i.4 = i.3+1;
1107
+ ASSERT(!(i.4 <= n));
1108
+ /* }}} */
1086
1109
return Phi(C1, Phi(C2, Phi(C3, fac.4, fac.3), fac.2), fac.1);
1087
1110
}
1088
1111
```
1089
- We translate ` IGNORE ` into the formula ** false** - this will later allow
1090
- us to rule out all paths that reach this point.
1112
+ Note that we may be missing
1113
+ possible executions of the program due to this translation; we come back to
1114
+ this point later.
1091
1115
1092
1116
The corresponding propositional formula can then be written as (check
1093
1117
that this is equivalent to the formula you would be getting by following
1094
1118
the translation procedure described above):
1095
1119
1096
1120
> fac.1 = 1 and i.1 = 1 and C1 = i.1 < ; = n and
1097
- > ((not C1) &rArr ; return = fac.1) and
1098
- > C1 &rArr ; (
1099
- >> fac.2 = fac.1 * i.1 and i.2 = i.1 + 1 and C2 = i.2 < ; = n and
1100
- >> ((not C2) &rArr ; return = fac.2) and
1101
- >> C2 &rArr ; (
1102
- >>> fac.3 = fac.2 * i.2 and i.3 = i.2 + 1 and C3 = i.3 <= n and
1103
- >>> ((not C3) ⇒ return = fac.3) and
1104
- >>> C3 ⇒ (
1105
- >>>> fac.4 = fac.3 * i.3 and i.4 = i.3 + 1 and C4 = i.4 <= n and
1106
- >>>> ((not C4) ⇒ return = fac.4) and
1107
- >>>> (C4 ⇒ false)
1108
- >>> )
1109
- >>> )
1110
- >> )
1111
-
1121
+ > fac.2 = fac.1 * i.1 and i.2 = i.1 + 1 and C2 = i.2 < ; = n and
1122
+ > fac.3 = fac.2 * i.2 and i.3 = i.2 + 1 and C3 = i.3 < ; = n and
1123
+ > fac.4 = fac.3 * i.3 and i.4 = i.3 + 1 and C4 = i.4 < ; = n and
1124
+ > not (i.4 <= n) and
1125
+ > ((C1 and C2 and C3) &rArr ; result = fac.4) and
1126
+ > ((C1 and C2 and not C3) &rArr ; result = fac.3) and
1127
+ > ((C1 and not C2) &rArr ; result = fac.2) and
1128
+ > ((not C1) &rArr ; result = fac.1)
1112
1129
In the following, we reference this formula as FA(n, result).
1113
1130
1114
1131
At this point, we know how to encode programs as propositional formulas.
@@ -1120,15 +1137,15 @@ If this formula has a model (i.e., if we can find a satisfying assignment to
1120
1137
all variables, and in particular, to n), we can extract the required value
1121
1138
for the parameter ` n ` from that model. As we have discussed above, this can
1122
1139
be done using a SAT solver: If you run, say, MiniSAT on this formula, you will
1123
- get a model involving n=3.
1140
+ get a model that translates to n=3.
1124
1141
1125
1142
Be aware that this method has very clear limits: We know that the factorial of
1126
1143
` 5 ` is ` 120 ` , but with the formula above, evaluating
1127
1144
''FA(n, result) and result=120'' would yield ''unsatisfiable''! This is because
1128
1145
we limited the number of loop iterations, and to reach 120, we have to execute
1129
- the loop more than three times.
1130
- That being said, for typical CPROVER use cases, we can often make do with a
1131
- reasonable bound on loop iterations .
1146
+ the loop more than three times. In particular, a ''VERIFICATION SUCCESSFUL''
1147
+ message, as output by CBMC, must always be interpreted keeping the bounds
1148
+ that were used in mind .
1132
1149
1133
1150
In the case that we found a model, we can get even more information: We can
1134
1151
even reconstruct the program execution that would lead to the requested result.
0 commit comments