Documentation: Address comments.

johanneskloos · johanneskloos · commit 3314ed3b1186 · 2018-11-14T13:39:53.000Z
diff --git a/doc/architectural/background-concepts.md b/doc/architectural/background-concepts.md
@@ -696,7 +696,7 @@ is to encode both the program and the set of states using an appropriate logic,
 mostly *propositional logic* and (fragments of) *first-order logic*.
 
 In the following, we will quickly discuss propositional logic, in combination
-with SAT solving, and show how to build a simple bounded model checker
+with SAT solving, and show how to build a simple bounded model checker.
 Actual bounded model checking for software requires
 a number of additional steps and concepts, which will be introduced as required
 later on.
@@ -775,16 +775,16 @@ Remember the `factorial` function. It starts with the line
 ```C
 unsigned long fac = 1;
 ```
-Now, from the C standard and the most common C ABIs, we know that internally,
-`fac` will be represented as a binary number with 64 bits. So, if we wish to
-reason about the contents of the variable `fac`, we might as well represent it
-as a vector of 64 propositional variables, say `fac`<sub>0</sub> to
+Now, suppose that `fac` will be represented as a binary number with 64 bits (this
+is standard on, e.g., 64-bit Linux). So,
+if we wish to reason about the contents of the variable `fac`, we might as well
+represent it as a vector of 64 propositional variables, say `fac`<sub>0</sub> to
 `fac`<sub>63</sub>, where
 `fac` = 2<sup>63</sup> `fac`<sub>63</sub> + ... + 2<sup>0</sup> `fac`<sub>0</sub>.
 We can then assert that `fac`=1 using
 the propositional formula
 `fac`<sub>63</sub> = 0 and ... and `fac`<sub>1</sub> = 0 and `fac`<sub>0</sub> = 1,
-where we define the formula A = B as ''(A and B) or ((not A) and (not B))''.
+where we define the formula A = B as ''(A or not B) and (B or not A)''.
 
 We call this a *bit vector* representation. Compare the Wikipedia page on
 [Binary numbers](https://en.wikipedia.org/wiki/Binary_number).
@@ -823,7 +823,7 @@ numbers as
 > and S_2 = FA_S(A_2,B_2,C_1) and S_3 = FA_S(A_3,B_3,C_2)
 > and S_3 = FA_S(0,0,C_3).
 
-Other arithmetic operations on binary number can be expressed using propositional
+Other arithmetic operations on binary numbers can be expressed using propositional
 logic as well; the details can be found in the linked articles, as well
 as [Two's complement](https://en.wikipedia.org/wiki/Two%27s_complement) for
 handling signed integers and [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754)
@@ -869,13 +869,13 @@ bit vector X, and for the return value, return. But we also have to deal
 with the (local) variable `y`, which gets two assignments. Furthermore, we
 now have a program with three instructions.
 
-With a bit of reflection, we find that we cannot directly translate
+Thinking about the approach so far, we find that we cannot directly translate
 `y=y+x` into a propositional formula: On the left-hand side of the `=`, we
 have the ''new'' value of `y`, while the right-hand side uses the ''old'' value.
 But propositional formulas do not distinguish between ''old'' and ''new'' values,
 so we need to find a way to distinguish between the two. The easiest solution
-is to use the Static Single Assignment form described above. We transform
-the program into SSA (slightly simplifying the notation):
+is to use the Static Single Assignment form described in section \ref SSA_section.
+We transform the program into SSA (slightly simplifying the notation):
 ```C
 int calculate(int x.1)
 {
@@ -891,13 +891,11 @@ stand for `y.1` and `y.2` (we map X to `x.1` and return to the return
 value). `int y.1 = x.1 * x.1` becomes Y1 = X * X, `y.2 = y.1 + x.1` becomes
 Y2 = Y1 + X and `return y.2` becomes return = Y2.
 
-To tie the three formulas together into a description of the while program,
+To tie the three formulas together into a description of the whole program,
 we note that the three instructions form a single basic block, so we know they
-are always executed as a unit. In this case, it is sufficient to simple connect
-them with ''and'': Y1 = X * X and Y2 = Y1 + X and return = Y2. Note that this
-propositional formula does not actually describe the order of execution of the
-statements, but simply summarizes their outcomes! Once we have non-trivial
-control flow, we have to do some extra work in this model.
+are always executed as a unit. In this case, it is sufficient to simply connect
+them with ''and'': Y1 = X * X and Y2 = Y1 + X and return = Y2. Note that thanks
+to SSA, we do not need to pay attention to control flow here.
 
 One example of non-trivial control flow are `if` statements. Consider the
 following example:
@@ -932,7 +930,7 @@ the correct value.
 
 As a first step, we modify the SSA form slightly by introducing an additional
 propositional variable C that tracks which branch of the `if` was taken.
-We call this variabel the *code guard variable*, or *guard* for short.
+We call this variable the *code guard variable*, or *guard* for short.
 Additionally, we add C to the &Phi; node as a new first parameter, describing
 which input to use as a result.
 The corresponding program looks something like this:
@@ -942,14 +940,14 @@ int max(int a, int b)
   int result;
   bit C; /* Track which branch was taken */
   C = a < b;
-  if (C)
+  /* if (C) - not needed anymore thanks to SSA */
     result.1 = b;
-  else
+  /* else */
     result.2 = a;
   return Phi(C,result.1,result.2);
 }
 ```
-For the encoding of the program, we introduce a new propositional junctor,
+For the encoding of the program, we introduce the implication junctor, written
 &rArr;, where ''A &rArr; B'' is equivalent to ''(not A) or B''.
 It can be understood as ''if A holds, B must hold as well''.
 
@@ -958,27 +956,56 @@ the basic statements of the program:
 - `C = a<b` maps to C = A&lt;B, for an appropriate formula A&lt;B.
 - `result.1 = b` becomes R1 = B, and `result.2 = a` becomes R2 = A.
 
-To handle the `if` statement, we simply make the execution of each branch
-conditional on C using the &rArr; junctor:
-```C
-  if (C)
-    result.1 = b;
-  else
-    result.2 = a;
-```
-becomes (C &rArr; R1 = B) and ((not C) &rArr; R2 = A), stating that
-the equation for the first assignment holds when C is true, and that for
-the second assignment holds when C is false.
-
 Finally, the &Phi; node is again resolved using the &rArr; junctor: we
 can encode the `return` statement as
 (C &rArr; return = R1) and ((not C) &rArr; return = R2).
 
 At this point, it remains to tie the statements together; we find that we can
-again simply connect them with ''and'', since the statements are always executed
-in sequence. We get:
-> C = a&lt;b and (C &rArr; R1 = B) and (C &rArr; return = R1) and
-> ((not C) &rArr; R2 = A) and ((not C) &rArr; return = R2).
+again simply connect them with ''and''. We get:
+> C = A&lt;B and R1 = B and R2 = A and
+> (C &rArr; return = R1) and ((not C) &rArr; return = R2).
+
+So far, we have only discussed how to encode the behavior of programs
+as propositional formulas. To actually reason about programs, we also need
+a way to describe the property we want to prove. To do this, we introduce a
+primitive `ASSERT`. Let `e` be some expression; then `ASSERT(e)` is supposed
+to do nothing if `e` evaluates to true, and to abort the program if `e`
+evaluates to false.
+
+For instance, we can try to prove that `max(a,b) <= a` by modifying the `max`
+function into
+```C
+int max(int a, int b)
+{
+  int result;
+  if (a < b)
+    result = b;
+  else
+    result = a;
+  ASSERT(result <= a);
+  return result;
+}
+```
+
+The corresponding SSA would be
+```C
+int max(int a, int b)
+{
+  int result;
+  bit C; /* Track which branch was taken */
+  C = a < b;
+  result.1 = b;
+  result.2 = a;
+  ASSERT(Phi(C,result.1,result.2) <= a);
+  return Phi(C,result.1,result.2);
+}
+```
+We translate `ASSERT(Phi(C,result.1,result.2) <= a)` into
+''&Phi;(C,result.1,result.2) &lt;= a''.
+The resulting formula would be
+> C = A&lt;B and R1 = B and R2 = A and
+> (C &rArr; R1 &lt;= A) and ((not C) &rArr; R2 &lt;= A) and
+> (C &rArr; return = R1) and ((not C) &rArr; return = R2).
 
 We can extend this approach quite straightforwardly to other constructs, but
 one obvious problem remains: We have not described how to handle loops. This
@@ -991,7 +1018,7 @@ their behavior using a finite propositional formula in the way we have
 done above.
 
 There are multiple approaches to deal with this problem, all with different
-trade-offs. CPROVER chooses bounded model checking as the underlying approach.
+trade-offs. CBMC chooses bounded model checking as the underlying approach.
 The idea is that we only consider program
 executions that are, in some measure, ''shorter'' than a given bound. This bound
 then implies an upper bound on the size of the required formulas, which makes
@@ -1031,12 +1058,14 @@ if (e) {
   }
 }
 ```
-This transformation is known as ''loop unrolling'': We can always replace a loop
-by an `if` that checks if we should execute, a copy of the loop body and the
+This transformation is known as [loop unrolling](https://en.wikipedia.org/wiki/Loop_unrolling)
+or *unwinding*:
+We can always replace a loop by an
+`if` that checks if we should execute, a copy of the loop body and the
 loop statement.
 
 So, to reason about the `factorial` function, we unroll the loop three times
-and then replace the loop with a special `IGNORE` statement. We get:
+and then replace the loop with `ASSERT(!(condition))`. We get:
 ```C
 unsigned long factorial(unsigned n) {
   unsigned long fac = 1;
@@ -1050,9 +1079,7 @@ unsigned long factorial(unsigned n) {
       if (i <= n) {
         fac *= i;
         i = i+1;
-        if (i <= n) {
-          IGNORE;
-        }
+        ASSERT(!(i <= n));
     }
   }
   return fac;
@@ -1066,49 +1093,39 @@ unsigned long factorial(unsigned n) {
   unsigned long fac.1 = 1;
   unsigned int i.1 = 1;
   bit C1 = i.1 <= n;
-  if (C1) {
-    fac.2 = fac.1 * i.1;
-    i.2 = i.1+1;
-    bit C2 = i.2 <= n;
-    if (C2) {
-      fac.3 = fac.2 * i.2;
-      i.3 = i.2+1;
-      bit C3 = i.3 <= n;
-      if (C3) {
-        fac.4 = fac.3 * i.3;
-        i.4 = i.3+1;
-        bit C4 = i.4 <= n;
-        if (C4) {
-          IGNORE;
-        }
-    }
-  }
+  /* if (C1) { */
+  fac.2 = fac.1 * i.1;
+  i.2 = i.1+1;
+  bit C2 = i.2 <= n;
+  /* if (C2) { */
+  fac.3 = fac.2 * i.2;
+  i.3 = i.2+1;
+  bit C3 = i.3 <= n;
+  /* if (C3) { */
+  fac.4 = fac.3 * i.3;
+  i.4 = i.3+1;
+  ASSERT(!(i.4 <= n));
+  /* }}} */
   return Phi(C1, Phi(C2, Phi(C3, fac.4, fac.3), fac.2), fac.1);
 }
 ```
-We translate `IGNORE` into the formula **false** - this will later allow
-us to rule out all paths that reach this point.
+Note that we may be missing
+possible executions of the program due to this translation; we come back to
+this point later.
 
 The corresponding propositional formula can then be written as (check
 that this is equivalent to the formula you would be getting by following
 the translation procedure described above):
 
 > fac.1 = 1 and i.1 = 1 and C1 = i.1 &lt;= n and
-> ((not C1) &rArr; return = fac.1) and
-> C1 &rArr; (
->>   fac.2 = fac.1 * i.1 and i.2 = i.1 + 1 and C2 = i.2 &lt;= n and
->>   ((not C2) &rArr; return = fac.2) and
->>   C2 &rArr; (
->>>     fac.3 = fac.2 * i.2 and i.3 = i.2 + 1 and C3 = i.3 &lt;= n and
->>>     ((not C3) &rArr; return = fac.3) and
->>>     C3 &rArr; (
->>>>       fac.4 = fac.3 * i.3 and i.4 = i.3 + 1 and C4 = i.4 &lt;= n and
->>>>       ((not C4) &rArr; return = fac.4) and
->>>>       (C4 &rArr; false)
->>>     )
->>>   )
->> )
-
+> fac.2 = fac.1 * i.1 and i.2 = i.1 + 1 and C2 = i.2 &lt;= n and
+> fac.3 = fac.2 * i.2 and i.3 = i.2 + 1 and C3 = i.3 &lt;= n and
+> fac.4 = fac.3 * i.3 and i.4 = i.3 + 1 and
+> not (i.4 &lt;= n) and
+> ((C1 and C2 and C3) &rArr; result = fac.4) and
+> ((C1 and C2 and not C3) &rArr; result = fac.3) and
+> ((C1 and not C2) &rArr; result = fac.2) and
+> ((not C1) &rArr; result = fac.1)
 In the following, we reference this formula as FA(n, result).
 
 At this point, we know how to encode programs as propositional formulas.
@@ -1120,15 +1137,15 @@ If this formula has a model (i.e., if we can find a satisfying assignment to
 all variables, and in particular, to n), we can extract the required value
 for the parameter `n` from that model. As we have discussed above, this can
 be done using a SAT solver: If you run, say, MiniSAT on this formula, you will
-get a model involving n=3.
+get a model that translates to n=3.
 
 Be aware that this method has very clear limits: We know that the factorial of
 `5` is `120`, but with the formula above, evaluating
 ''FA(n, result) and result=120'' would yield ''unsatisfiable''! This is because
 we limited the number of loop iterations, and to reach 120, we have to execute
-the loop more than three times.
-That being said, for typical CPROVER use cases, we can often make do with a
-reasonable bound on loop iterations.
+the loop more than three times. In particular, a ''VERIFICATION SUCCESSFUL''
+message, as output by CBMC, must always be interpreted with the bounds
+that were used in mind.
 
 In the case that we found a model, we can get even more information: We can
 even reconstruct the program execution that would lead to the requested result.