Merge pull request #4854 from danpoe/doc/update-memory-snapshot-harness-documentation

danpoe · web-flow · commit 069e75e237da · 2019-07-02T12:46:02.000+01:00
Update memory snapshot harness documentation
diff --git a/doc/cprover-manual/goto-harness.md b/doc/cprover-manual/goto-harness.md
@@ -314,87 +314,121 @@ VERIFICATION SUCCESSFUL
 
 ### The memory snapshot harness
 
-The `function-call` harness is used in situations in which we want the analysed
-function to work in arbitrary environment. If we want to analyse a function
-starting from a _real_ program state, we can call the `memory-snapshot` harness
-instead.
-
-Furthermore, the program state of interest may be taken at a particular location
-within a function. In that case we do not want the harness to instrument the
-whole function but rather to allow starting the execution from a specific
-initial location (specified via `--initial-location func[:<n>]`). Note that the
-initial location does not have to be the first instruction of a function: we can
-also specify the _location number_ `n` to set the initial location inside our
-function. The _location numbers_ do not have to coincide with the lines of the
-program code. To find the _location number_ run CBMC with
-`--show-goto-functions`. Most commonly, the _location number_ is the instruction
-of the break-point used to extract the program state for the memory snapshot.
+The `function-call` harness is used in situations in which we want to analyze a
+function in an arbitrary environment. If we want to analyze a function starting
+from a _real_ program state, we can use the `memory-snapshot` harness instead.
+
+The snapshot of the program state of interest may be taken at a particular
+program location within a function (using the `memory-analyzer` tool). In that
+case we want to generate a harness that behaves as if execution starts at a
+particular program location. The initial program location can be specified via
+the options `--initial-goto-location <function>[:<location-number>]` or
+`--initial-source-location <file>:<line-number>`.
 
 Say we want to check the assertion in the following code:
 
 ```C
 // main.c
 #include <assert.h>
+#include <stdlib.h>
 
-unsigned int x;
-unsigned int y;
-
-unsigned int nondet_int() {
-  unsigned int z;
-  return z;
-}
+int x;
+int y;
+int z;
 
-void checkpoint() {}
+// complex function which returns 1
+int get_one()
+{
+  int i;
 
-unsigned int complex_function_which_returns_one() {
-  unsigned int i = 0;
-  while(++i < 1000001) {
-    if(nondet_int() && ((i & 1) == 1))
+  for(i = 0; i < 100001; i++)
+  {
+    if(rand() && ((i & 1) == 1))
       break;
   }
+
   return i & 1;
 }
 
-void fill_array(unsigned int* arr, unsigned int size) {
-  for (unsigned int i = 0; i < size; i++)
-    arr[i]=nondet_int();
+// return a random value (!= 0)
+int get_random_value()
+{
+  int r;
+  while((r = rand()) == 0) {}
+  return r;
 }
 
-unsigned int array_sum(unsigned int* arr, unsigned int size) {
-  unsigned int sum = 0;
-  for (unsigned int i = 0; i < size; i++)
-    sum += arr[i];
-  return sum;
+int clip(int i)
+{
+  if(i > 99)
+  {
+    i = 99;
+  }
+
+  return i;
 }
 
-const unsigned int array_size = 100000;
+int main()
+{
+  x = get_random_value();
+  y = get_one();
+
+  // snapshot taken here (line 46)
+
+  z = clip(x);
+
+  assert(y + z <= 100);
 
-int main() {
-  x = complex_function_which_returns_one();
-  unsigned int large_array[array_size];
-  fill_array(large_array, array_size);
-  y = array_sum(large_array, array_size);
-  checkpoint();
-  assert(y + 2 > x);
   return 0;
 }
 ```
 
-But are not particularly interested in analysing the complex function, since we
-trust that its implementation is correct. Hence we run the above program
-stopping after the assignments to `x` and `x` and storing the program state,
-e.g. using the `memory-analyzer`, in a JSON file `snapshot.json`. Then run the
-harness and verify the assertion with:
+Assume we are interested in the code represented by the `clip()` function and
+its effect on the assertion below. To that end, we want to take a memory
+snapshot after the calls to `get_random_value()` and `get_one()`.
+
+In order to take the snapshot with `memory-analyzer`, we need to first compile
+the program with `goto-gcc`, which produces a binary containing both native
+machine code and the corresponding goto program:
+
+```sh
+$ goto-gcc -g -o main.gb main.c
+```
+
+Then we can execute the program with `memory-analyzer` and take a snapshot at
+the specified breakpoint. The variables to be included in the snapshot need to
+be specified via the `--symbols` option.
 
+```sh
+$ memory-analyzer \
+  --breakpoint 46 \
+  --symbols 'x, y, z' \
+  --symtab-snapshot \
+  --json-ui \
+  main.gb \
+  > snapshot.json
 ```
+
+We then generate a harness with `goto-harness` that behaves as if execution
+started from the state given by the memory snapshot at the specified program
+location. We further overapproximate the value returned by `get_random_value()`
+by havocking the variable `x`.
+
+```sh
 $ goto-cc -o main.gb main.c
+
 $ goto-harness \
   --harness-function-name harness \
   --harness-type initialise-with-memory-snapshot \
   --memory-snapshot snapshot.json \
-  --initial-location checkpoint \
+  --initial-source-location main.c:46 \
   --havoc-variables x \
   main.gb main-mod.gb
+```
+
+We can now verify the resulting goto program with `cbmc`:
+
+```sh
 $ cbmc --function harness main-mod.gb
 ```
 
@@ -405,7 +439,7 @@ This will result in:
 
 ** Results:
 main.c function main
-[main.assertion.1] line 42 assertion y + 2 > x: SUCCESS
+[main.assertion.1] line 50 assertion y + z <= 100: SUCCESS
 
 ** 0 of 1 failed (1 iterations)
 VERIFICATION SUCCESSFUL
diff --git a/doc/cprover-manual/memory-analyzer.md b/doc/cprover-manual/memory-analyzer.md
@@ -2,40 +2,36 @@
 
 ## Memory Analyzer
 
-The memory analyzer is a front-end for running and querying GDB in order to
-obtain a state of the input program. The GDB is not needed to be executed
-directly but is rather used as a back-end for the memory analysis. A common
-application would be to obtain a snapshot of a program under analysis at a
-particular state of execution. Such a snapshot could be useful on its own: to
-query about values of particular variables. Furthermore, since that snapshot is
-a state of a valid concrete execution, it can also be used for subsequent
-analyses.
+The memory analyzer is a front-end that runs and queries GDB in order to obtain
+a snapshot of the state of the input program at a particular program location.
+Such a snapshot could be useful on its own: to check the values of variables at
+a particular program location. Furthermore, since the snapshot is a state of a
+valid concrete execution, it can also be used for subsequent analyses.
 
 ## Usage
 
-We assume that the user wants to inspect a binary executable compiled with
-debugging symbols and a symbol table information understandable by CBMC, e.g.
-(having `goto-gcc` on the `PATH`):
+In order to analyze a program with `memory-analyzer` it needs to be compiled
+with `goto-gcc` (assuming `goto-gcc` is on the `PATH`):
 
 ```sh
 $ goto-gcc -g input_program.c -o input_program_exe
 ```
 
-Calling `goto-gcc` instead of simply compiling with `gcc` produces an ELF binary
-with a goto section that contains the goto model (goto program + symbol table)
-[goto-cc-variants](../goto-cc/variants/).
+Calling `goto-gcc` instead of simply compiling with `gcc` or `goto-cc` produces
+an ELF binary with a goto section that contains the goto model (goto program +
+symbol table); see [goto-cc-variants](../goto-cc/variants/).
 
-The memory analyzer supports two workflows to initiate the GDB with user code:
-either to run the code from a core-file or up to a break-point. If the user
-already has a core-file, they can specify it with the option `--core-file cf`.
-If the user knows the point of their program from where they want to run the
-analysis, they can specify it with the option `--breakpoint bp`. Only one of
+The memory analyzer supports two ways of running GDB on the user code: either to
+run the code from a core-file or up to a break-point. If the user already has a
+core-file, they can specify it with the option `--core-file cf`. If the user
+knows the point of their program from where they want to run the analysis, they
+can specify it with the option `--breakpoint bp`. Only one of
 core-file/break-point option can be used.
 
 The tool also expects a comma-separated list of symbols to be analysed
-`--symbols s1, s2, ..`. Given its dependence on GDB, `memory-analyzer` is a
+`--symbols s1, s2, ...`. Given its dependence on GDB, `memory-analyzer` is a
 Unix-only tool. The tool calls `gdb` to obtain the snapshot which is why the
-`-g` option is necessary for the program symbols to be visible.
+`-g` option is necessary when compiling for the program symbols to be visible.
 
 Take for example the following program:
 
@@ -77,11 +73,11 @@ to obtain as output the human readable list of values for each requested symbol:
 ```
 
 The above option is useful for the user and their preliminary analysis but does
-not contain enough information for further computer-based analyses. To that end,
-memory analyzer has an option to request the result to be a snapshot of the
-whole symbol table `--symtab-snapshot`. Finally, to obtain an output in JSON
-format, e.g. for further analyses by `goto-harness` pass the additional option
-`--json-ui`.
+not contain enough information for further automated analyses. To that end,
+memory analyzer has an option for the snapshot to be represented in the format
+of a symbol table (with `--symtab-snapshot`). Finally, to obtain an output in
+JSON format, e.g., for further analyses by `goto-harness`, the additional option
+`--json-ui` needs to be passed to `memory-analyzer`.
 
 ```sh
 $ memory-analyzer --symtab-snapshot --json-ui \