
Commit 069e75e

Merge pull request #4854 from danpoe/doc/update-memory-snapshot-harness-documentation
Update memory snapshot harness documentation
2 parents abf7a6d + b179f78 commit 069e75e

2 files changed

+106
-76
lines changed


doc/cprover-manual/goto-harness.md

Lines changed: 84 additions & 50 deletions
@@ -314,87 +314,121 @@ VERIFICATION SUCCESSFUL
314314
315315
### The memory snapshot harness
316316
317-
The `function-call` harness is used in situations in which we want the analysed
318-
function to work in arbitrary environment. If we want to analyse a function
319-
starting from a _real_ program state, we can call the `memory-snapshot` harness
320-
instead.
321-
322-
Furthermore, the program state of interest may be taken at a particular location
323-
within a function. In that case we do not want the harness to instrument the
324-
whole function but rather to allow starting the execution from a specific
325-
initial location (specified via `--initial-location func[:<n>]`). Note that the
326-
initial location does not have to be the first instruction of a function: we can
327-
also specify the _location number_ `n` to set the initial location inside our
328-
function. The _location numbers_ do not have to coincide with the lines of the
329-
program code. To find the _location number_ run CBMC with
330-
`--show-goto-functions`. Most commonly, the _location number_ is the instruction
331-
of the break-point used to extract the program state for the memory snapshot.
317+
The `function-call` harness is used in situations in which we want to analyze a
318+
function in an arbitrary environment. If we want to analyze a function starting
319+
from a _real_ program state, we can use the `memory-snapshot` harness instead.
320+
321+
The snapshot of the program state of interest may be taken at a particular
322+
program location within a function (using the `memory-analyzer` tool). In that
323+
case we want to generate a harness that behaves as if execution starts at a
324+
particular program location. The initial program location can be specified via
325+
the options `--initial-goto-location <function>[:<location-number>]` or
326+
`--initial-source-location <file>:<line-number>`.
332327
333328
Say we want to check the assertion in the following code:
334329
335330
```C
336331
// main.c
337332
#include <assert.h>
333+
#include <stdlib.h>
338334
339-
unsigned int x;
340-
unsigned int y;
341-
342-
unsigned int nondet_int() {
343-
unsigned int z;
344-
return z;
345-
}
335+
int x;
336+
int y;
337+
int z;
346338
347-
void checkpoint() {}
339+
// complex function which returns 1
340+
int get_one()
341+
{
342+
int i;
348343
349-
unsigned int complex_function_which_returns_one() {
350-
unsigned int i = 0;
351-
while(++i < 1000001) {
352-
if(nondet_int() && ((i & 1) == 1))
344+
for(i = 0; i < 100001; i++)
345+
{
346+
if(rand() && ((i & 1) == 1))
353347
break;
354348
}
349+
355350
return i & 1;
356351
}
357352
358-
void fill_array(unsigned int* arr, unsigned int size) {
359-
for (unsigned int i = 0; i < size; i++)
360-
arr[i]=nondet_int();
353+
// return a random value (!= 0)
354+
int get_random_value()
355+
{
356+
int r;
357+
while((r = rand()) == 0) {}
358+
return r;
361359
}
362360
363-
unsigned int array_sum(unsigned int* arr, unsigned int size) {
364-
unsigned int sum = 0;
365-
for (unsigned int i = 0; i < size; i++)
366-
sum += arr[i];
367-
return sum;
361+
int clip(int i)
362+
{
363+
if(i > 99)
364+
{
365+
i = 99;
366+
}
367+
368+
return i;
368369
}
369370
370-
const unsigned int array_size = 100000;
371+
int main()
372+
{
373+
x = get_random_value();
374+
y = get_one();
375+
376+
// snapshot taken here (line 46)
377+
378+
z = clip(x);
379+
380+
assert(y + z <= 100);
371381
372-
int main() {
373-
x = complex_function_which_returns_one();
374-
unsigned int large_array[array_size];
375-
fill_array(large_array, array_size);
376-
y = array_sum(large_array, array_size);
377-
checkpoint();
378-
assert(y + 2 > x);
379382
return 0;
380383
}
381384
```
382385

383-
But we are not particularly interested in analysing the complex function, since we
384-
trust that its implementation is correct. Hence we run the above program
385-
stopping after the assignments to `x` and `y` and storing the program state,
386-
e.g. using the `memory-analyzer`, in a JSON file `snapshot.json`. Then run the
387-
harness and verify the assertion with:
386+
Assume we are interested in the code represented by the `clip()` function and
387+
its effect on the assertion below. To that end, we want to take a memory
388+
snapshot after the calls to `get_random_value()` and `get_one()`.
389+
390+
In order to take the snapshot with `memory-analyzer`, we need to first compile
391+
the program with `goto-gcc`, which produces a binary containing both native
392+
machine code and the corresponding goto program:
393+
394+
```sh
395+
$ goto-gcc -g -o main.gb main.c
396+
```
397+
398+
Then we can execute the program with `memory-analyzer` and take a snapshot at
399+
the specified breakpoint. The variables to be included in the snapshot need to
400+
be specified via the `--symbols` option.
388401

402+
```sh
403+
$ memory-analyzer \
404+
--breakpoint 46 \
405+
--symbols 'x, y, z' \
406+
--symtab-snapshot \
407+
--json-ui \
408+
main.gb \
409+
> snapshot.json
389410
```
411+
412+
We then generate a harness with `goto-harness` that behaves as if execution
413+
started from the state given by the memory snapshot at the specified program
414+
location. We further overapproximate the value returned by `get_random_value()`
415+
by havocking the variable `x`.
416+
417+
```sh
390418
$ goto-cc -o main.gb main.c
419+
391420
$ goto-harness \
392421
--harness-function-name harness \
393422
--harness-type initialise-with-memory-snapshot \
394423
--memory-snapshot snapshot.json \
395-
--initial-location checkpoint \
424+
--initial-source-location main.c:46 \
396425
--havoc-variables x \
397426
main.gb main-mod.gb
427+
```
428+
429+
We can now verify the resulting goto program with `cbmc`:
430+
431+
```sh
398432
$ cbmc --function harness main-mod.gb
399433
```
400434

@@ -405,7 +439,7 @@ This will result in:
405439
406440
** Results:
407441
main.c function main
408-
[main.assertion.1] line 42 assertion y + 2 > x: SUCCESS
442+
[main.assertion.1] line 50 assertion y + z <= 100: SUCCESS
409443
410444
** 0 of 1 failed (1 iterations)
411445
VERIFICATION SUCCESSFUL
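
To see why the assertion holds for every value of the havocked `x`, the post-snapshot part of `main()` can be modelled by hand. The following is a minimal sketch and not the code that `goto-harness` generates: the helper `resume_after_snapshot` is hypothetical, and the snapshot values `y = 1` and `z = 0` are assumptions (`get_one()` always returns 1 because `i` is odd on both exits of its loop, and `z` has not been assigned before line 46).

```c
#include <assert.h>

/* Globals mirroring main.c from the diff above. */
int x;
int y;
int z;

/* Copied from main.c: caps its argument at 99. */
int clip(int i)
{
  if(i > 99)
  {
    i = 99;
  }

  return i;
}

/* Hypothetical helper, not produced by goto-harness: replays the code
 * that follows the snapshot location for one candidate value of x. */
int resume_after_snapshot(int havocked_x)
{
  y = 1;          /* assumed snapshot value: get_one() always returns 1 */
  z = 0;          /* assumed snapshot value: z not yet assigned */
  x = havocked_x; /* stands in for the effect of --havoc-variables x */

  z = clip(x);

  assert(y + z <= 100);
  return y + z;
}
```

Calling `resume_after_snapshot` with any `int` passes the assertion, since `clip()` bounds `z` by 99 and `y` is fixed at 1. CBMC establishes the same fact for all values of `x` at once rather than for individual inputs.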

doc/cprover-manual/memory-analyzer.md

Lines changed: 22 additions & 26 deletions
@@ -2,40 +2,36 @@
22

33
## Memory Analyzer
44

5-
The memory analyzer is a front-end for running and querying GDB in order to
6-
obtain a state of the input program. The GDB is not needed to be executed
7-
directly but is rather used as a back-end for the memory analysis. A common
8-
application would be to obtain a snapshot of a program under analysis at a
9-
particular state of execution. Such a snapshot could be useful on its own: to
10-
query about values of particular variables. Furthermore, since that snapshot is
11-
a state of a valid concrete execution, it can also be used for subsequent
12-
analyses.
5+
The memory analyzer is a front-end that runs and queries GDB in order to obtain
6+
a snapshot of the state of the input program at a particular program location.
7+
Such a snapshot could be useful on its own: to check the values of variables at
8+
a particular program location. Furthermore, since the snapshot is a state of a
9+
valid concrete execution, it can also be used for subsequent analyses.
1310

1411
## Usage
1512

16-
We assume that the user wants to inspect a binary executable compiled with
17-
debugging symbols and a symbol table information understandable by CBMC, e.g.
18-
(having `goto-gcc` on the `PATH`):
13+
In order to analyze a program with `memory-analyzer` it needs to be compiled
14+
with `goto-gcc` (assuming `goto-gcc` is on the `PATH`):
1915

2016
```sh
2117
$ goto-gcc -g input_program.c -o input_program_exe
2218
```
2319

24-
Calling `goto-gcc` instead of simply compiling with `gcc` produces an ELF binary
25-
with a goto section that contains the goto model (goto program + symbol table)
26-
[goto-cc-variants](../goto-cc/variants/).
20+
Calling `goto-gcc` instead of simply compiling with `gcc` or `goto-cc` produces
21+
an ELF binary with a goto section that contains the goto model (goto program +
22+
symbol table); see [goto-cc-variants](../goto-cc/variants/).
2723

28-
The memory analyzer supports two workflows to initiate the GDB with user code:
29-
either to run the code from a core-file or up to a break-point. If the user
30-
already has a core-file, they can specify it with the option `--core-file cf`.
31-
If the user knows the point of their program from where they want to run the
32-
analysis, they can specify it with the option `--breakpoint bp`. Only one of
24+
The memory analyzer supports two ways of running GDB on the user code: either to
25+
run the code from a core-file or up to a break-point. If the user already has a
26+
core-file, they can specify it with the option `--core-file cf`. If the user
27+
knows the point of their program from where they want to run the analysis, they
28+
can specify it with the option `--breakpoint bp`. Only one of
3329
the core-file and break-point options can be used.
3430

3531
The tool also expects a comma-separated list of symbols to be analysed via
36-
`--symbols s1, s2, ..`. Given its dependence on GDB, `memory-analyzer` is a
32+
`--symbols s1, s2, ...`. Given its dependence on GDB, `memory-analyzer` is a
3733
Unix-only tool. The tool calls `gdb` to obtain the snapshot which is why the
38-
`-g` option is necessary for the program symbols to be visible.
34+
`-g` option is necessary when compiling for the program symbols to be visible.
3935

4036
Take for example the following program:
4137

@@ -77,11 +73,11 @@ to obtain as output the human readable list of values for each requested symbol:
7773
```
7874

7975
The above option is useful for the user and their preliminary analysis but does
80-
not contain enough information for further computer-based analyses. To that end,
81-
memory analyzer has an option to request the result to be a snapshot of the
82-
whole symbol table `--symtab-snapshot`. Finally, to obtain an output in JSON
83-
format, e.g. for further analyses by `goto-harness` pass the additional option
84-
`--json-ui`.
76+
not contain enough information for further automated analyses. To that end,
77+
memory analyzer has an option for the snapshot to be represented in the format
78+
of a symbol table (with `--symtab-snapshot`). Finally, to obtain an output in
79+
JSON format, e.g., for further analyses by `goto-harness`, the additional option
80+
`--json-ui` needs to be passed to `memory-analyzer`.
8581

8682
```sh
8783
$ memory-analyzer --symtab-snapshot --json-ui \
