Skip to content

Commit 487c1a0

Browse files
committed
Add documentation for the central data structures in CBMC IR (goto-programs).
1 parent 2601e72 commit 487c1a0

File tree

2 files changed

+99
-1
lines changed

2 files changed

+99
-1
lines changed

doc/architectural/cbmc-architecture.md

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -55,7 +55,14 @@ tools process this format, either directly from the front-end or from
5555
it’s saved output. These include a wide range of analysis and
5656
transformation tools (see \ref other-tools).
5757

58-
# Concepts #
58+
# Concepts
59+
60+
## Central data structures
61+
62+
For an explanation of the data structures involved in the modeling of a GOTO
63+
program (the GOTO Intermediate Representation used by CBMC and assorted tools)
64+
please have a look [here](central-data-structures.md).
65+
5966
## {C, java bytecode} → Parse tree → Symbol table → GOTO programs → GOTO program transformations → BMC → counterexample (goto_tracet) → printing ##
6067

6168
To be documented.
Lines changed: 91 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,91 @@
1+
# Central Data Structures
2+
3+
The following is some light technical documentation of the major data structures
4+
represented in the input transformation to Intermediate Representation (IR) inside
5+
CBMC and the assorted CProver tools.
6+
7+
## goto_modelt
8+
9+
The `goto_modelt` is the main data structure that CBMC (and the other tools) use to
10+
represent GOTO-IR (the `GOTO` Intermediate Representation).
11+
12+
A `goto_modelt` is effectively a product type, consisting of:
13+
14+
* A list of GOTO functions (pseudocode: `type goto_functionst = list<goto_functiont>`)
15+
* A symbol table containing symbol references for the symbols contained in the
16+
GOTO functions (pseudocode: `type symbol_tablet = map<identifier, symbolt>`).
17+
18+
The abstract interface of `goto_modelt` is outlined in the file
19+
[`src/goto-programs/abstract_goto_model.h`](../../src/goto-programs/abstract_goto_model.h).
20+
21+
## goto_functiont
22+
23+
A `goto_functiont` is also a product type. It's designed to represent a function
24+
at the IR level, and effectively it's the following ADT (in pseudocode):
25+
26+
```js
27+
type goto_functiont {
28+
goto_programt body
29+
list<identifiers> parameters
30+
}
31+
```
32+
33+
Of the two types listed above, the `parameters` one should be self-documenting,
34+
(the values of these are later looked up in the symbol table), so we're going to
35+
focus next on the type `goto_programt`
36+
37+
## goto_programt
38+
39+
A `goto_programt` is a container of GOTO-IR instructions. In pseudocode, it would
40+
look like `type goto_programt = list<goto_instructiont>`.
41+
42+
An instruction (`goto_instructiont`) is another product type, describing a GOTO level
43+
instruction with the following 3 component subtypes, again in pseudocode:
44+
45+
```js
46+
type goto_instructiont {
47+
instruction_type instr_type
48+
instruction statement
49+
guard boolean_expr
50+
}
51+
```
52+
53+
The above subtypes are just illustrative, so we will provide some extra explanation for
54+
these:
55+
56+
* The `instruction_type` above corresponds to the `goto_program_instruction_typet` type
57+
listed in [`goto_program.h`](../../src/goto-programs/goto_program.h)
58+
* For illustration purposes, some instruction types are `assign`, `function_call`, `return`,
59+
etc.
60+
* The `instruction` is a statement represented by a `goto_instruction_codet`.
61+
* A `goto_instruction_codet` is a `codet` (a data structure broadly representing a statement
62+
inside CBMC) that contains the actual GOTO-IR instruction.
63+
* You can distinguish different statements by inspecting the `irep` element `ID_statement`.
64+
* The `guard` is an `exprt` (A CBMC data structure broadly representing an expression)
65+
that is expected to have type `bool`.
66+
* This is optional - not every instruction is expected to have a `guard` associated with it.
67+
68+
## source_locationt
69+
70+
Another important data structure that needs to be discussed at this point is
71+
`source_locationt`.
72+
73+
This is an `irept`. `irep`s are the central data structure that model most entities inside
74+
CBMC and the assorted tools - effectively a node/map like data structure that forms a hierachical
75+
tree that ends up modeling graphs like ASTs, CFGs, etc. (This will be further discussed in
76+
a dedicated page).
77+
78+
`source_locationt` are attached into various `exprt`s (the data structure representing
79+
various expressions, usually the result of some early processing, e.g. the result of the
80+
frontend parsing a file).
81+
82+
This means that `codet`s, representing GOTO-IR instructions *also* have a `source_locationt`
83+
attached to them, by virtue of inheriting from `exprt`.
84+
85+
For the most part, `source_locationt` is something that is only being used when we print
86+
various nodes (for property listing, counterexample/trace printing, etc).
87+
88+
It might be possible that a specific source location might point to a CBMC instrumentation
89+
primitive (which might be reported as a `built-in-addition`) or there might even be no-source
90+
location (because it might be part of harnesses generated as an example, that have no presence
91+
in the user code).

0 commit comments

Comments
 (0)