|
| 1 | +\ingroup module_hidden |
| 2 | + |
| 3 | +\page central-data-structures Central Data Structures |
| 4 | + |
| 5 | +# Central Data Structures |
| 6 | + |
| 7 | +The following is some light technical documentation of the major data structures |
| 8 | +used in the input transformation to Intermediate Representation (IR) inside |
| 9 | +CBMC and the assorted CProver tools. |
| 10 | + |
| 11 | +## GOTO models |
| 12 | + |
| 13 | +The `goto_modelt` class is the top-level data structure that CBMC (and the other |
| 14 | +tools) use for holding the GOTO intermediate representation. The following |
| 15 | +diagram is a simplified view of how the data structures relate to each other - |
| 16 | + |
| 17 | +```mermaid |
| 18 | +erDiagram |
| 19 | + goto_modelt { |
| 20 | + symbol_tablet symbol_table |
| 21 | + goto_functionst goto_functions |
| 22 | + } |
| 23 | + goto_modelt ||--|| symbol_tablet : "" |
| 24 | + goto_modelt ||--|| goto_functionst : "" |
| 25 | + symbol_tablet { |
| 26 | + symbol_map symbols |
| 27 | + } |
| 28 | + symbol_tablet ||--o{ symbolt : "" |
| 29 | + symbolt { |
| 30 | + string name |
| 31 | + } |
| 32 | + goto_functionst { |
| 33 | + function_map functions |
| 34 | + } |
| 35 | + goto_functionst ||--o{ goto_functiont : "" |
| 36 | + goto_functiont { |
| 37 | + goto_programt body |
| 38 | + ordered_collection_of parameter_identifiers |
| 39 | + } |
| 40 | + goto_functiont ||--|| goto_programt : "" |
| 41 | + goto_programt { |
| 42 | + ordered_collection_of instructions |
| 43 | + } |
| 44 | + goto_programt ||--o{ instructiont : "" |
| 45 | + instructiont { |
| 46 | + enumeration instruction_type |
| 47 | + goto_program_codet code |
| 48 | + boolean_expression guard |
| 49 | + source_locationt source_location |
| 50 | + } |
| 51 | + instructiont ||--|| goto_program_instruction_typet : "" |
| 52 | + instructiont ||--o| goto_program_codet : "" |
| 53 | + instructiont ||--o| source_locationt : "" |
| 54 | + goto_program_codet ||--o| source_locationt : "" |
| 55 | +``` |
| 56 | + |
| 57 | +A `goto_modelt` is effectively a pair, consisting of: |
| 58 | + |
| 59 | +* A collection of GOTO functions. |
| 60 | +* A symbol table containing symbol definitions which may be referred to by |
| 61 | + symbol expressions. Symbol expressions are found in the goto functions and the |
| 62 | + (sub-)expressions for the definitions of symbols. |
| 63 | + |
| 64 | +In pseudocode, the type looks this: |
| 65 | + |
| 66 | +```js |
| 67 | +type goto_modelt { |
| 68 | + type goto_functionst = map<identifier, goto_functiont> |
| 69 | + type symbol_tablet = map<identifier, symbolt> |
| 70 | +} |
| 71 | +``` |
| 72 | + |
| 73 | +There is an abstract interface for goto models provided by the |
| 74 | +`abstract_goto_modelt` class. This is defined and documented in the file |
| 75 | +[`src/goto-programs/abstract_goto_model.h`](../../src/goto-programs/abstract_goto_model.h). |
| 76 | +Ideally code which takes a goto model as input should accept any implementation |
| 77 | +of the abstract interface rather than accepting the concrete `goto_modelt` |
| 78 | +exclusively. This helps with compatibility with `jbmc` which uses lazy loading. |
| 79 | +See the `lazy_goto_modelt` class which is defined and documented in |
| 80 | +[`jbmc/src/java_bytecode/lazy_goto_model.h`](../../jbmc/src/java_bytecode/lazy_goto_model.h) |
| 81 | +for details of lazy loading. |
| 82 | + |
| 83 | +For further information about symbols see the `symbolt` class which is defined |
| 84 | +and documented in [`src/util/symbol.h`](../../src/util/symbol.h) and the |
| 85 | +`symbol_exprt` (symbol expression) class which is defined and documented in |
| 86 | +[`src/util/std_expr.h`](../../src/util/std_expr.h). |
| 87 | + |
| 88 | +## goto_functiont |
| 89 | + |
| 90 | +A `goto_functiont` is also defined as a pair. It's designed to represent a function |
| 91 | +at the goto level, and effectively it's the following data type (in pseudocode): |
| 92 | + |
| 93 | +```js |
| 94 | +type goto_functiont { |
| 95 | + goto_programt body |
| 96 | + list<identifiers> parameters |
| 97 | +} |
| 98 | +``` |
| 99 | + |
| 100 | +The `goto_programt` denoting the `body` of the function will be the subject of |
| 101 | +a more elaborate explanation in the next section. The in-memory structure of a |
| 102 | +goto function allows the `body` to be optional, but only functions with bodies |
| 103 | +are included in the serialised goto binaries. |
| 104 | + |
| 105 | +The `parameters` subcomponent is a list of identifiers that are to be looked-up |
| 106 | +in the symbol-table for their definitions. |
| 107 | + |
| 108 | +## goto_programt |
| 109 | + |
| 110 | +A goto program is a sequence of GOTO instructions (`goto_instructiont`). For |
| 111 | +details of the `goto_programt` class itself see the documentation in |
| 112 | +[`goto_program.h`](../../src/goto-programs/goto_program.h). For further details |
| 113 | +of the life cycle of goto programs see |
| 114 | +[`src/goto-programs/README.md`](https://github.com/diffblue/cbmc/blob/develop/src/goto-programs/README.md). |
| 115 | + |
| 116 | +An instruction (`goto_instructiont`) is a triple (an element with three subcomponents), |
| 117 | +describing a GOTO level instruction with the following 3 component subtypes, |
| 118 | +again in pseudocode: |
| 119 | + |
| 120 | +```js |
| 121 | +type goto_instructiont { |
| 122 | + instruction_type instr_type |
| 123 | + instruction statement |
| 124 | + guard boolean_expr |
| 125 | +} |
| 126 | +``` |
| 127 | + |
| 128 | +The pseudocode subtypes above require some more elaboration, so we will provide some extra |
| 129 | +detail to illuminate some details at the code level: |
| 130 | + |
| 131 | +* The `instruction_type` above corresponds to the `goto_program_instruction_typet` type |
| 132 | + listed in [`goto_program.h`](../../src/goto-programs/goto_program.h) |
| 133 | + * Some illustrative instruction types are `assign`, `function_call`, `return`, etc. |
| 134 | +* The `instruction` is a statement represented by a `goto_instruction_codet`. |
| 135 | + * A `goto_instruction_codet` is an alias of `codet` (a data structure broadly representing |
| 136 | + a statement inside CBMC) that contains the actual code to be executed. |
| 137 | + * You can distinguish different statements by using the result of the |
| 138 | + `get_statement` member function of the `codet` class. |
| 139 | +* The `guard` is an `exprt` (a data structure broadly representing an expression inside CBMC) |
| 140 | + that is expected to have type `bool`. |
| 141 | + * This is optional - not every instruction is expected to have a `guard` associated with it. |
| 142 | + |
| 143 | +## source_locationt |
| 144 | + |
| 145 | +Another important data structure that needs to be discussed at this point is |
| 146 | +`source_locationt`. |
| 147 | + |
| 148 | +`source_locationt` are attached into various `exprt`s (the data structure representing |
| 149 | +various expressions, usually the result of some early processing, e.g. the result of the |
| 150 | +frontend parsing a file). |
| 151 | + |
| 152 | +This means that `codet`s, representing GOTO instructions *also* have a `source_locationt` |
| 153 | +attached to them, by virtue of inheriting from `exprt`. |
| 154 | + |
| 155 | +For the most part, `source_locationt` is something that is only being used when we print |
| 156 | +various nodes (for property listing, counterexample/trace printing, etc). |
| 157 | + |
| 158 | +It might be possible that a specific source location might point to a CBMC instrumentation |
| 159 | +primitive (which might be reported as a `built-in-addition`) or there might even be no-source |
| 160 | +location (because it might be part of harnesses generated as an example, that have no presence |
| 161 | +in the user code). |
| 162 | + |
| 163 | +## irept |
| 164 | + |
| 165 | +`irep`s are the underlying data structure used to implement many of the data |
| 166 | +structures in CBMC and the assorted tools. These include the `exprt`, `typet`, |
| 167 | +`codet` and `source_locationt` classes. This is a tree data structure where |
| 168 | +each node is expected to contain a string/ID and may have child nodes stored in |
| 169 | +both a sequence of child nodes and a map of strings/IDs to child nodes. This |
| 170 | +enables the singular `irept` data structure to be used to model graphs such as |
| 171 | +ASTs, CFGs, etc. |
| 172 | + |
| 173 | +The classes extending `irept` define how the higher level concepts are mapped |
| 174 | +onto the underlying tree data structure. For this reason it is usually advised |
| 175 | +that the member functions of the sub-classes should be used to access the data |
| 176 | +held, rather than the member functions of the base `irept` class. This aids |
| 177 | +potential future restructuring and associates accesses of the data with the |
| 178 | +member functions which have the doxygen explaining what the data is. |
| 179 | + |
| 180 | +The strings/IDs held within `irept` are of type `irep_idt`. These can be |
| 181 | +converted to `std::string` using the `id2string` function. There is a mechanism |
| 182 | +provided for casting expressions and types in |
| 183 | +[`src/util/expr_cast.h`](../../src/util/expr_cast.h). In depth documentation |
| 184 | +of the `irept` class itself can be found in |
| 185 | +[`src/util/irep.h`](../../src/util/irep.h). |
0 commit comments