|
10 | 10 | This section discusses some of the key data-structures used in the
|
11 | 11 | CPROVER codebase.
|
12 | 12 |
|
13 |
| -\subsection irept Irept Data Structure |
| 13 | +\subsection irept_section Irept Data Structure |
14 | 14 |
|
15 |
| -There are a large number of kinds of tree structured or tree-like data in |
16 |
| -CPROVER. `irept` provides a single, unified representation for all of |
17 |
| -these, allowing structure sharing and reference counting of data. As |
18 |
| -such `irept` is the basic unit of data in CPROVER. Each `irept` |
19 |
| -contains[^1] a basic unit of data (of type `dt`) which contains four |
20 |
| -things: |
| 15 | +See \ref irept for more information. |
21 | 16 |
|
22 |
| -* `data`: A string[^2], which is returned when the `id()` function is |
23 |
| - used. |
| 17 | +\subsection irep_idt_section Irep_idt and Dstringt |
24 | 18 |
|
25 |
| -* `named_sub`: A map from `irep_namet` (a string) to `irept`. This |
26 |
| - is used for named children, i.e. subexpressions, parameters, etc. |
27 |
| - |
28 |
| -* `comments`: Another map from `irep_namet` to `irept` which is used |
29 |
| - for annotations and other ‘non-semantic’ information |
30 |
| - |
31 |
| -* `sub`: A vector of `irept` which is used to store ordered but |
32 |
| - unnamed children. |
33 |
| - |
34 |
| -The `irept::pretty` function outputs the contents of an `irept` directly |
35 |
| -and can be used to understand and debug problems with `irept`s. |
36 |
| - |
37 |
| -On their own `irept`s do not “mean” anything; they are effectively |
38 |
| -generic tree nodes. Their interpretation depends on the contents of |
39 |
| -result of the `id` function (the `data`) field. `util/irep_ids.txt` |
40 |
| -contains the complete list of `id` values. During the build process it |
41 |
| -is used to generate `util/irep_ids.h` which gives constants for each id |
42 |
| -(named `ID_`). These can then be used to identify what kind of data |
43 |
| -`irept` stores and thus what can be done with it. |
44 |
| - |
45 |
| -To simplify this process, there are a variety of classes that inherit |
46 |
| -from `irept`, roughly corresponding to the ids listed (i.e. `ID_or` |
47 |
| -(the string `"or”`) corresponds to the class `or_exprt`). These give |
48 |
| -semantically relevant accessor functions for the data; effectively |
49 |
| -different APIs for the same underlying data structure. None of these |
50 |
| -classes add fields (only methods) and so static casting can be used. The |
51 |
| -inheritance graph of the subclasses of `irept` is a useful starting |
52 |
| -point for working out how to manipulate data. |
53 |
| - |
54 |
| -There are three main groups of classes (or APIs); those derived from |
55 |
| -`typet`, `codet` and `exprt` respectively. Although all of these inherit |
56 |
| -from `irept`, these are the most abstract level that code should handle |
57 |
| -data. If code is manipulating plain `irept`s then something is wrong |
58 |
| -with the architecture of the code. |
59 |
| - |
60 |
| -Many of the key descendent of `exprt` are declared in `std_expr.h`. All |
61 |
| -expressions have a named subfield / annotation which gives the type of |
62 |
| -the expression (slightly simplified from C/C++ as `unsignedbv_typet`, |
63 |
| -`signedbv_typet`, `floatbv_typet`, etc.). All type conversions are |
64 |
| -explicit with an expression with `id() == ID_typecast` and an ‘interface |
65 |
| -class’ named `typecast_exprt`. One key descendent of `exprt` is |
66 |
| -`symbol_exprt` which creates `irept` instances with the id of “symbol”. |
67 |
| -These are used to represent variables; the name of which can be found |
68 |
| -using the `get_identifier` accessor function. |
69 |
| - |
70 |
| -`codet` inherits from `exprt` and is defined in `std_code.h`. It |
71 |
| -represents executable code; statements in C rather than expressions. In |
72 |
| -the front-end there are versions of these that hold whole code blocks, |
73 |
| -but in goto-programs these have been flattened so that each `irept` |
74 |
| -represents one sequence point (almost one line of code / one |
75 |
| -semi-colon). The most common descendents of `codet` are `code_assignt` |
76 |
| -so a common pattern is to cast the `codet` to an assignment and then |
77 |
| -recurse on the expression on either side. |
78 |
| - |
79 |
| -[^1]: Or references, if reference counted data sharing is enabled. It is |
80 |
| - enabled by default; see the `SHARING` macro. |
81 |
| - |
82 |
| -[^2]: Unless `USE_STD_STRING` is set, this is actually |
83 |
| -a `dstring` and thus an integer which is a reference into a string table |
84 |
| - |
85 |
| -\subsection irep_idt Irep_idt and Dstringt |
86 |
| - |
87 |
| -Inside `irept`s, strings are stored as `irep_idt`s, or `irep_namet`s for |
88 |
| -keys to `named_sub` or `comments`. If `USE_STD_STRING` is set then both |
89 |
| -`irep_idt` and `irep_namet` are `typedef`ed to `std::string`, but by default |
90 |
| -it is not set and they are `typedef`ed to `dstringt`. `dstringt` has one |
91 |
| -field, an unsigned integer which is an index into a static table of strings. |
92 |
| -This makes it expensive to create a new string (because you have to look |
93 |
| -through the whole table to see if it is already there, and add it if it |
94 |
| -isn't) but very cheap to compare strings (just compare the two integers). It |
95 |
| -also means that when you have lots of copies of the same string you only have |
96 |
| -to store the whole string once, which saves space. |
| 19 | +Inside \ref irept, strings are stored as irep_idts, or irep_namets for |
| 20 | +keys to named_sub or comments. By default both irep_idt and irep_namet |
| 21 | +are typedefed to \ref dstringt, unless USE_STD_STRING is set, in which |
| 22 | +case they are typedefed to std::string (this can be used for debugging |
| 23 | +purposes). |
97 | 24 |
|
98 | 25 | \dot
|
99 | 26 | digraph G {
|
|
0 commit comments