|
3 | 3 |
|
4 | 4 | # Folder util
|
5 | 5 |
|
6 |
| -\author Martin Brain, Owen Jones |
| 6 | +\author Martin Brain, Owen Jones, Chris Smowton |
7 | 7 |
|
8 | 8 | \section data_structures Data Structures
|
9 | 9 |
|
10 | 10 | \ref util contains some of the key data-structures used in the
|
11 | 11 | CPROVER codebase.
|
12 | 12 |
|
13 |
| -\subsection irept_section irept: a 4-tuple (data, named-sub, comments, sub) |
| 13 | +\subsection irept_section irept: a general-purpose polymorphic tree |
14 | 14 |
|
15 |
| -See documentation at \ref irept. |
| 15 | +See detailed documentation at \ref irept. |
16 | 16 |
|
17 |
| -As that documentation says, [irept](\ref irept)s are generic tree nodes. You |
18 |
| -should think of them as having a single string ([data](irept::data), actually |
19 |
| -an \ref irep_idt) and lots of child nodes, some of which are numbered |
| 17 | +[irept](\ref irept)s are generic tree nodes. You |
| 18 | +should think of each node as holding a single string ([data](irept::data), |
| 19 | +actually an \ref irep_idt) and lots of child nodes, some of which are numbered |
20 | 20 | ([sub](\ref irept::dt::sub)) and some of which are labelled, and the label
|
21 | 21 | can either start with a “\#” ([comments](\ref irept::dt::comments)) or without
|
22 | 22 | one ([named_sub](\ref irept::dt::named_sub)). The meaning of the “\#” is that
|
23 |
| -this child should not be considered important, for example it shouldn't be |
24 |
| -counted when comparing two [irept](\ref irept)s for equality. |
| 23 | +this child shouldn't be counted when comparing two [irept](\ref irept)s for |
| 24 | +equality; this is usually used when making an advisory annotation which does |
| 25 | +not alter the semantics of the program. |
| 26 | + |
| 27 | +They are used to represent many kinds of structured objects throughout the |
| 28 | +CPROVER codebase, such as expressions, types and code. An \ref exprt represents |
| 29 | +expressions, including for example \ref equal_exprt, an equality predicate, or |
| 30 | +\ref dereference_exprt, which represents the `*` unary dereference operator |
| 31 | +found in C. A \ref typet represents a type, and may have other \ref typet nodes |
| 32 | +as children: for example, \ref array_typet represents an array, and has a single |
| 33 | +numbered child that gives the type of the array elements. Finally, \ref codet |
| 34 | +represents imperative statements, such as \ref code_assignt, which represents an |
| 35 | +imperative assignment. It has two numbered operands, its left- and right-hand |
| 36 | +sides. |
| 37 | + |
| 38 | +Note that \ref exprt, \ref codet and similar classes deriving from \ref irept |
| 39 | +do so in order to add constraints to the general tree (for example, providing |
| 40 | +accessor methods with user-friendly names, or enforcing invariants that a node |
| 41 | +must have a certain number of children), but do not override virtual methods or |
| 42 | +add fields. |
| 43 | + |
| 44 | +The implementation of \ref irept allows saving memory by sharing instances of |
| 45 | +its internal storage using a |
| 46 | +[copy-on-write](https://en.wikipedia.org/wiki/Copy-on-write) scheme. For |
| 47 | +example, the statement `irep1.sub()[0] = irep2;` will share rather than copy |
| 48 | +`irep2` and its children, saving memory unless either irept is subsequently |
| 49 | +modified, at which point a copy is made. |
| 50 | + |
| 51 | +\subsection irept_discussion_section Discussion |
| 52 | + |
| 53 | +Many other compiler infrastructures represent a polymorphic tree using nodes |
| 54 | +specialised to a particular expression type: for example, perhaps a binary |
| 55 | +addition operator could be represented using a tag plus two pointers to child |
| 56 | +expressions, rather than allocating a whole irept (including a variable-length |
| 57 | +expression vector and empty maps for storing named subexpressions). This may |
| 58 | +save memory, but inhibits ad-hoc annotations such as tagging the addition |
| 59 | +"does not overflow" without adding that field to addition operations globally |
| 60 | +or maintaining a shadow data structure to track that information. This is easier |
| 61 | +with a general irept structure that can store an arbitrary number of |
| 62 | +arbitrarily-named child nodes. |
| 63 | + |
| 64 | +Another optimisation commonly seen when storing polymorphic trees is to use a |
| 65 | +uniform node data structure (like irept) but to keep the node compact, for |
| 66 | +example storing at most four pointers and transparently allocating extension |
| 67 | +nodes when necessary for an unusually large expression. This provides the best |
| 68 | +of both worlds, obtaining compact storage for common cases such as unannotated |
| 69 | +binary expressions while permitting deviation at times. The downside is that |
| 70 | +implementing such a system is more complex than straightforwardly using C++ |
| 71 | +standard data structures as in irept. |
25 | 72 |
|
26 | 73 | \subsection irep_idt_section Strings: dstringt, the string_container and the ID_*
|
27 | 74 |
|
|
0 commit comments