Skip to content

Commit ef5ff03

Browse files
committed
Add a little more detail and discussion on irept
1 parent dbd6988 commit ef5ff03

File tree

1 file changed

+55
-8
lines changed

1 file changed

+55
-8
lines changed

src/util/README.md

Lines changed: 55 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -3,25 +3,72 @@
33

44
# Folder util
55

6-
\author Martin Brain, Owen Jones
6+
\author Martin Brain, Owen Jones, Chris Smowton
77

88
\section data_structures Data Structures
99

1010
\ref util contains some of the key data-structures used in the
1111
CPROVER codebase.
1212

13-
\subsection irept_section irept: a 4-tuple (data, named-sub, comments, sub)
13+
\subsection irept_section irept: a general-purpose polymorphic tree
1414

15-
See documentation at \ref irept.
15+
See detailed documentation at \ref irept.
1616

17-
As that documentation says, [irept](\ref irept)s are generic tree nodes. You
18-
should think of them as having a single string ([data](irept::data), actually
19-
an \ref irep_idt) and lots of child nodes, some of which are numbered
17+
[irept](\ref irept)s are generic tree nodes. You
18+
should think of each node as holding a single string ([data](irept::data),
19+
actually an \ref irep_idt) and lots of child nodes, some of which are numbered
2020
([sub](\ref irept::dt::sub)) and some of which are labelled, and the label
2121
can either start with a “\#” ([comments](\ref irept::dt::comments)) or without
2222
one ([named_sub](\ref irept::dt::named_sub)). The meaning of the “\#” is that
23-
this child should not be considered important, for example it shouldn't be
24-
counted when comparing two [irept](\ref irept)s for equality.
23+
this child shouldn't be counted when comparing two [irept](\ref irept)s for
24+
equality; this is usually used when making an advisory annotation which does
25+
not alter the semantics of the program.
26+
27+
They are used to represent many kinds of structured objects throughout the
28+
CPROVER codebase, such as expressions, types and code. An \ref exprt represents
29+
expressions, including for example \ref equal_exprt, an equality predicate, or
30+
\ref dereference_exprt, which represents the `*` unary dereference operator
31+
found in C. A \ref typet represents a type, and may have other \ref typet nodes
32+
as children: for example, \ref array_typet represents an array, and has a single
33+
numbered child that gives the type of the array elements. Finally, \ref codet
34+
represents imperative statements, such as \ref code_assignt, which represents an
35+
imperative assignment. It has two numbered operands, its left- and right-hand
36+
sides.
37+
38+
Note that \ref exprt, \ref codet and similar classes deriving from \ref irept
39+
do so in order to add constraints to the general tree (for example, providing
40+
accessor methods with user-friendly names, or enforcing invariants that a node
41+
must have a certain number of children), but do not override virtual methods or
42+
add fields.
43+
44+
The implementation of \ref irept allows saving memory by sharing instances of
45+
its internal storage using a
46+
[copy-on-write](https://en.wikipedia.org/wiki/Copy-on-write) scheme. For
47+
example, the statement `irep1.sub()[0] = irep2;` will share rather than copy
48+
`irep2` and its children, saving memory unless either irept is subsequently
49+
modified, at which point a copy is made.
50+
51+
\subsection irept_discussion_section Discussion
52+
53+
Many other compiler infrastructures represent a polymorphic tree using nodes
54+
specialised to a particular expression type: for example, perhaps a binary
55+
addition operator could be represented using a tag plus two pointers to child
56+
expressions, rather than allocating a whole irept (including a variable-length
57+
expression vector and empty maps for storing named subexpressions). This may
58+
save memory, but inhibits ad-hoc annotations such as tagging the addition
59+
"does not overflow" without adding that field to addition operations globally
60+
or maintaining a shadow data structure to track that information. This is easier
61+
with a general irept structure that can store an arbitrary number of
62+
arbitrarily-named child nodes.
63+
64+
Another optimisation commonly seen when storing polymorphic trees is to use a
65+
uniform node data structure (like irept) but to keep the node compact, for
66+
example storing at most four pointers and transparently allocating extension
67+
nodes when necessary for an unusually large expression. This provides the best
68+
of both worlds, obtaining compact storage for common cases such as unannotated
69+
binary expressions while permitting deviation at times. The downside is that
70+
implementing such a system is more complex than straightforwardly using C++
71+
standard data structures as in irept.
2572

2673
\subsection irep_idt_section Strings: dstringt, the string_container and the ID_*
2774

0 commit comments

Comments
 (0)