Skip to content

Commit 04f7860

Browse files
committed
Add a little more detail and discussion on irept
1 parent dbd6988 commit 04f7860

File tree

1 file changed

+45
-5
lines changed

1 file changed

+45
-5
lines changed

src/util/README.md

Lines changed: 45 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33

44
# Folder util
55

6-
\author Martin Brain, Owen Jones
6+
\author Martin Brain, Owen Jones, Chris Smowton
77

88
\section data_structures Data Structures
99

@@ -12,17 +12,57 @@ CPROVER codebase.
1212

1313
\subsection irept_section irept: a 4-tuple (data, named-sub, comments, sub)
1414

15-
See documentation at \ref irept.
15+
See detailed documentation at \ref irept.
1616

17-
As that documentation says, [irept](\ref irept)s are generic tree nodes. You
18-
should think of them as having a single string ([data](irept::data), actually
19-
an \ref irep_idt) and lots of child nodes, some of which are numbered
17+
[irept](\ref irept)s are generic tree nodes. You
18+
should think of each node as holding a single string ([data](irept::data),
19+
actually an \ref irep_idt) and lots of child nodes, some of which are numbered
2020
([sub](\ref irept::dt::sub)) and some of which are labelled, and the label
2121
can either start with a “\#” ([comments](\ref irept::dt::comments)) or without
2222
one ([named_sub](\ref irept::dt::named_sub)). The meaning of the “\#” is that
2323
this child should not be considered important, for example it shouldn't be
2424
counted when comparing two [irept](\ref irept)s for equality.
2525

26+
They are used to represent all manner of structured objects throughout the
27+
CPROVER codebase, such as expressions, types and code. An \ref exprt represents
28+
expressions, including for example \ref equal_exprt, an equality predicate, or
29+
\ref dereference_exprt, which represents the `*` unary dereference operator
30+
found in C. A \ref typet represents a type, and may have other \ref typet nodes
31+
as children: for example, \ref array_typet represents an array, and has a single
32+
numbered child that gives the type of the array elements. Finally, \ref codet
33+
represents imperative code, such as \ref code_assignt, which represents an
34+
imperative assignment. It has two numbered operands, its left- and right-hand
35+
sides.
36+
37+
The implementation of \ref irept allows saving memory by sharing instances of
38+
its internal storage using a
39+
[copy-on-write](https://en.wikipedia.org/wiki/Copy-on-write) scheme. For
40+
example, the statement `irep1.sub()[0] = irep2;` will share rather than copy
41+
`irep2` and its children, saving memory unless either irept is subsequently
42+
modified, at which point a copy is made.
43+
44+
\subsection irept_discussion_section Discussion
45+
46+
Many other compiler infrastructures represent a polymorphic tree using nodes
47+
specialised to a particular expression type: for example, perhaps a binary
48+
addition operator could be represented using a tag plus two pointers to child
49+
expressions, rather than allocating a whole irept (including a variable-length
50+
expression vector and empty maps for storing named subexpressions). This may
51+
save memory, but inhibits ad-hoc annotations such as tagging the addition
52+
"does not overflow" without adding that field to addition operations globally
53+
or maintaining a shadow data structure to track that information. This is easier
54+
with a general irept structure that can store an arbitrary number of
55+
arbitrarily-named child nodes.
56+
57+
Another optimisation commonly seen when storing polymorphic trees is to use a
58+
uniform node data structure (like irept) but to keep the node compact, for
59+
example storing at most four pointers and transparently allocating extension
60+
nodes when necessary for an unusually large expression. This provides the best
61+
of both worlds, obtaining compact storage for common cases such as unannotated
62+
binary expressions while permitting deviation at times. The downside is that
63+
implementing such a system is more complex than straightforwardly using C++
64+
standard data structures as in irept.
65+
2666
\subsection irep_idt_section Strings: dstringt, the string_container and the ID_*
2767

2868
Within cbmc, strings are represented using \ref irep_idt, or \ref irep_namet

0 commit comments

Comments
 (0)