Commit 7c066f9

Author: owen-jones-diffblue
Merge pull request #2618 from owen-jones-diffblue/doc/move-irep-docs-from-util-to-irep
move irept docs from util to irept and update them
2 parents: b6258db + 0855872

3 files changed (+93, -85 lines)


src/util/README.md

Lines changed: 8 additions & 81 deletions
@@ -10,90 +10,17 @@
 This section discusses some of the key data-structures used in the
 CPROVER codebase.
 
-\subsection irept Irept Data Structure
+\subsection irept_section Irept Data Structure
 
-There are a large number of kinds of tree structured or tree-like data in
-CPROVER. `irept` provides a single, unified representation for all of
-these, allowing structure sharing and reference counting of data. As
-such `irept` is the basic unit of data in CPROVER. Each `irept`
-contains[^1] a basic unit of data (of type `dt`) which contains four
-things:
+See \ref irept for more information.
 
-* `data`: A string[^2], which is returned when the `id()` function is
-used.
+\subsection irep_idt_section Irep_idt and Dstringt
 
-* `named_sub`: A map from `irep_namet` (a string) to `irept`. This
-is used for named children, i.e. subexpressions, parameters, etc.
-
-* `comments`: Another map from `irep_namet` to `irept` which is used
-for annotations and other ‘non-semantic’ information
-
-* `sub`: A vector of `irept` which is used to store ordered but
-unnamed children.
-
-The `irept::pretty` function outputs the contents of an `irept` directly
-and can be used to understand and debug problems with `irept`s.
-
-On their own `irept`s do not “mean” anything; they are effectively
-generic tree nodes. Their interpretation depends on the contents of
-result of the `id` function (the `data`) field. `util/irep_ids.txt`
-contains the complete list of `id` values. During the build process it
-is used to generate `util/irep_ids.h` which gives constants for each id
-(named `ID_`). These can then be used to identify what kind of data
-`irept` stores and thus what can be done with it.
-
-To simplify this process, there are a variety of classes that inherit
-from `irept`, roughly corresponding to the ids listed (i.e. `ID_or`
-(the string `"or”`) corresponds to the class `or_exprt`). These give
-semantically relevant accessor functions for the data; effectively
-different APIs for the same underlying data structure. None of these
-classes add fields (only methods) and so static casting can be used. The
-inheritance graph of the subclasses of `irept` is a useful starting
-point for working out how to manipulate data.
-
-There are three main groups of classes (or APIs); those derived from
-`typet`, `codet` and `exprt` respectively. Although all of these inherit
-from `irept`, these are the most abstract level that code should handle
-data. If code is manipulating plain `irept`s then something is wrong
-with the architecture of the code.
-
-Many of the key descendent of `exprt` are declared in `std_expr.h`. All
-expressions have a named subfield / annotation which gives the type of
-the expression (slightly simplified from C/C++ as `unsignedbv_typet`,
-`signedbv_typet`, `floatbv_typet`, etc.). All type conversions are
-explicit with an expression with `id() == ID_typecast` and an ‘interface
-class’ named `typecast_exprt`. One key descendent of `exprt` is
-`symbol_exprt` which creates `irept` instances with the id of “symbol”.
-These are used to represent variables; the name of which can be found
-using the `get_identifier` accessor function.
-
-`codet` inherits from `exprt` and is defined in `std_code.h`. It
-represents executable code; statements in C rather than expressions. In
-the front-end there are versions of these that hold whole code blocks,
-but in goto-programs these have been flattened so that each `irept`
-represents one sequence point (almost one line of code / one
-semi-colon). The most common descendents of `codet` are `code_assignt`
-so a common pattern is to cast the `codet` to an assignment and then
-recurse on the expression on either side.
-
-[^1]: Or references, if reference counted data sharing is enabled. It is
-enabled by default; see the `SHARING` macro.
-
-[^2]: Unless `USE_STD_STRING` is set, this is actually
-a `dstring` and thus an integer which is a reference into a string table
-
-\subsection irep_idt Irep_idt and Dstringt
-
-Inside `irept`s, strings are stored as `irep_idt`s, or `irep_namet`s for
-keys to `named_sub` or `comments`. If `USE_STD_STRING` is set then both
-`irep_idt` and `irep_namet` are `typedef`ed to `std::string`, but by default
-it is not set and they are `typedef`ed to `dstringt`. `dstringt` has one
-field, an unsigned integer which is an index into a static table of strings.
-This makes it expensive to create a new string (because you have to look
-through the whole table to see if it is already there, and add it if it
-isn't) but very cheap to compare strings (just compare the two integers). It
-also means that when you have lots of copies of the same string you only have
-to store the whole string once, which saves space.
+Inside \ref irept, strings are stored as irep_idts, or irep_namets for
+keys to named_sub or comments. By default both irep_idt and irep_namet
+are typedefed to \ref dstringt, unless USE_STD_STRING is set, in which
+case they are typedefed to std::string (this can be used for debugging
+purposes).
 
 \dot
 digraph G {

src/util/dstring.h

Lines changed: 14 additions & 2 deletions
@@ -16,8 +16,20 @@ Author: Daniel Kroening, [email protected]
 
 #include "string_container.h"
 
-// Marked final to disable inheritance.
-// No virtual destructor, so runtime-polymorphic use would be unsafe.
+/// \ref dstringt has one field, an unsigned integer \ref no which is an index
+/// into a static table of strings. This makes it expensive to create a new
+/// string (because you have to look through the whole table to see if it is
+/// already there, and add it if it isn't) but very cheap to compare strings
+/// (just compare the two integers). It also means that when you have lots of
+/// copies of the same string you only have to store the whole string once,
+/// which saves space.
+///
+/// `irep_idt` and `irep_namet` are typedef-ed to \ref dstringt in irep.h unless
+/// `USE_STD_STRING` is set.
+///
+///
+/// Note: Marked final to disable inheritance. No virtual destructor, so
+/// runtime-polymorphic use would be unsafe.
 class dstringt final
 {
 public:
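The dstringt comment added above trades an expensive construction for a cheap comparison. A minimal sketch of that interning scheme follows, assuming a hypothetical `toy_dstringt` class and a hypothetical `toy_irep_idt` alias that mimics the `USE_STD_STRING` switch. It keeps its own table (a `std::vector` plus a `std::unordered_map` index, so "looking through the table" is a hash lookup here) and does not use CBMC's `string_container.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy model of the interning scheme described above. toy_dstringt and its
// members are hypothetical; this is not CBMC's dstringt or string_container.
class toy_dstringt final
{
public:
  // Expensive step: find the string in the shared table, adding it if new.
  explicit toy_dstringt(const std::string &s) : no(intern(s))
  {
  }

  // Cheap step: equal strings always have equal indices.
  bool operator==(const toy_dstringt &other) const
  {
    return no == other.no;
  }
  bool operator!=(const toy_dstringt &other) const
  {
    return no != other.no;
  }

  unsigned get_no() const
  {
    return no;
  }
  std::string as_string() const
  {
    return table()[no];
  }

  // Number of distinct strings stored so far: each is kept exactly once.
  static std::size_t table_size()
  {
    return table().size();
  }

private:
  unsigned no; // the only data member: an index into the static table

  static std::vector<std::string> &table()
  {
    static std::vector<std::string> strings;
    return strings;
  }

  static unsigned intern(const std::string &s)
  {
    static std::unordered_map<std::string, unsigned> index;
    auto it = index.find(s);
    if(it != index.end())
      return it->second;
    const unsigned n = static_cast<unsigned>(table().size());
    table().push_back(s);
    index.emplace(s, n);
    return n;
  }
};

// The compile-time switch described above; toy_irep_idt is hypothetical.
#ifdef USE_STD_STRING
typedef std::string toy_irep_idt;
#else
typedef toy_dstringt toy_irep_idt;
#endif
```

Comparing two `toy_dstringt` values never touches the characters; only constructing one does, and a repeated string adds nothing to the table.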

src/util/irep.h

Lines changed: 71 additions & 2 deletions
@@ -82,8 +82,77 @@ inline const std::string &name2string(const irep_namet &n)
 class irept;
 const irept &get_nil_irep();
 
-/*! \brief Base class for tree-like data structures with sharing
-*/
+/// \brief Base class for tree-like data structures with sharing
+///
+/// There are a large number of kinds of tree structured or tree-like data in
+/// CPROVER. \ref irept provides a single, unified representation for all of
+/// these, allowing structure sharing and reference counting of data. As
+/// such \ref irept is the basic unit of data in CPROVER. Each \ref irept
+/// contains (or references, if reference counted data sharing is enabled, as
+/// it is by default - see the `SHARING` macro) a basic unit of data (of type
+/// \ref dt) which contains four things:
+///
+/// * \ref irept::dt::data : A string, which is returned when the \ref id()
+/// function is used. (Unless `USE_STD_STRING` is set, this is actually a
+/// \ref dstringt and thus an integer which is a reference into a string
+/// table.)
+///
+/// * \ref irept::dt::named_sub : A map from `irep_namet` (a string) to \ref
+/// irept. This is used for named children, i.e. subexpressions, parameters,
+/// etc.
+///
+/// * \ref irept::dt::comments : Another map from `irep_namet` to \ref irept
+/// which is used for annotations and other ‘non-semantic’ information. Note
+/// that this map is ignored by the default \ref operator==.
+///
+/// * \ref irept::dt::sub : A vector of \ref irept which is used to store
+/// ordered but unnamed children.
+///
+/// The \ref irept::pretty function outputs the explicit tree structure of
+/// an \ref irept and can be used to understand and debug problems with
+/// `irept`s.
+///
+/// On their own `irept`s do not "mean" anything; they are effectively
+/// generic tree nodes. Their interpretation depends on the result of the
+/// \ref id() function, i.e. the `data` field. `util/irep_ids.def`
+/// contains a list of `id` values. During the build process it is used
+/// to generate `util/irep_ids.h` which gives constants for each id
+/// (named `ID_`). You can also make `irep_idt`s which do not come from
+/// `util/irep_ids.def`. An `irep_idt` can then be used to identify what
+/// kind of data the \ref irept stores and thus what can be done with it.
+///
+/// To simplify this process, there are a variety of classes that inherit
+/// from \ref irept, roughly corresponding to the ids listed (e.g. `ID_or`
+/// (the string "or") corresponds to the class \ref or_exprt). These give
+/// semantically relevant accessor functions for the data; effectively
+/// different APIs for the same underlying data structure. None of these
+/// classes add fields (only methods) and so static casting can be used. The
+/// inheritance graph of the subclasses of \ref irept is a useful starting
+/// point for working out how to manipulate data.
+///
+/// There are three main groups of classes (or APIs); those derived from
+/// \ref typet, \ref codet and \ref exprt respectively. Although all of these
+/// inherit from \ref irept, these are the most abstract level at which code
+/// should handle data. If code is manipulating plain `irept`s then something
+/// is wrong with the architecture of the code.
+///
+/// Many of the key descendants of \ref exprt are declared in \ref std_expr.h.
+/// All expressions have a named subexpression with ID "type", which gives the
+/// type of the expression (slightly simplified from C/C++ as \ref
+/// unsignedbv_typet, \ref signedbv_typet, \ref floatbv_typet, etc.). All type
+/// conversions are explicit with a \ref typecast_exprt. One key descendant of
+/// \ref exprt is \ref symbol_exprt which creates \ref irept instances with ID
+/// "symbol". These are used to represent variables, whose names can be
+/// found using the `get_identifier` accessor function.
+///
+/// \ref codet inherits from \ref exprt and is defined in `std_code.h`. It
+/// represents executable code; statements in a C-like language rather than
+/// expressions. In the front-end there are versions of these that hold
+/// whole code blocks, but in goto-programs these have been flattened so
+/// that each \ref irept represents one sequence point (almost one line of
+/// code / one semi-colon). The most common descendant of \ref codet is
+/// \ref code_assignt, so a common pattern is to cast the \ref codet to an
+/// assignment and then recurse on the expression on either side.
 class irept
 {
 public:
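The irept comment added above describes a node whose payload holds four things (`data`, `named_sub`, `comments`, `sub`) and may be shared between copies. The sketch below models that layout under stated assumptions: `toy_irept`, `toy_dt` and every member function are hypothetical, ids are plain `std::string`s rather than `irep_idt`, and sharing uses `std::shared_ptr` with copy-on-write rather than whatever the `SHARING` macro selects in CBMC. As in the documented default, `operator==` ignores `comments`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct toy_dt; // shared payload, defined once toy_irept is complete

// Hypothetical, simplified model of the layout described above: each node
// is a handle to a reference-counted toy_dt that is copied on write.
class toy_irept
{
public:
  explicit toy_irept(const std::string &id);
  const std::string &id() const;
  const toy_irept *find_named(const std::string &name) const;
  const std::vector<toy_irept> &get_sub() const;

  // Every mutator detaches first, so other handles never see the change.
  void set_named(const std::string &name, const toy_irept &child);
  void set_comment(const std::string &name, const toy_irept &note);
  void push_sub(const toy_irept &child);

  bool shares_with(const toy_irept &other) const { return d == other.d; }
  // Structural equality that, like the documented default, skips comments.
  bool operator==(const toy_irept &other) const;
  // Indented dump of the whole tree; the format is made up for this sketch.
  std::string pretty(unsigned indent = 0) const;

private:
  std::shared_ptr<toy_dt> d;
  void detach();
};

struct toy_dt
{
  std::string data;                           // the string returned by id()
  std::map<std::string, toy_irept> named_sub; // named children
  std::map<std::string, toy_irept> comments;  // non-semantic annotations
  std::vector<toy_irept> sub;                 // ordered, unnamed children
};

toy_irept::toy_irept(const std::string &id) : d(std::make_shared<toy_dt>())
{
  d->data = id;
}
const std::string &toy_irept::id() const { return d->data; }
const std::vector<toy_irept> &toy_irept::get_sub() const { return d->sub; }

const toy_irept *toy_irept::find_named(const std::string &name) const
{
  auto it = d->named_sub.find(name);
  return it == d->named_sub.end() ? nullptr : &it->second;
}

void toy_irept::detach()
{
  if(d.use_count() > 1)
    d = std::make_shared<toy_dt>(*d); // shallow copy: children stay shared
}

void toy_irept::set_named(const std::string &name, const toy_irept &child)
{
  detach();
  d->named_sub.insert_or_assign(name, child);
}

void toy_irept::set_comment(const std::string &name, const toy_irept &note)
{
  detach();
  d->comments.insert_or_assign(name, note);
}

void toy_irept::push_sub(const toy_irept &child)
{
  detach();
  d->sub.push_back(child);
}

bool toy_irept::operator==(const toy_irept &other) const
{
  if(d == other.d)
    return true; // same shared payload
  return d->data == other.d->data && d->named_sub == other.d->named_sub &&
         d->sub == other.d->sub;
}

std::string toy_irept::pretty(unsigned indent) const
{
  const std::string pad(indent, ' ');
  std::string out = pad + d->data + "\n";
  for(const auto &kv : d->named_sub)
    out += pad + "  * " + kv.first + ":\n" + kv.second.pretty(indent + 4);
  for(const auto &kv : d->comments)
    out += pad + "  # " + kv.first + ":\n" + kv.second.pretty(indent + 4);
  for(const toy_irept &child : d->sub)
    out += child.pretty(indent + 2);
  return out;
}
```

Copying a `toy_irept` only bumps a reference count. The first mutation of a shared node takes a shallow copy of its payload, so other nodes that still point at the old payload are unaffected.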

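The irept documentation added to irep.h above says that a node's meaning comes from its id, that type conversions are always explicit, and that code typically handles an assignment by recursing on both sides. A minimal sketch of those three points follows, assuming a hypothetical flattened node `toy_exprt` whose `type` and `identifier` fields stand in for named children. For brevity it reads fields directly instead of modelling typed views such as `or_exprt` or `code_assignt`, and the type names are made-up strings.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generic node; field names are made up for this sketch.
struct toy_exprt
{
  std::string id;                  // decides how the node is interpreted
  std::string type;                // stands in for the "type" named child
  std::string identifier;          // meaningful only when id == "symbol"
  std::vector<toy_exprt> operands; // ordered, unnamed children
};

toy_exprt make_symbol(std::string name, std::string type)
{
  return {"symbol", std::move(type), std::move(name), {}};
}

// Conversions are never implicit: a type mismatch becomes a "typecast" node.
toy_exprt make_typecast(toy_exprt op, const std::string &type)
{
  if(op.type == type)
    return op;
  return {"typecast", type, "", {std::move(op)}};
}

// An assignment statement: operands[0] is the lhs and operands[1] the rhs,
// converted explicitly to the type of the lhs.
toy_exprt make_assign(toy_exprt lhs, toy_exprt rhs)
{
  const std::string lhs_type = lhs.type; // read before lhs is moved from
  return {
    "assign", "", "", {std::move(lhs), make_typecast(std::move(rhs), lhs_type)}};
}

// Typical traversal: dispatch on the id, recursing into both sides of an
// assignment and into any nested operands.
void collect_symbols(const toy_exprt &e, std::vector<std::string> &names)
{
  if(e.id == "symbol")
  {
    names.push_back(e.identifier);
    return;
  }
  for(const toy_exprt &op : e.operands)
    collect_symbols(op, names);
}
```

Because every conversion is a node of its own, a traversal like `collect_symbols` sees the typecast and walks through it rather than having to infer a hidden conversion.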