Skip to content

Commit 35e278e

Browse files
author
Owen Jones
committed
Move docs from guide to module-level docs
Also added a lot of \refs and merged in the small amount of documentation that was already there.
1 parent 147c5d9 commit 35e278e

File tree

2 files changed

+104
-100
lines changed

2 files changed

+104
-100
lines changed

doc/architectural/data-structures-core-structures-and-ast.md

Lines changed: 2 additions & 91 deletions
Original file line numberDiff line numberDiff line change
@@ -3,97 +3,8 @@
33

44
\author Martin Brain, Peter Schrammel, Owen Jones
55

6-
## Strings: dstringt, the string_container and the ID_* ##
7-
8-
Within cbmc, strings are represented using `irep_idt`. By default this is
9-
typedefed to \ref dstringt, which stores a string as an index into a large
10-
static table of strings. This makes it easy to compare if two `irep_idt`s
11-
are equal (just compare the index) and it makes it efficient to store
12-
many copies of the same string. The static list of strings is initially populated
13-
from `irep_ids.def`, so for example the fourth entry in `irep_ids.def` is
14-
“IREP_ID_ONE(type)”, so the string “type” has index 3. You can refer to
15-
this `irep_idt` as `ID_type`. The other kind of line you see is
16-
“IREP_ID_TWO(C_source_location, #source_location)”, which means the
17-
`irep_idt` for the string “#source_location” can be referred to as
18-
`ID_C_source_location`. The “C” is for comment, which will be explained
19-
in the next section. Any strings that need to be stored as `irep_id`s which
20-
aren't in `irep_ids.def` are added to the end of the table when they are
21-
first encountered, and the same index is used for all instances.
22-
23-
See documentation at \ref dstringt.
24-
25-
## irept: a 4-triple (data, named-sub, comments, sub) ##
26-
27-
See documentation at \ref irept.
28-
29-
As that documentation says, `irept`s are generic tree nodes. You should
30-
think of them as having a single string (data, actually an `irep_idt`) and
31-
lots of child nodes, some of which are numbered (sub) and some of which are
32-
labelled, and the label can either start with a “\#” (comments-sub) or without
33-
one (named-sub). The meaning of the “\#” is that this child should not be
34-
considered important, for example it shouldn't be counted when comparing two
35-
`irept`s for equality.
36-
37-
## typet ##
38-
39-
See \ref typet.
40-
41-
To be documented.
42-
43-
### symbol_typet ###
44-
45-
See \ref symbol_typet.
46-
47-
To be documented.
48-
49-
## exprt ##
50-
51-
\ref exprt is the class to represent an expression. It inherits from \ref irept,
52-
and the only things it adds to it are that every \ref exprt has a named sub
53-
containing its type and everything in the sub of an \ref exprt is again an
54-
\ref exprt, not just an \ref irept. You can think of \ref exprt as a node in the
55-
abstract syntax tree for an expression. There are many classes that
56-
inherit from \ref exprt and which represent more specific things. For
57-
example, there is \ref minus_exprt, which has a sub of size 2 (for the two
58-
argument of minus).
59-
60-
Recall that every \ref irept has one piece of data of its own, i.e. its
61-
`id()`, and all other information is in its `named_sub`, `comments` or
62-
`sub`. For `exprt`s, the `id()` is used to specify why kind of \ref exprt this is,
63-
so a \ref minus_exprt has `ID_minus` as its `id()`. This means that a
64-
\ref minus_exprt can be passed wherever an \ref exprt is expected, and if you
65-
want to check if the expression you are looking at is a minus
66-
expression then you have to check its `id()` (or use
67-
`can_cast_expr<minus_exprt>`).
68-
69-
## codet ##
70-
71-
\ref exprt represents expressions and \ref codet represents statements. \ref codet
72-
inherits from \ref exprt, so all `codet`s are `exprt`s, with `id()` `ID_code`.
73-
Many different kinds of statements inherit from \ref codet, and they are
74-
distinguished by their `statement()`. For example, a while loop would be
75-
represented by a \ref code_whilet, which has `statement()` `ID_while`.
76-
\ref code_whilet has two operands in its sub, and helper functions to make
77-
it easier to use: `cond()` returns the first subexpression, which
78-
is the condition for the while loop, and `body()` returns the second
79-
subexpression, which is the body of the while loop - this has to be
80-
a \ref codet, because the body of a while loop is a statement.
81-
82-
## symbolt, symbol_table, and namespacet ##
83-
84-
To be documented.
85-
86-
### Symbol lifetimes, symbol modes, name, base-name, pretty-name; semantic of lifetimes for symex? ###
87-
88-
To be documented.
89-
90-
### Storing symbols and hiding symbols (namespacet) ###
91-
92-
To be documented.
93-
94-
### ns.follow ##
95-
96-
To be documented.
6+
The core structures used for representing abstract syntax trees are all
7+
documented in \ref util.
978

989
## Examples: how to represent the AST of statements, in C and in java ##
9910

src/util/README.md

Lines changed: 102 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -7,20 +7,113 @@
77

88
\section data_structures Data Structures
99

10-
This section discusses some of the key data-structures used in the
10+
\ref util contains some of the key data-structures used in the
1111
CPROVER codebase.
1212

13-
\subsection irept_section Irept Data Structure
13+
\subsection irept_section irept: a 4-tuple (data, named-sub, comments, sub)
1414

15-
See \ref irept for more information.
15+
See documentation at \ref irept.
1616

17-
\subsection irep_idt_section Irep_idt and Dstringt
17+
As that documentation says, [irept](\ref irept)s are generic tree nodes. You
18+
should think of them as having a single string ([data](irept::data), actually
19+
an \ref irep_idt) and lots of child nodes, some of which are numbered
20+
([sub](\ref irept::dt::sub)) and some of which are labelled, and the label
21+
can either start with a “\#” ([comments](\ref irept::dt::comments)) or without
22+
one ([named_sub](\ref irept::dt::named_sub)). The meaning of the “\#” is that
23+
this child should not be considered important, for example it shouldn't be
24+
counted when comparing two [irept](\ref irept)s for equality.
1825

19-
Inside \ref irept, strings are stored as irep_idts, or irep_namets for
20-
keys to named_sub or comments. By default both irep_idt and irep_namet
21-
are typedefed to \ref dstringt, unless USE_STD_STRING is set, in which
22-
case they are typedefed to std::string (this can be used for debugging
23-
purposes).
26+
\subsection irep_idt_section Strings: dstringt, the string_container and the ID_*
27+
28+
Within cbmc, strings are represented using \ref irep_idt, or \ref irep_namet
29+
for keys to [named_sub](\ref irept::dt::named_sub) or
30+
[comments](\ref irept::dt::comments). By default these are both
31+
typedefed to \ref dstringt. For debugging purposes you can set `USE_STD_STRING`,
32+
in which case they are both typedefed to `std::string`.
33+
34+
\ref dstringt stores a string as an index into a large
35+
static table of strings. This makes it easy to compare if two
36+
[irep_idt](\ref irep_idt)s are equal (just compare the index) and it makes it
37+
efficient to store many copies of the same string. The static list of strings
38+
is initially populated from `irep_ids.def`, so for example the fourth entry
39+
in `irep_ids.def` is `“IREP_ID_ONE(type)”`, so the string “type” has index 3.
40+
You can refer to this \ref irep_idt as `ID_type`. The other kind of line you
41+
see is `“IREP_ID_TWO(C_source_location, #source_location)”`, which means the
42+
\ref irep_idt for the string “#source_location” can be referred to as
43+
`ID_C_source_location`. The “C” is for comment, meaning that it should be
44+
stored in the ([comments](\ref irept::dt::comments). Any strings that need
45+
to be stored as [irep_idt](\ref irep_idt)s which aren't in `irep_ids.def`
46+
are added to the end of the table when they are first encountered, and the
47+
same index is used for all instances.
48+
49+
See documentation at \ref dstringt.
50+
51+
\subsection typet_section typet
52+
53+
See \ref typet.
54+
55+
To be documented.
56+
57+
\subsubsection symbol_typet_section symbol_typet
58+
59+
See \ref symbol_typet.
60+
61+
To be documented.
62+
63+
\subsection exprt_section exprt
64+
65+
\ref exprt is the class to represent an expression. It inherits from \ref
66+
irept, and the only things it adds to it are that (1) every \ref exprt has
67+
an entry in [named_sub](\ref irept::dt::named_sub) containing its type and
68+
(2) everything in the [sub](\ref irept::dt::sub) of an \ref exprt is again an
69+
\ref exprt, not just an \ref irept. You can think of \ref exprt as a node in
70+
the abstract syntax tree for an expression. There are many classes that
71+
inherit from \ref exprt and which represent more specific things. For
72+
example, there is \ref minus_exprt, which has a [sub](\ref irept::dt::sub)
73+
of size 2 (for the two arguments of minus).
74+
75+
Recall that every \ref irept has one piece of data of its own, i.e. its
76+
[id()](\ref irept::id()), and all other information is in its
77+
[named_sub](\ref irept::dt::named_sub), [comments](\ref irept::dt::comments)
78+
or [sub](\ref irept::dt::sub). For [exprt](\ref exprt)s, the
79+
[id()](\ref irept::id()) is used to specify why kind of \ref exprt this is,
80+
so a \ref minus_exprt has `ID_minus` as its [id()](\ref irept::id()). This
81+
means that a \ref minus_exprt can be passed wherever an \ref exprt is
82+
expected, and if you want to check if the expression you are looking at is
83+
a minus expression then you have to check its [id()](\ref irept::id()) (or use
84+
[can_cast_expr](\ref can_cast_expr)`<minus_exprt>`).
85+
86+
\subsection codet_section codet
87+
88+
\ref exprt represents expressions and \ref codet represents statements.
89+
\ref codet inherits from \ref exprt, so all [codet](\ref codet)s are
90+
[exprt](\ref exprt)s, with [id()](\ref irept::id()) `ID_code`.
91+
Many different kinds of statements inherit from \ref codet, and they are
92+
distinguished by their [statement](\ref codet::get_statement()). For example,
93+
a while loop would be represented by a \ref code_whilet, which has
94+
[statement](\ref codet::get_statement()) `ID_while`. \ref code_whilet has
95+
two operands in its [sub](\ref irept::dt::sub), and helper functions to make
96+
it easier to use: [cond()](\ref code_whilet::cond()) returns the first
97+
subexpression, which is the condition for the while loop, and
98+
[body()](\ref code_whilet::body()) returns the second subexpression, which
99+
is the body of the while loop - this has to be a \ref codet, because the body
100+
of a while loop is a statement.
101+
102+
\subsection symbolt_section symbolt, symbol_tablet, and namespacet
103+
104+
To be documented.
105+
106+
\subsubsection symbol_lifetimes_section Symbol lifetimes, symbol modes, name, base-name, pretty-name; semantic of lifetimes for symex?
107+
108+
To be documented.
109+
110+
\subsubsection storing_symbols_section Storing symbols and hiding symbols (namespacet)
111+
112+
To be documented.
113+
114+
\subsubsection ns_follow_section ns.follow
115+
116+
To be documented.
24117

25118
\subsection cmdlinet
26119

0 commit comments

Comments
 (0)