Skip to content

Commit 8befd02

Browse files
author
Daniel Kroening
authored
Merge pull request diffblue#2238 from owen-jones-diffblue/owen-jones-diffblue/doc/irep_id
Add documentation of irep_idt and dstringt
2 parents 34b0ac6 + 01dc76b commit 8befd02

File tree

1 file changed

+23
-10
lines changed

1 file changed

+23
-10
lines changed

src/util/README.md

+23-10
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33

44
# Folder util
55

6-
\author Martin Brain
6+
\author Martin Brain, Owen Jones
77

88
\section data_structures Data Structures
99

@@ -12,17 +12,17 @@ CPROVER codebase.
1212

1313
\subsection irept Irept Data Structure
1414

15-
There are a large number of kind of tree structured or tree-like data in
15+
There are a large number of kinds of tree structured or tree-like data in
1616
CPROVER. `irept` provides a single, unified representation for all of
1717
these, allowing structure sharing and reference counting of data. As
1818
such `irept` is the basic unit of data in CPROVER. Each `irept`
19-
contains[^2] a basic unit of data (of type `dt`) which contains four
19+
contains[^1] a basic unit of data (of type `dt`) which contains four
2020
things:
2121

22-
* `data`: A string[^3], which is returned when the `id()` function is
22+
* `data`: A string[^2], which is returned when the `id()` function is
2323
used.
2424

25-
* `named_sub`: A map from `irep_namet` (a string) to an `irept`. This
25+
* `named_sub`: A map from `irep_namet` (a string) to `irept`. This
2626
is used for named children, i.e. subexpressions, parameters, etc.
2727

2828
* `comments`: Another map from `irep_namet` to `irept` which is used
@@ -32,7 +32,7 @@ things:
3232
unnamed children.
3333

3434
The `irept::pretty` function outputs the contents of an `irept` directly
35-
and can be used to understand an debug problems with `irept`s.
35+
and can be used to understand and debug problems with `irept`s.
3636

3737
On their own `irept`s do not “mean” anything; they are effectively
3838
generic tree nodes. Their interpretation depends on the contents of
@@ -67,21 +67,34 @@ class’ named `typecast_exprt`. One key descendent of `exprt` is
6767
These are used to represent variables; the name of which can be found
6868
using the `get_identifier` accessor function.
6969

70-
`codet` inherits from `exprt` and is defined in `std_code.h`. They
71-
represent executable code; statements in C rather than expressions. In
70+
`codet` inherits from `exprt` and is defined in `std_code.h`. It
71+
represents executable code; statements in C rather than expressions. In
7272
the front-end there are versions of these that hold whole code blocks,
7373
but in goto-programs these have been flattened so that each `irept`
7474
represents one sequence point (almost one line of code / one
7575
semi-colon). The most common descendents of `codet` are `code_assignt`
7676
so a common pattern is to cast the `codet` to an assignment and then
7777
recurse on the expression on either side.
7878

79-
[^2]: Or references, if reference counted data sharing is enabled. It is
79+
[^1]: Or references, if reference counted data sharing is enabled. It is
8080
enabled by default; see the `SHARING` macro.
8181

82-
[^3]: Unless `USE_STD_STRING` is set, this is actually
82+
[^2]: Unless `USE_STD_STRING` is set, this is actually
8383
a `dstring` and thus an integer which is a reference into a string table
8484

85+
\subsection irep_idt Irep_idt and Dstringt
86+
87+
Inside `irept`s, strings are stored as `irep_idt`s, or `irep_namet`s for
88+
keys to `named_sub` or `comments`. If `USE_STD_STRING` is set then both
89+
`irep_idt` and `irep_namet` are `typedef`ed to `std::string`, but by default
90+
it is not set and they are `typedef`ed to `dstringt`. `dstringt` has one
91+
field, an unsigned integer which is an index into a static table of strings.
92+
This makes it expensive to create a new string (because you have to look
93+
through the whole table to see if it is already there, and add it if it
94+
isn't) but very cheap to compare strings (just compare the two integers). It
95+
also means that when you have lots of copies of the same string you only have
96+
to store the whole string once, which saves space.
97+
8598
\dot
8699
digraph G {
87100
node [shape=box];

0 commit comments

Comments
 (0)