Skip to content

Doc/move irep docs from util to irep #2618

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
89 changes: 8 additions & 81 deletions src/util/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,90 +10,17 @@
This section discusses some of the key data-structures used in the
CPROVER codebase.

\subsection irept Irept Data Structure
\subsection irept_section Irept Data Structure

There are a large number of kinds of tree structured or tree-like data in
CPROVER. `irept` provides a single, unified representation for all of
these, allowing structure sharing and reference counting of data. As
such `irept` is the basic unit of data in CPROVER. Each `irept`
contains[^1] a basic unit of data (of type `dt`) which contains four
things:
See \ref irept for more information.

* `data`: A string[^2], which is returned when the `id()` function is
used.
\subsection irep_idt_section Irep_idt and Dstringt

* `named_sub`: A map from `irep_namet` (a string) to `irept`. This
is used for named children, i.e. subexpressions, parameters, etc.

* `comments`: Another map from `irep_namet` to `irept` which is used
for annotations and other ‘non-semantic’ information

* `sub`: A vector of `irept` which is used to store ordered but
unnamed children.

The `irept::pretty` function outputs the contents of an `irept` directly
and can be used to understand and debug problems with `irept`s.

On their own `irept`s do not “mean” anything; they are effectively
generic tree nodes. Their interpretation depends on the contents of
result of the `id` function (the `data`) field. `util/irep_ids.txt`
contains the complete list of `id` values. During the build process it
is used to generate `util/irep_ids.h` which gives constants for each id
(named `ID_`). These can then be used to identify what kind of data
`irept` stores and thus what can be done with it.

To simplify this process, there are a variety of classes that inherit
from `irept`, roughly corresponding to the ids listed (i.e. `ID_or`
(the string `"or”`) corresponds to the class `or_exprt`). These give
semantically relevant accessor functions for the data; effectively
different APIs for the same underlying data structure. None of these
classes add fields (only methods) and so static casting can be used. The
inheritance graph of the subclasses of `irept` is a useful starting
point for working out how to manipulate data.

There are three main groups of classes (or APIs); those derived from
`typet`, `codet` and `exprt` respectively. Although all of these inherit
from `irept`, these are the most abstract level that code should handle
data. If code is manipulating plain `irept`s then something is wrong
with the architecture of the code.

Many of the key descendent of `exprt` are declared in `std_expr.h`. All
expressions have a named subfield / annotation which gives the type of
the expression (slightly simplified from C/C++ as `unsignedbv_typet`,
`signedbv_typet`, `floatbv_typet`, etc.). All type conversions are
explicit with an expression with `id() == ID_typecast` and an ‘interface
class’ named `typecast_exprt`. One key descendent of `exprt` is
`symbol_exprt` which creates `irept` instances with the id of “symbol”.
These are used to represent variables; the name of which can be found
using the `get_identifier` accessor function.

`codet` inherits from `exprt` and is defined in `std_code.h`. It
represents executable code; statements in C rather than expressions. In
the front-end there are versions of these that hold whole code blocks,
but in goto-programs these have been flattened so that each `irept`
represents one sequence point (almost one line of code / one
semi-colon). The most common descendents of `codet` are `code_assignt`
so a common pattern is to cast the `codet` to an assignment and then
recurse on the expression on either side.

[^1]: Or references, if reference counted data sharing is enabled. It is
enabled by default; see the `SHARING` macro.

[^2]: Unless `USE_STD_STRING` is set, this is actually
a `dstring` and thus an integer which is a reference into a string table

\subsection irep_idt Irep_idt and Dstringt

Inside `irept`s, strings are stored as `irep_idt`s, or `irep_namet`s for
keys to `named_sub` or `comments`. If `USE_STD_STRING` is set then both
`irep_idt` and `irep_namet` are `typedef`ed to `std::string`, but by default
it is not set and they are `typedef`ed to `dstringt`. `dstringt` has one
field, an unsigned integer which is an index into a static table of strings.
This makes it expensive to create a new string (because you have to look
through the whole table to see if it is already there, and add it if it
isn't) but very cheap to compare strings (just compare the two integers). It
also means that when you have lots of copies of the same string you only have
to store the whole string once, which saves space.
Inside \ref irept, strings are stored as irep_idts, or irep_namets for
keys to named_sub or comments. By default both irep_idt and irep_namet
are typedefed to \ref dstringt, unless USE_STD_STRING is set, in which
case they are typedefed to std::string (this can be used for debugging
purposes).

\dot
digraph G {
Expand Down
16 changes: 14 additions & 2 deletions src/util/dstring.h
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,20 @@ Author: Daniel Kroening, [email protected]

#include "string_container.h"

// Marked final to disable inheritance.
// No virtual destructor, so runtime-polymorphic use would be unsafe.
/// \ref dstringt has one field, an unsigned integer \ref no which is an index
/// into a static table of strings. This makes it expensive to create a new
/// string(because you have to look through the whole table to see if it is
/// already there, and add it if it isn't) but very cheap to compare strings
/// (just compare the two integers). It also means that when you have lots of
/// copies of the same string you only have to store the whole string once,
/// which saves space.
///
/// `irep_idt` and `irep_namet` are typedef-ed to \ref dstringt in irep.h unless
/// `USE_STD_STRING` is set.
///
///
/// Note: Marked final to disable inheritance. No virtual destructor, so
/// runtime-polymorphic use would be unsafe.
class dstringt final
{
public:
Expand Down
73 changes: 71 additions & 2 deletions src/util/irep.h
Original file line number Diff line number Diff line change
Expand Up @@ -82,8 +82,77 @@ inline const std::string &name2string(const irep_namet &n)
class irept;
const irept &get_nil_irep();

/*! \brief Base class for tree-like data structures with sharing
*/
/// \brief Base class for tree-like data structures with sharing
///
/// There are a large number of kinds of tree structured or tree-like data in
/// CPROVER. \ref irept provides a single, unified representation for all of
/// these, allowing structure sharing and reference counting of data. As
/// such \ref irept is the basic unit of data in CPROVER. Each \ref irept
/// contains (or references, if reference counted data sharing is enabled, as
/// it is by default - see the `SHARING` macro) a basic unit of data (of type
/// \ref dt) which contains four things:
///
/// * \ref irept::dt::data : A string, which is returned when the \ref id()
/// function is used. (Unless `USE_STD_STRING` is set, this is actually a
/// \ref dstringt and thus an integer which is a reference into a string
/// table.)
///
/// * \ref irept::dt::named_sub : A map from `irep_namet` (a string) to \ref
/// irept. This is used for named children, i.e. subexpressions, parameters,
/// etc.
///
/// * \ref irept::dt::comments : Another map from `irep_namet` to \ref irept
/// which is used for annotations and other ‘non-semantic’ information. Note
/// that this map is ignored by the default \ref operator==.
///
/// * \ref irept::dt::sub : A vector of \ref irept which is used to store
/// ordered but unnamed children.
///
/// The \ref irept::pretty function outputs the explicit tree structure of
/// an \ref irept and can be used to understand and debug problems with
/// `irept`s.
///
/// On their own `irept`s do not "mean" anything; they are effectively
/// generic tree nodes. Their interpretation depends on the contents of
/// result of the \ref id() function, i.e. the `data` field. `util/irep_ids.def`
/// contains a list of `id` values. During the build process it is used
/// to generate `util/irep_ids.h` which gives constants for each id
/// (named `ID_`). You can also make `irep_idt`s which do not come from
/// `util/irep_ids.def`. An `irep_idt` can then be used to identify what
/// kind of data the \ref irept stores and thus what can be done with it.
///
/// To simplify this process, there are a variety of classes that inherit
/// from \ref irept, roughly corresponding to the ids listed (i.e. `ID_or`
/// (the string "or”) corresponds to the class \ref or_exprt). These give
/// semantically relevant accessor functions for the data; effectively
/// different APIs for the same underlying data structure. None of these
/// classes add fields (only methods) and so static casting can be used. The
/// inheritance graph of the subclasses of \ref irept is a useful starting
/// point for working out how to manipulate data.
///
/// There are three main groups of classes (or APIs); those derived from
/// \ref typet, \ref codet and \ref exprt respectively. Although all of these
/// inherit from \ref irept, these are the most abstract level that code should
/// handle data. If code is manipulating plain `irept`s then something is wrong
/// with the architecture of the code.
///
/// Many of the key descendants of \ref exprt are declared in \ref std_expr.h.
/// All expressions have a named subexpression with ID "type", which gives the
/// type of the expression (slightly simplified from C/C++ as \ref
/// unsignedbv_typet, \ref signedbv_typet, \ref floatbv_typet, etc.). All type
/// conversions are explicit with a \ref typecast_exprt. One key descendant of
/// \ref exprt is \ref symbol_exprt which creates \ref irept instances with ID
/// “symbol”. These are used to represent variables; the name of which can be
/// found using the `get_identifier` accessor function.
///
/// \ref codet inherits from \ref exprt and is defined in `std_code.h`. It
/// represents executable code; statements in a C-like language rather than
/// expressions. In the front-end there are versions of these that hold
/// whole code blocks, but in goto-programs these have been flattened so
/// that each \ref irept represents one sequence point (almost one line of
/// code / one semi-colon). The most common descendant of \ref codet is
/// \ref code_assignt so a common pattern is to cast the \ref codet to an
/// assignment and then recurse on the expression on either side.
class irept
{
public:
Expand Down