Merge pull request #2618 from owen-jones-diffblue/doc/move-irep-docs-from-util-to-irep

owen-jones-diffblue · web-flow · commit 7c066f91987d · 2018-08-01T10:17:27.000+01:00
move irept docs from util to irept and update them
diff --git a/src/util/README.md b/src/util/README.md
@@ -10,90 +10,17 @@
 This section discusses some of the key data-structures used in the
 CPROVER codebase.
 
-\subsection irept Irept Data Structure
+\subsection irept_section Irept Data Structure
 
-There are a large number of kinds of tree structured or tree-like data in
-CPROVER. `irept` provides a single, unified representation for all of
-these, allowing structure sharing and reference counting of data. As
-such `irept` is the basic unit of data in CPROVER.  Each `irept`
-contains[^1] a basic unit of data (of type `dt`) which contains four
-things:
+See \ref irept for more information.
 
-* `data`:   A string[^2], which is returned when the `id()` function is
-  used.
+\subsection irep_idt_section Irep_idt and Dstringt
 
-* `named_sub`:   A map from `irep_namet` (a string) to `irept`. This
-  is used for named children, i.e.  subexpressions, parameters, etc.
-
-* `comments`:   Another map from `irep_namet` to `irept` which is used
-  for annotations and other ‘non-semantic’ information
-
-* `sub`:   A vector of `irept` which is used to store ordered but
-  unnamed children.
-
-The `irept::pretty` function outputs the contents of an `irept` directly
-and can be used to understand and debug problems with `irept`s.
-
-On their own `irept`s do not “mean” anything; they are effectively
-generic tree nodes. Their interpretation depends on the contents of
-result of the `id` function (the `data`) field. `util/irep_ids.txt`
-contains the complete list of `id` values. During the build process it
-is used to generate `util/irep_ids.h` which gives constants for each id
-(named `ID_`). These can then be used to identify what kind of data
-`irept` stores and thus what can be done with it.
-
-To simplify this process, there are a variety of classes that inherit
-from `irept`, roughly corresponding to the ids listed (i.e.  `ID_or`
-(the string `"or”`) corresponds to the class `or_exprt`). These give
-semantically relevant accessor functions for the data; effectively
-different APIs for the same underlying data structure. None of these
-classes add fields (only methods) and so static casting can be used. The
-inheritance graph of the subclasses of `irept` is a useful starting
-point for working out how to manipulate data.
-
-There are three main groups of classes (or APIs); those derived from
-`typet`, `codet` and `exprt` respectively. Although all of these inherit
-from `irept`, these are the most abstract level that code should handle
-data. If code is manipulating plain `irept`s then something is wrong
-with the architecture of the code.
-
-Many of the key descendent of `exprt` are declared in `std_expr.h`. All
-expressions have a named subfield / annotation which gives the type of
-the expression (slightly simplified from C/C++ as `unsignedbv_typet`,
-`signedbv_typet`, `floatbv_typet`, etc.). All type conversions are
-explicit with an expression with `id() == ID_typecast` and an ‘interface
-class’ named `typecast_exprt`. One key descendent of `exprt` is
-`symbol_exprt` which creates `irept` instances with the id of “symbol”.
-These are used to represent variables; the name of which can be found
-using the `get_identifier` accessor function.
-
-`codet` inherits from `exprt` and is defined in `std_code.h`. It
-represents executable code; statements in C rather than expressions. In
-the front-end there are versions of these that hold whole code blocks,
-but in goto-programs these have been flattened so that each `irept`
-represents one sequence point (almost one line of code / one
-semi-colon). The most common descendents of `codet` are `code_assignt`
-so a common pattern is to cast the `codet` to an assignment and then
-recurse on the expression on either side.
-
-[^1]: Or references, if reference counted data sharing is enabled. It is
-    enabled by default; see the `SHARING` macro.
-
-[^2]: Unless `USE_STD_STRING` is set, this is actually
-a `dstring` and thus an integer which is a reference into a string table
-
-\subsection irep_idt Irep_idt and Dstringt
-
-Inside `irept`s, strings are stored as `irep_idt`s, or `irep_namet`s for
-keys to `named_sub` or `comments`. If `USE_STD_STRING` is set then both 
-`irep_idt` and `irep_namet` are `typedef`ed to `std::string`, but by default 
-it is not set and they are `typedef`ed to `dstringt`. `dstringt` has one 
-field, an unsigned integer which is an index into a static table of strings. 
-This makes it expensive to create a new string (because you have to look 
-through the whole table to see if it is already there, and add it if it 
-isn't) but very cheap to compare strings (just compare the two integers). It 
-also means that when you have lots of copies of the same string you only have
-to store the whole string once, which saves space.
+Inside \ref irept, strings are stored as irep_idts, or irep_namets for
+keys to named_sub or comments. By default both irep_idt and irep_namet
+are typedefed to \ref dstringt, unless USE_STD_STRING is set, in which
+case they are typedefed to std::string (this can be used for debugging
+purposes).
 
 \dot
 digraph G {
diff --git a/src/util/dstring.h b/src/util/dstring.h
@@ -16,8 +16,20 @@ Author: Daniel Kroening, kroening@kroening.com
 
 #include "string_container.h"
 
-// Marked final to disable inheritance.
-// No virtual destructor, so runtime-polymorphic use would be unsafe.
+/// \ref dstringt has one field, an unsigned integer \ref no which is an index
+/// into a static table of strings. This makes it expensive to create a new
+/// string(because you have to look through the whole table to see if it is
+/// already there, and add it if it isn't) but very cheap to compare strings
+/// (just compare the two integers). It also means that when you have lots of
+/// copies of the same string you only have to store the whole string once,
+/// which saves space.
+///
+/// `irep_idt` and `irep_namet` are typedef-ed to \ref dstringt in irep.h unless
+/// `USE_STD_STRING` is set.
+///
+///
+/// Note: Marked final to disable inheritance. No virtual destructor, so
+/// runtime-polymorphic use would be unsafe.
 class dstringt final
 {
 public:
diff --git a/src/util/irep.h b/src/util/irep.h
@@ -82,8 +82,77 @@ inline const std::string &name2string(const irep_namet &n)
 class irept;
 const irept &get_nil_irep();
 
-/*! \brief Base class for tree-like data structures with sharing
-*/
+/// \brief Base class for tree-like data structures with sharing
+///
+/// There are a large number of kinds of tree structured or tree-like data in
+/// CPROVER. \ref irept provides a single, unified representation for all of
+/// these, allowing structure sharing and reference counting of data. As
+/// such \ref irept is the basic unit of data in CPROVER.  Each \ref irept
+/// contains (or references, if reference counted data sharing is enabled, as
+/// it is by default - see the `SHARING` macro) a basic unit of data (of type
+/// \ref dt) which contains four things:
+///
+/// * \ref irept::dt::data : A string, which is returned when the \ref id()
+///   function is used. (Unless `USE_STD_STRING` is set, this is actually a
+///   \ref dstringt and thus an integer which is a reference into a string
+///   table.)
+///
+/// * \ref irept::dt::named_sub : A map from `irep_namet` (a string) to \ref
+///   irept. This is used for named children, i.e.  subexpressions, parameters,
+///   etc.
+///
+/// * \ref irept::dt::comments : Another map from `irep_namet` to \ref irept
+///   which is used for annotations and other ‘non-semantic’ information. Note
+///   that this map is ignored by the default \ref operator==.
+///
+/// * \ref irept::dt::sub : A vector of \ref irept which is used to store
+///   ordered but unnamed children.
+///
+/// The \ref irept::pretty function outputs the explicit tree structure of
+/// an \ref irept and can be used to understand and debug problems with
+/// `irept`s.
+///
+/// On their own `irept`s do not "mean" anything; they are effectively
+/// generic tree nodes. Their interpretation depends on the contents of
+/// result of the \ref id() function, i.e. the `data` field. `util/irep_ids.def`
+/// contains a list of `id` values. During the build process it is used
+/// to generate `util/irep_ids.h` which gives constants for each id
+/// (named `ID_`). You can also make `irep_idt`s which do not come from
+/// `util/irep_ids.def`. An `irep_idt` can then be used to identify what
+/// kind of data the \ref irept stores and thus what can be done with it.
+///
+/// To simplify this process, there are a variety of classes that inherit
+/// from \ref irept, roughly corresponding to the ids listed (i.e. `ID_or`
+/// (the string "or”) corresponds to the class \ref or_exprt). These give
+/// semantically relevant accessor functions for the data; effectively
+/// different APIs for the same underlying data structure. None of these
+/// classes add fields (only methods) and so static casting can be used. The
+/// inheritance graph of the subclasses of \ref irept is a useful starting
+/// point for working out how to manipulate data.
+///
+/// There are three main groups of classes (or APIs); those derived from
+/// \ref typet, \ref codet and \ref exprt respectively. Although all of these
+/// inherit from \ref irept, these are the most abstract level that code should
+/// handle data. If code is manipulating plain `irept`s then something is wrong
+/// with the architecture of the code.
+///
+/// Many of the key descendants of \ref exprt are declared in \ref std_expr.h.
+/// All expressions have a named subexpression with ID "type", which gives the
+/// type of the expression (slightly simplified from C/C++ as \ref
+/// unsignedbv_typet, \ref signedbv_typet, \ref floatbv_typet, etc.). All type
+/// conversions are explicit with a \ref typecast_exprt. One key descendant of
+/// \ref exprt is \ref symbol_exprt which creates \ref irept instances with ID
+/// “symbol”. These are used to represent variables; the name of which can be
+/// found using the `get_identifier` accessor function.
+///
+/// \ref codet inherits from \ref exprt and is defined in `std_code.h`. It
+/// represents executable code; statements in a C-like language rather than
+/// expressions. In the front-end there are versions of these that hold
+/// whole code blocks, but in goto-programs these have been flattened so
+/// that each \ref irept represents one sequence point (almost one line of
+/// code / one semi-colon). The most common descendant of \ref codet is
+/// \ref code_assignt so a common pattern is to cast the \ref codet to an
+/// assignment and then recurse on the expression on either side.
 class irept
 {
 public: