Skip to content

Add a little more detail and discussion on irept #2769

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 21, 2018
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
63 changes: 55 additions & 8 deletions src/util/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,25 +3,72 @@

# Folder util

\author Martin Brain, Owen Jones
\author Martin Brain, Owen Jones, Chris Smowton

\section data_structures Data Structures

\ref util contains some of the key data-structures used in the
CPROVER codebase.

\subsection irept_section irept: a 4-tuple (data, named-sub, comments, sub)
\subsection irept_section irept: a general-purpose polymorphic tree
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... polymorphic... in some place it should be mentioned that irept derived classes must not add any data members nor virtual methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


See documentation at \ref irept.
See detailed documentation at \ref irept.

As that documentation says, [irept](\ref irept)s are generic tree nodes. You
should think of them as having a single string ([data](irept::data), actually
an \ref irep_idt) and lots of child nodes, some of which are numbered
[irept](\ref irept)s are generic tree nodes. You
should think of each node as holding a single string ([data](irept::data),
actually an \ref irep_idt) and lots of child nodes, some of which are numbered
([sub](\ref irept::dt::sub)) and some of which are labelled, and the label
can either start with a “\#” ([comments](\ref irept::dt::comments)) or without
one ([named_sub](\ref irept::dt::named_sub)). The meaning of the “\#” is that
this child should not be considered important, for example it shouldn't be
counted when comparing two [irept](\ref irept)s for equality.
this child shouldn't be counted when comparing two [irept](\ref irept)s for
equality; this is usually used when making an advisory annotation which does
not alter the semantics of the program.

They are used to represent many kinds of structured objects throughout the
CPROVER codebase, such as expressions, types and code. An \ref exprt represents
expressions, including for example \ref equal_exprt, an equality predicate, or
\ref dereference_exprt, which represents the `*` unary dereference operator
found in C. A \ref typet represents a type, and may have other \ref typet nodes
as children: for example, \ref array_typet represents an array, and has a single
numbered child that gives the type of the array elements. Finally, \ref codet
represents imperative statements, such as \ref code_assignt, which represents an
imperative assignment. It has two numbered operands, its left- and right-hand
sides.

Note that \ref exprt, \ref codet and similar classes deriving from \ref irept
do so in order to add constraints to the general tree (for example, providing
accessor methods with user-friendly names, or enforcing invariants that a node
must have a certain number of children), but do not override virtual methods or
add fields.

The implementation of \ref irept allows saving memory by sharing instances of
its internal storage using a
[copy-on-write](https://en.wikipedia.org/wiki/Copy-on-write) scheme. For
example, the statement `irep1.sub()[0] = irep2;` will share rather than copy
`irep2` and its children, saving memory unless either irept is subsequently
modified, at which point a copy is made.

\subsection irept_discussion_section Discussion

Many other compiler infrastructures represent a polymorphic tree using nodes
specialised to a particular expression type: for example, perhaps a binary
addition operator could be represented using a tag plus two pointers to child
expressions, rather than allocating a whole irept (including a variable-length
expression vector and empty maps for storing named subexpressions). This may
save memory, but inhibits ad-hoc annotations such as tagging the addition
"does not overflow" without adding that field to addition operations globally
or maintaining a shadow data structure to track that information. This is easier
with a general irept structure that can store an arbitrary number of
arbitrarily-named child nodes.

Another optimisation commonly seen when storing polymorphic trees is to use a
uniform node data structure (like irept) but to keep the node compact, for
example storing at most four pointers and transparently allocating extension
nodes when necessary for an unusually large expression. This provides the best
of both worlds, obtaining compact storage for common cases such as unannotated
binary expressions while permitting deviation at times. The downside is that
implementing such a system is more complex than straightforwardly using C++
standard data structures as in irept.

\subsection irep_idt_section Strings: dstringt, the string_container and the ID_*

Expand Down