diff --git a/.github/workflows/publish.yaml b/.github/workflows/publish.yaml
index 5580c714f29..f1f48c727bc 100644
--- a/.github/workflows/publish.yaml
+++ b/.github/workflows/publish.yaml
@@ -10,11 +10,14 @@ jobs:
         uses: actions/checkout@v3
 
       - name: Install doxygen
-        run: sudo apt install doxygen graphviz pandoc
+        run: sudo apt install doxygen graphviz pandoc npm
 
       - name: Install python modules
         run: sudo python3 -m pip install gitpython pandocfilters
 
+      - name: Install mermaid diagram filter
+        run: sudo npm install --global mermaid-filter
+
       - name: Build documentation
         run: cd doc/doxygen-root && make && touch html/.nojekyll
 
diff --git a/doc/architectural/cbmc-architecture.md b/doc/architectural/cbmc-architecture.md
index 89cd8eb7a78..0ac5db5d759 100644
--- a/doc/architectural/cbmc-architecture.md
+++ b/doc/architectural/cbmc-architecture.md
@@ -55,7 +55,14 @@ tools process this format, either directly from the front-end or from
 it’s saved output. These include a wide range of analysis and
 transformation tools (see \ref other-tools).
 
-# Concepts #
+# Concepts
+
+## Central data structures
+
+For an explanation of the data structures involved in the modeling of a GOTO
+program (the GOTO Intermediate Representation used by CBMC and assorted tools)
+please see \subpage central-data-structures .
+
 ## {C, java bytecode} &rarr; Parse tree &rarr; Symbol table &rarr; GOTO programs &rarr; GOTO program transformations &rarr; BMC &rarr; counterexample (goto_tracet) &rarr; printing ##
 
 To be documented.
diff --git a/doc/architectural/central-data-structures.md b/doc/architectural/central-data-structures.md
new file mode 100644
index 00000000000..c0869158576
--- /dev/null
+++ b/doc/architectural/central-data-structures.md
@@ -0,0 +1,185 @@
+\ingroup module_hidden
+
+\page central-data-structures Central Data Structures
+
+# Central Data Structures
+
+The following is some light technical documentation of the major data structures
+used in the input transformation to Intermediate Representation (IR) inside
+CBMC and the assorted CProver tools.
+
+## GOTO models
+
+The `goto_modelt` class is the top-level data structure that CBMC (and the other
+tools) use for holding the GOTO intermediate representation. The following
+diagram is a simplified view of how the data structures relate to each other -
+
+```mermaid
+erDiagram
+  goto_modelt {
+    symbol_tablet symbol_table
+    goto_functionst goto_functions
+  }
+  goto_modelt ||--|| symbol_tablet : ""
+  goto_modelt ||--|| goto_functionst : ""
+  symbol_tablet {
+    symbol_map symbols
+  }
+  symbol_tablet ||--o{ symbolt : ""
+  symbolt {
+    string name
+  }
+  goto_functionst {
+    function_map functions
+  }
+  goto_functionst ||--o{ goto_functiont : ""
+  goto_functiont {
+    goto_programt body
+    ordered_collection_of parameter_identifiers
+  }
+  goto_functiont ||--|| goto_programt : ""
+  goto_programt {
+    ordered_collection_of instructions
+  }
+  goto_programt ||--o{ instructiont : ""
+  instructiont {
+    enumeration instruction_type
+    goto_program_codet code
+    boolean_expression guard
+    source_locationt source_location
+  }
+  instructiont ||--|| goto_program_instruction_typet : ""
+  instructiont ||--o| goto_program_codet : ""
+  instructiont ||--o| source_locationt : ""
+  goto_program_codet ||--o| source_locationt : ""
+```
+
+A `goto_modelt` is effectively a pair, consisting of:
+
+* A collection of GOTO functions.
+* A symbol table containing symbol definitions which may be referred to by
+  symbol expressions. Symbol expressions are found in the goto functions and the
+  (sub-)expressions for the definitions of symbols.
+
+In pseudocode, the type looks this:
+
+```js
+type goto_modelt {
+  type goto_functionst = map<identifier, goto_functiont>
+  type symbol_tablet = map<identifier, symbolt>
+}
+```
+
+There is an abstract interface for goto models provided by the
+`abstract_goto_modelt` class. This is defined and documented in the file
+[`src/goto-programs/abstract_goto_model.h`](../../src/goto-programs/abstract_goto_model.h).
+Ideally code which takes a goto model as input should accept any implementation
+of the abstract interface rather than accepting the concrete `goto_modelt`
+exclusively. This helps with compatibility with `jbmc` which uses lazy loading.
+See the `lazy_goto_modelt` class which is defined and documented in
+[`jbmc/src/java_bytecode/lazy_goto_model.h`](../../jbmc/src/java_bytecode/lazy_goto_model.h)
+for details of lazy loading.
+
+For further information about symbols see the `symbolt` class which is defined
+and documented in [`src/util/symbol.h`](../../src/util/symbol.h) and the
+`symbol_exprt` (symbol expression) class which is defined and documented in
+[`src/util/std_expr.h`](../../src/util/std_expr.h).
+
+## goto_functiont
+
+A `goto_functiont` is also defined as a pair. It's designed to represent a function
+at the goto level, and effectively it's the following data type (in pseudocode):
+
+```js
+type goto_functiont {
+    goto_programt body
+    list<identifiers> parameters
+}
+```
+
+The `goto_programt` denoting the `body` of the function will be the subject of
+a more elaborate explanation in the next section. The in-memory structure of a
+goto function allows the `body` to be optional, but only functions with bodies
+are included in the serialised goto binaries.
+
+The `parameters` subcomponent is a list of identifiers that are to be looked-up
+in the symbol-table for their definitions.
+
+## goto_programt
+
+A  goto program  is a sequence of GOTO instructions (`goto_instructiont`). For
+details of the `goto_programt` class itself see the documentation in
+[`goto_program.h`](../../src/goto-programs/goto_program.h). For further details
+of the life cycle of goto programs see
+[`src/goto-programs/README.md`](https://github.com/diffblue/cbmc/blob/develop/src/goto-programs/README.md).
+
+An instruction (`goto_instructiont`) is a triple (an element with three subcomponents),
+describing a GOTO level instruction with the following 3 component subtypes,
+again in pseudocode:
+
+```js
+type goto_instructiont {
+    instruction_type instr_type
+    instruction      statement
+    guard            boolean_expr
+}
+```
+
+The pseudocode subtypes above require some more elaboration, so we will provide some extra
+detail to illuminate some details at the code level:
+
+* The `instruction_type` above corresponds to the `goto_program_instruction_typet` type
+  listed in [`goto_program.h`](../../src/goto-programs/goto_program.h)
+  * Some illustrative instruction types are `assign`, `function_call`, `return`, etc.
+* The `instruction` is a statement represented by a `goto_instruction_codet`.
+  * A `goto_instruction_codet` is an alias of `codet` (a data structure broadly representing
+    a statement inside CBMC) that contains the actual code to be executed.
+  * You can distinguish different statements by using the result of the
+    `get_statement` member function of the `codet` class.
+* The `guard` is an `exprt` (a data structure broadly representing an expression inside CBMC)
+  that is expected to have type `bool`.
+  * This is optional - not every instruction is expected to have a `guard` associated with it.
+
+## source_locationt
+
+Another important data structure that needs to be discussed at this point is
+`source_locationt`.
+
+`source_locationt` are attached into various `exprt`s (the data structure representing
+various expressions, usually the result of some early processing, e.g. the result of the
+frontend parsing a file).
+
+This means that `codet`s, representing GOTO instructions *also* have a `source_locationt`
+attached to them, by virtue of inheriting from `exprt`.
+
+For the most part, `source_locationt` is something that is only being used when we print
+various nodes (for property listing, counterexample/trace printing, etc).
+
+It might be possible that a specific source location might point to a CBMC instrumentation
+primitive (which might be reported as a `built-in-addition`) or there might even be no-source
+location (because it might be part of harnesses generated as an example, that have no presence
+in the user code).
+
+## irept
+
+`irep`s are the underlying data structure used to implement many of the data
+structures in CBMC and the assorted tools. These include the `exprt`, `typet`,
+`codet` and `source_locationt` classes. This is a tree data structure where
+each node is expected to contain a string/ID and may have child nodes stored in
+both a sequence of child nodes and a map of strings/IDs to child nodes. This
+enables the singular `irept` data structure to be used to model graphs such as
+ASTs, CFGs, etc.
+
+The classes extending `irept` define how the higher level concepts are mapped
+onto the underlying tree data structure. For this reason it is usually advised
+that the member functions of the sub-classes should be used to access the data
+held, rather than the member functions of the base `irept` class. This aids
+potential future restructuring and associates accesses of the data with the
+member functions which have the doxygen explaining what the data is.
+
+The strings/IDs held within `irept` are of type `irep_idt`. These can be
+converted to `std::string` using the `id2string` function. There is a mechanism
+provided for casting expressions and types in
+[`src/util/expr_cast.h`](../../src/util/expr_cast.h). In depth documentation
+of the `irept` class itself can be found in
+[`src/util/irep.h`](../../src/util/irep.h).
diff --git a/doc/doxygen-root/doxygen-markdown/markdown.sh b/doc/doxygen-root/doxygen-markdown/markdown.sh
index 943c39bd82c..0f8063c7088 100755
--- a/doc/doxygen-root/doxygen-markdown/markdown.sh
+++ b/doc/doxygen-root/doxygen-markdown/markdown.sh
@@ -34,6 +34,21 @@ $BINDIR/append-last-modified-dates.py $FILES
 #         $BINDIR/pandoc-codeblock-repair.sh > $file
 # done
 
+echo
+echo "Rendering mermaid diagrams"
+FILES=$(grep --include=\*.md -rl . -e "\`\`\`mermaid")
+for file in $FILES; do
+    echo $file
+    tmp=/tmp/${file%.*}2.md
+    mkdir -p $(dirname $tmp)
+    cp $file $tmp
+    # Note that gfm is GitHub Flavour Markdown. The double \\ at the start of
+    # lines is unescaped using sed in order to fix doxygen tags, which are
+    # broken by pandoc.
+    pandoc --read=gfm --write=gfm --wrap=none --filter=mermaid-filter $tmp |
+        $BINDIR/pandoc-codeblock-repair.sh | sed 's/^\\\\/\\/' > $file
+done
+
 cprovers=$(find . -name cprover-manual)
 cprover=${cprovers[0]}
 
@@ -49,7 +64,7 @@ echo "Running pandoc filter over cprover-manual markdown files"
 FILES=$(find $cprover -name '*.md')
 for file in $FILES; do
     echo $file
-    tmp=/tmp/${file%.*}2.md
+    tmp=/tmp/${file%.*}3.md
     mkdir -p $(dirname $tmp)
     cp $file $tmp
     pandoc  --write=markdown_phpextra --wrap=none --filter=$BINDIR/pandoc-cprover-link-filter.py $tmp |