Skip to content

Document symex instructions #7497

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 12 additions & 11 deletions doc/architectural/cbmc-architecture.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,40 +63,41 @@ For an explanation of the data structures involved in the modeling of a GOTO
program (the GOTO Intermediate Representation used by CBMC and assorted tools)
please see \subpage central-data-structures .

## {C, java bytecode} → Parse tree → Symbol table → GOTO programs → GOTO program transformations → BMC → counterexample (goto_tracet) → printing ##
## {C, java bytecode} → Parse tree → Symbol table → GOTO programs → GOTO program transformations → BMC → counterexample (goto_tracet) → printing

To be documented.

## Instrumentation: goto functions → goto functions ##
## Instrumentation: goto functions → goto functions

To be documented.

## Goto functions → BMC → Counterexample (trace) ##
## Goto functions → BMC → Counterexample (trace)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? Doesn't seem to match the other lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this as a drive-by clean up. As far as I know headings in markdown are only prepended by a ##.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a second visit, it appears that doxygen does support any number of # after the title, but that's invalid markdown.

If this change offends in any capacity however, it's easy to be reverted.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this file has that issue multiple times. Can you please clean up all occurrences? (src/goto-instrument/README.md also suffers from this problem. Maybe do one commit that does the cleanup for all of these?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean fair enough. I was thinking maybe there should be a separate PR that does this clean up. Or, at least, maybe consistently clean them all up in this file as there are many line this (as Michael as noticed). You want to leave the file in a consistent state so that newcomers will follow the example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make trailing # consistent by removing them in the files touched by this PR.


To be documented.
For an explanation of part of how the BMC (Symex) process works, please refer
to [Symex and GOTO program instructions](symex-instructions.md)

## Trace → interpreter → memory map ##
## Trace → interpreter → memory map

To be documented.

## Goto functions → abstract interpretation ##
## Goto functions → abstract interpretation

To be documented.

## Executables (flow of transformations): ##
## Executables (flow of transformations):

### goto-cc ###
### goto-cc

To be documented.

### goto-instrument ###
### goto-instrument

To be documented.

### cbmc ###
### cbmc

To be documented.

### goto-analyzer ###
### goto-analyzer

To be documented.
131 changes: 131 additions & 0 deletions doc/architectural/symex-instructions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,131 @@
# Symex and GOTO program instructions

In [doc/central-data-structures](central-data-structures.md) we talked about
the relationship between various central data structures of CBMC.

We identified the `goto_programt` as being the core data structure representing
GOTO-IR, which we defined as a list of GOTO program instructions.

As a reminder, instructions are composed of three subcomponents:

* An instruction type,
* A statement (denoting the actual code the instruction contains),
* An instruction guard (an expression that needs to be evaluated to `true` before
the instruction is to be executed - if one is attached to the instruction).

In this document, we are going to take a closer look at the first subcomponent,
the instruction types, along with the interplay between the instructions and a
central CBMC module, the *symbolic execution engine* (from now on, just *symex*).

## A (very) short introduction to Symex

Symex is, at its core, a GOTO-program interpreter that uses symbolic values instead of actual ones.
This produces a formula which describes all possible outputs rather than a single output value.
While Symex is interpreting the program, it also builds a list of Static Single Assignment (SSA)
steps that form part of the equation that is to be sent to the solver. For more information see
[src/goto-symex](../../src/goto-symex/README.md).

You can see the main instruction dispatcher (what corresponds to the main interpreter
loop) at `goto_symext::execute_next_instruction`.

Symex's source code is available under [src/goto-symex](../../src/goto-symex/).

## Instruction Types

Every GOTO-program instruction has a specific type. You can see the instruction type
at #goto_program_instruction_typet but we will also list some of them here for illustration
purposes:

```c
enum goto_program_instruction_typet
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe throw in some "..."s here to emphasize that this isn't the full set.

{
[...]
GOTO = 1, // branch, possibly guarded
ASSUME = 2, // assumption
ASSERT = 3, // assertions
SKIP = 5, // just advance the PC
SET_RETURN_VALUE = 12, // set function return value (no control-flow change)
ASSIGN = 13, // assignment lhs:=rhs
DECL = 14, // declare a local variable
DEAD = 15, // marks the end-of-live of a local variable
FUNCTION_CALL = 16, // call a function
[...]
};
```

(*NOTE*: The above is for illustration purposes only - the list is not complete, neither is it
expected to be kept up-to-date. Please refer to the type #goto_program_instruction_typet for a
list of what the instructions look like at this point in time.)

You can observe these instruction types in the user-friendly print of the goto-program that
various CPROVER programs produce with the flag `--show-goto-functions`. For a live example,
consider the following C file:

```c
int main(int argc, char **argv)
{
int a[] = {0, 1, 2, 3};
__CPROVER_assert(a[3] != 3, "expected failure: last element of array 'a' is equal to 3");
}
```

If we ask CBMC to print the GOTO-program instructions with `--show-goto-functions`, then part
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be in the goto-functions documentation rather than just in symex.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume (hope) it is. Maybe here we should have a cross reference to that part of the goto-functions documentation.

of the output looks like this:

```c
[...]

main /* main */
// 0 file /tmp/example.c line 3 function main
DECL main::1::arry : signedbv[32][4]
// 1 file /tmp/example.c line 3 function main
ASSIGN main::1::arry := { 0, 1, 2, 3 }
// 2 file /tmp/example.c line 4 function main
ASSERT main::1::arry[cast(3, signedbv[64])] ≠ 3 // expected failure: last arry element is equal to 3
// 3 file /tmp/example.c line 5 function main
DEAD main::1::arry
// 4 file /tmp/example.c line 5 function main
SET RETURN VALUE side_effect statement="nondet" is_nondet_nullable="1"
// 5 file /tmp/example.c line 5 function main
END_FUNCTION
```

In the above excerpt, the capitalised words at the beginning each instruction are the
correspondent instruction types (`DECL`, `ASSIGN`, etc).
Comment on lines +88 to +94
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the above really belongs to goto programs, not so much to symex, does it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but in this file we are describing the Instruction type that is exactly what we are showing.
In my opinion it is useful to show how the goto-program looks like.


---

Symex (as mentioned above) is going to pick a designated entry point and then start going through
each instruction. This happens at `goto_symext::execute_next_instruction`. While doing so, it will
inspect the instruction's type, and then dispatch to a designated handling function (which usually
go by the name `symex_<instruction-type>`) to handle that particular instruction type and its
symbolic execution. In pseudocode, it looks like this:

```c
switch(instruction.type())
{
[...]

case GOTO:
symex_goto(state);
break;

case ASSUME:
symex_assume(state, instruction.condition());
break;
```

The way the [`symex` subfolder](../../src/goto-symex/) is structured, the different
dispatching functions are usually in their own file, designated by the instruction's
name. As an example, you can find the code for the function goto_symext::symex_goto
in [symex_goto.cpp](../../src/goto-symex/symex_goto.cpp)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A description of what symex produces, why, and what it means could use a longer explanation.

The output of symex is an symex_target_equationt which consists of equalities between
different expressions in the program. You can visualise it as a data structure that
serialises to this: `(a = 5 ∨ a = 3) ∧ (b = 3) ∧ (c = 4) ∧ ...` that describe assignments
and conditions for a range of possible executions of a program that cover a range of
potential paths within it.

This is a high-level overview of symex and goto-program instructions.
For more information (and a more robust introduction), please have a look
at \ref symbolic-execution.