Skip to content

Document symex instructions #7497

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions doc/architectural/cbmc-architecture.md
Original file line number Diff line number Diff line change
Expand Up @@ -71,9 +71,10 @@ To be documented.

To be documented.

## Goto functions → BMC → Counterexample (trace) ##
## Goto functions → BMC → Counterexample (trace)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? Doesn't seem to match the other lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this as a drive-by clean up. As far as I know headings in markdown are only prepended by a ##.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a second visit, it appears that doxygen does support any number of # after the title, but that's invalid markdown.

If this change offends in any capacity however, it's easy to be reverted.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this file has that issue multiple times. Can you please clean up all occurrences? (src/goto-instrument/README.md also suffers from this problem. Maybe do one commit that does the cleanup for all of these?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean fair enough. I was thinking maybe there should be a separate PR that does this clean up. Or, at least, maybe consistently clean them all up in this file as there are many line this (as Michael as noticed). You want to leave the file in a consistent state so that newcomers will follow the example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make trailing # consistent by removing them in the files touched by this PR.


To be documented.
For an explanation of part of how the BMC (Symex) process works, please refer
to [Symex and GOTO program instructions](symex-instructions.md)

## Trace → interpreter → memory map ##

Expand Down
127 changes: 127 additions & 0 deletions doc/architectural/symex-instructions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,127 @@
# Symex and GOTO program instructions

In [doc/central-data-structures](central-data-structures.md) we talked about
the relationship between various central data structures of CBMC.

We identified the `goto_programt` as being the core data structure representing
GOTO-IR, which we defined as a list of GOTO program instructions.

As a reminder, instructions are composed of three subcomponents:

* An instruction type,
* A statement (denoting the actual code the instruction contains),
* An instruction guard (an expression that needs to be evaluated to `true` before
the instruction is to be executed - if one is attached to the instruction).

In this document, we are going to take a closer look at the first subcomponent,
the instruction types, along with the interplay between the instructions and a
central CBMC module, the *symbolic execution engine* (from now on, just *symex*).

## A (very) short introduction to Symex

Symex is, at its core, a GOTO-program interpreter. While Symex is interpreting
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'an interpreter BUT using symbolic values, not literal ones. For example... Note that this produces a formula which describes all possible outputs rather than a single output value.'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Changed

the program, it's also building a list of SSA steps that form part of the equation
that is to be sent to the solver.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to refer the reader to src/goto-symex/README.md?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, added


You can see the main instruction dispatcher (what corresponds to the main interpreter
loop) at goto_symext::execute_next_instruction.

Symex's source code is available under [src/goto-symex](../../src/goto-symex/).

## Instruction Types

Every GOTO-program instruction has a specific type. You can see the instruction type
at #goto_program_instruction_typet but we will also list some of them here for illustration
purposes:

```c
enum goto_program_instruction_typet
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe throw in some "..."s here to emphasize that this isn't the full set.

{
GOTO = 1, // branch, possibly guarded
ASSUME = 2, // non-failing guarded self loop
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is perhaps not the worlds most obvious description of an assumption.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this is that it's a copy from goto_program_instruction_typet where the comment of ASSUME states what was written.

I changed it to ASSUME to avoid confusion.

ASSERT = 3, // assertions
SKIP = 5, // just advance the PC
SET_RETURN_VALUE = 12, // set function return value (no control-flow change)
ASSIGN = 13, // assignment lhs:=rhs
DECL = 14, // declare a local variable
DEAD = 15, // marks the end-of-live of a local variable
FUNCTION_CALL = 16, // call a function
};
```

(*NOTE*: The above is for illustration purposes only - the list is not complete, neither is it
expected to be kept up-to-date. Please refer to the type #goto_program_instruction_typet for a
list of what the instructions look like at this point in time.)

You can observe these instruction types in the user-friendly print of the goto-program that
various CPROVER programs produce with the flag `--show-goto-functions`. For a live example,
consider the following C file:

```c
int main(int argc, char **argv)
{
int arry[] = {0, 1, 2, 3};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It this point you may as well just say "array", or go the whole way and call it "a". "arry" offers neither brevity nor clarity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to a and improved assertion comment

__CPROVER_assert(arry[3] != 3, "expected failure: last arry element is equal to 3");
}
```

If we ask CBMC to print the GOTO-program instructions with `--show-goto-functions`, then part
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be in the goto-functions documentation rather than just in symex.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume (hope) it is. Maybe here we should have a cross reference to that part of the goto-functions documentation.

of the output looks like this:

```c
[...]

main /* main */
// 0 file /tmp/example.c line 3 function main
DECL main::1::arry : signedbv[32][4]
// 1 file /tmp/example.c line 3 function main
ASSIGN main::1::arry := { 0, 1, 2, 3 }
// 2 file /tmp/example.c line 4 function main
ASSERT main::1::arry[cast(3, signedbv[64])] ≠ 3 // expected failure: last arry element is equal to 3
// 3 file /tmp/example.c line 5 function main
DEAD main::1::arry
// 4 file /tmp/example.c line 5 function main
SET RETURN VALUE side_effect statement="nondet" is_nondet_nullable="1"
// 5 file /tmp/example.c line 5 function main
END_FUNCTION
```

In the above excerpt, the capitalised words at the beginning each instruction are the
correspondent instruction types (`DECL`, `ASSIGN`, etc).
Comment on lines +88 to +94
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the above really belongs to goto programs, not so much to symex, does it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but in this file we are describing the Instruction type that is exactly what we are showing.
In my opinion it is useful to show how the goto-program looks like.


---

Symex (as mentioned above) is going to pick a designated entry point and then start going through
each instruction. This happens at goto_symext::execute_next_instruction. While doing so, it will
inspect the instruction's type, and then dispatch to a designated handling function (which usually go
by the name `symex_<instruction-type>`) to handle that particular instruction type and its symbolic
execution. In pseudocode, it looks like this:

```c
switch(instruction.type())
{
[...]

case GOTO:
symex_goto(state);
break;

case ASSUME:
symex_assume(state, instruction.condition());
break;
```

The way the [`symex` subfolder](../../src/goto-symex/) is structured, the different
dispatching functions are usually in their own file, designated by the instruction's
name. As an example, you can find the code for the function goto_symext::symex_goto
in [symex_goto.cpp](../../src/goto-symex/symex_goto.cpp)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A description of what symex produces, why, and what it means could use a longer explanation.

The output of symex is an symex_target_equationt which consists of equalities between
different expressions in the program. You can visualise it as a data structure that
serialises to this: `(a = 5 ∨ a = 3) ∧ (b = 3) ∧ (c = 4) ∧ ...` that describe assignments
and conditions for a range of possible executions of a program that cover a range of
potential paths within it.

This is a high-level overview of symex and goto-program instructions.
For more information (and a more robust introduction), please have a look
at \ref symbolic-execution.