Skip to content

Proper handling of library and __CPROVER built-in functions in contracts #7197

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
feliperodri opened this issue Oct 5, 2022 · 0 comments · Fixed by #7242
Closed

Proper handling of library and __CPROVER built-in functions in contracts #7197

feliperodri opened this issue Oct 5, 2022 · 0 comments · Fixed by #7242
Assignees
Labels
aws Bugs or features of importance to AWS CBMC users bug Code Contracts Function and loop contracts

Comments

@feliperodri
Copy link
Collaborator

CBMC version: 5.67.0
Operating system: N/A

These classes of functions need special care during contract instrumentation:

  • Built-in __CPROVER functions
    • they have no written implementation and need to be treated in an ad-hoc manner during instrumentation
  • __CPROVER library functions (have a written implementation)
  • POSIX library functions:
    • implemented in C on top of built-in functions (src/ansi-c/library/*.c)
    • can have loops, allocate/free memory, perform writes
    • write to global instrumentation variables used to track object sizes, lifetimes, memory leaks, etc.

In the current workflow, instrumentation is performed on user code before __CPROVER or POSIX libraries get added the goto model. This is done on purpose, to avoid dragging internal CBMC machinery in user contracts (library functions), and also to avoid instrumenting the body of trusted functions, and to avoid bringing loops that have no contracts in the instrumentation phase.

As a consequence:

  1. except for malloc and free, writes performed by built-in and library function are ignored in assigns clause checking (unsound)
  2. once library functions are loaded, new loops appear that need to be unrolled for analysis (prevents unbounded proofs)

As of today (Dec 1st 2021):

  • For point 1. we treat calls to malloc and free in an ad-hoc manner during instrumentation
  • For point 2. we unwind library function loops in a subsequent pass to make the analysis possible

Steps to improve:

  • Built-ins: no choice but ad-hoc handling
  • Read-only, loop-free library functions:
    • Add them to the project and instrument their bodies (sound but will cause a performance hit)
    • Skip the instrumentation of their bodies (better performance but puts them in the trusted code base)
  • Library functions with loops:
    • equip with full-fledged loop contracts or function contracts and treat them as regular functions
    • Handle them in an ad-hoc manner during instrumentation without instrumenting their bodies
    • Inline them before contract instrumentation and ignore writes to global __CPROVER instrumentation variables during assigns clause instrumentation

The key with these functions is to avoid dragging the __CPROVER internals into user contracts

@feliperodri feliperodri added bug aws Bugs or features of importance to AWS CBMC users Code Contracts Function and loop contracts labels Oct 5, 2022
remi-delmas-3000 pushed a commit to remi-delmas-3000/cbmc that referenced this issue Oct 31, 2022
- Fixes diffblue#7197
- Fixes diffblue#7198
- Fixes diffblue#7206

Overview
---

The new features brought by the system are:
- Allow to use C functions to specify reusable/parametric assigns
  clauses and frees clauses in the contracts language;
- Allow function calls in pre- and post-conditions and check the
  absence of side-effects;
- Checking contracts on recursive functions (by assuming preconditions
  and asserting post conditions on the top level call and asserting
  pre conditions and assuming post conditions on the recursive call)
- The frame checks are fully dynamic and does not require inlining the
  program or unwinding loops a priori;
- The method is compatible with deferred resolution of function
  pointers calls during symex;
- All CPROVER library functions are supported;

In addition:
- The checks are implemented as C library
  `src/ansi-c/library/cprover_contracts.c`, the instrumentation insert
  calls to the library in the program;
- Contract checking and contract replacement mostly share the same
  code;
- The whole system's user documentation and developer documentation is
  written in the doxygen format and found in
  `src/goto-instrument/contracts/doc/`.

Limitations:
- loop contracts are not yet ported to the new framework;
- deferred function pointer resolution is not yet implemented,
  user-specified function pointer restriction is still required;
- havocing during contract replacement still relies on snapshots known
  to affect performance;
- more work is performed by symbolic execution which can be felt on
  runtime;

Method
---

Let’s call `f_top`​ the top-level function against which the contract
is being checked.

We assume `f_top`​ is itself called from a `harness` function, which
may call other functions besides `f_top`​.

Let’s consider a function `f` that is either `f_top​` or one of the
functions called (directly or indirectly) by `f_top`​. The goal is to
verify that `f` writes only to memory locations that are part of the
frame of the call to `f_top` but allowed by the contract or that were
allocated after entering `f_top`​ and are hence not part of the frame
of the call. To do that, we add ghost code to the program to build
representations the following sets of memory locations as the program
runs:

(`contract_assigns`, `contract_frees`, `allocated`, `deallocated`)

where:

- `contract_assigns` is the set of memory locations specified in the
  *assigns clause* of the contract being checked
- `contract_frees` is the set of pointers specified in the *frees
  clause* of the contract being checked
- `allocated` is the set objects identifiers allocated on the stack or
  the heap since entering `f_top`​
- `deallocated` is the set of heap objects identifiers deallocated
  since entering `f_top`​ (allows verification that an object was freed
  as a post-condition of the contract)
The instrumentation transforms all user-defined goto functions and
library functions:

```c
ret_t f(<parameters>);
```

Into functions accepting an extra parameter that is a pointer to a
dynamic representation of the sets introduced above:

```c
ret_t f(<parameters>,  __CPROVER_contracts_write_set_ptr_t write_set);
```

Function bodies is instrumented so that when the write_set pointer is
NULL no checks are performed, when it is not NULL the following checks
are performed:
- Assignments LHS are checked for inclusion in `contract_assigns` or
  `allocated`,
- DECL objects are recorded in `allocated`
- Dynamically allocated objects are recorded in `allocated`
- DEAD objects or objects assigned to `__CPROVER_dead_object` are
  removed from `allocated`
- dynamic objects deallocated with `__CPROVER_deallocate` are checked
  for inclusion in `contract_frees` or `allocated`, and recorded in
  `deallocated` (so that if needed we can check in post conditions
  that the function indeed deallocated the expected memory).
- The current write set is propagated to all functions calls or
  function pointer calls.

Since the write set is dynamically updated, a same function called at
different call sites can see a different write set.

Write sets are initialised from contracts descriptions and injected in
function calls that are checked against contracts, or used to perform
inclusion checks against write sets obtained from callers for
functions that are replaced by a contract.

Implementation
---

The class `dfcc_utilst` provides methods to
manipulate function symbols in the symbol table:
- create fresh local symbols
- create parameter symbols
- create static symbols
- clone and rename function symbols
- add function parameters
- rename function parameters

The file `cprover_contracts.c` defines:
- C types representing write sets
- C functions to create/populate/release write sets
- C functions to do write sets inclusion checks
- C functions implementing built-in predicates
 `__CPROVER_is_freeable`, `__CPROVER_was_freed`,
 `__CPROVER_is_fresh`

The class `dfcc_libraryt` provides a programmatic interface to load
and use the library-defined types and functions and perform some
optimisations like inlining or unwinding loops in library functions.

The class `dfcc_contract_modet` is an enum representing the CHECK and
REPLACE modes for contracts.

The class `dfcc_is_fresht` Rewrites calls to `__CPROVER_is_fresh` into
calls to the library implementation `__CPROVER_contracts_is_fresh`.

The class `dfcc_is_freeablet` rewrites calls to
`__CPROVER_is_freeable` into calls to the library implementation
`__CPROVER_contracts_is_freeable` and rewrites calls to
`__CPROVER_was_freed` into calls to the library implementation
`__CPROVER_contracts_was_freed`.

The class `goto_program_cfg_infot` extends `cfg_infot` and computes
the set of program-locals from DECLs found in the instruction
sequence. Contrary to the existing `function_cfg_infot` this new class
does consider parameters symbols as locals.

The class `dfcc_instrumentt` instruments GOTO functions with write-set
checks.  It also instruments proof harness functions and sub-sequences
of GOTO instructions extracted from larger GOTO programs.

The class `dfcc_spec_functionst` rewrites assigns clause and frees
clause specification functions into functions that populate a given
write-set, and can be checked for side-effects against a second
write-set.

The class `dfcc_contract_functionst` allows to translate the assigns
clause (or frees clause) of a contract expressed in DSL style into
GOTO a function describing the same set of targets as the clause. It
also allows to generate a havoc function from an assigns clause, that
havocs the targets described by the clause in a type-directed way.

The class `dfcc_dsl_wrapper_programt` generates a sequence of GOTO
instruction from a contract expressed in the DSL syntax.  The
instructions either encode a checked call to the function under
verification (CHECK mode) or an abstraction derived from the contract
(REPLACE mode). In CHECK mode, it generates GOTO instructions that
assume preconditions, snapshots history variables, creates a write set
instance, call the function under verification asserts the
postconditions, and returns the call's return value. In REPLACE mode,
it generates GOTO instructions encoding the nondeterministic
abstraction defined by the contract: assert preconditions, create and
havoc assigns clause targets, create a nondet return value for the
call, nondeterministically free frees clause targets, assume
postconditions, return a value that satisfies the post
conditions. These instructions are meant to be added to the GOTO
function that serves as a wrapper for the function under verification
or being replaced by the contract.

The class `dfcc_contract_handlert` is an abstract interface for
generating GOTO instructions from contracts for CHECK or REPLACE
modes. It delegates the actual generation to
`dfcc_dsl_wrapper_programt`.

The class `dfcc_swap_and_wrapt`, given a `function_id`, a
`contract_id`, a contract `mode`, and a `dfcc_contract_handlert` for
the contract, swaps the body of the function to a function with a
mangled name and instruments the mangled function for dynamic frame
condition checking using `dfcc_instrument`. The body of the original
function is then replaced with instructions generated by the contract
handler for the required contract mode (CHECK or REPLACE). To handle
recursive functions in contract checking mode, the body of the wrapper
function is generated in such a way the first invocation results in
checking the contract against the mangled function and recursive
invocations of the wrapper behave like the contract abstraction
(induction hypothesis for the recursive call).

The `dfcct` class receives the proof harness function, the map of
functions and contracts to check and the map of functions to replace
with contracts.  It instruments the proof harness, performs
swap-and-wrap operations for each of the checked and replaced
functions. It then instruments all other functions for DFCC.

Regression Tests
---

Regression tests are found in `regression/contracts-dfcc`. These
tests are ported from `regression/contracts` and only contain
tests for function contracts.
remi-delmas-3000 pushed a commit to remi-delmas-3000/cbmc that referenced this issue Nov 1, 2022
- Fixes diffblue#7197
- Fixes diffblue#7198
- Fixes diffblue#7206

Overview
---

The new features brought by the system are:
- Allow to use C functions to specify reusable/parametric assigns
  clauses and frees clauses in the contracts language;
- Allow function calls in pre- and post-conditions and check the
  absence of side-effects;
- Checking contracts on recursive functions (by assuming preconditions
  and asserting post conditions on the top level call and asserting
  pre conditions and assuming post conditions on the recursive call)
- The frame checks are fully dynamic and does not require inlining the
  program or unwinding loops a priori;
- The method is compatible with deferred resolution of function
  pointers calls during symex;
- All CPROVER library functions are supported;

In addition:
- The checks are implemented as C library
  `src/ansi-c/library/cprover_contracts.c`, the instrumentation insert
  calls to the library in the program;
- Contract checking and contract replacement mostly share the same
  code;
- The whole system's user documentation and developer documentation is
  written in the doxygen format and found in
  `src/goto-instrument/contracts/doc/`.

Limitations:
- loop contracts are not yet ported to the new framework;
- deferred function pointer resolution is not yet implemented,
  user-specified function pointer restriction is still required;
- havocing during contract replacement still relies on snapshots known
  to affect performance;
- more work is performed by symbolic execution which can be felt on
  runtime;

Method
---

Let’s call `f_top`​ the top-level function against which the contract
is being checked.

We assume `f_top`​ is itself called from a `harness` function, which
may call other functions besides `f_top`​.

Let’s consider a function `f` that is either `f_top​` or one of the
functions called (directly or indirectly) by `f_top`​. The goal is to
verify that `f` writes only to memory locations that are part of the
frame of the call to `f_top` but allowed by the contract or that were
allocated after entering `f_top`​ and are hence not part of the frame
of the call. To do that, we add ghost code to the program to build
representations the following sets of memory locations as the program
runs:

(`contract_assigns`, `contract_frees`, `allocated`, `deallocated`)

where:

- `contract_assigns` is the set of memory locations specified in the
  *assigns clause* of the contract being checked
- `contract_frees` is the set of pointers specified in the *frees
  clause* of the contract being checked
- `allocated` is the set objects identifiers allocated on the stack or
  the heap since entering `f_top`​
- `deallocated` is the set of heap objects identifiers deallocated
  since entering `f_top`​ (allows verification that an object was freed
  as a post-condition of the contract)
The instrumentation transforms all user-defined goto functions and
library functions:

```c
ret_t f(<parameters>);
```

Into functions accepting an extra parameter that is a pointer to a
dynamic representation of the sets introduced above:

```c
ret_t f(<parameters>,  __CPROVER_contracts_write_set_ptr_t write_set);
```

Function bodies is instrumented so that when the write_set pointer is
NULL no checks are performed, when it is not NULL the following checks
are performed:
- Assignments LHS are checked for inclusion in `contract_assigns` or
  `allocated`,
- DECL objects are recorded in `allocated`
- Dynamically allocated objects are recorded in `allocated`
- DEAD objects or objects assigned to `__CPROVER_dead_object` are
  removed from `allocated`
- dynamic objects deallocated with `__CPROVER_deallocate` are checked
  for inclusion in `contract_frees` or `allocated`, and recorded in
  `deallocated` (so that if needed we can check in post conditions
  that the function indeed deallocated the expected memory).
- The current write set is propagated to all functions calls or
  function pointer calls.

Since the write set is dynamically updated, a same function called at
different call sites can see a different write set.

Write sets are initialised from contracts descriptions and injected in
function calls that are checked against contracts, or used to perform
inclusion checks against write sets obtained from callers for
functions that are replaced by a contract.

Implementation
---

The class `dfcc_utilst` provides methods to
manipulate function symbols in the symbol table:
- create fresh local symbols
- create parameter symbols
- create static symbols
- clone and rename function symbols
- add function parameters
- rename function parameters

The file `cprover_contracts.c` defines:
- C types representing write sets
- C functions to create/populate/release write sets
- C functions to do write sets inclusion checks
- C functions implementing built-in predicates
 `__CPROVER_is_freeable`, `__CPROVER_was_freed`,
 `__CPROVER_is_fresh`

The class `dfcc_libraryt` provides a programmatic interface to load
and use the library-defined types and functions and perform some
optimisations like inlining or unwinding loops in library functions.

The class `dfcc_contract_modet` is an enum representing the CHECK and
REPLACE modes for contracts.

The class `dfcc_is_fresht` Rewrites calls to `__CPROVER_is_fresh` into
calls to the library implementation `__CPROVER_contracts_is_fresh`.

The class `dfcc_is_freeablet` rewrites calls to
`__CPROVER_is_freeable` into calls to the library implementation
`__CPROVER_contracts_is_freeable` and rewrites calls to
`__CPROVER_was_freed` into calls to the library implementation
`__CPROVER_contracts_was_freed`.

The class `goto_program_cfg_infot` extends `cfg_infot` and computes
the set of program-locals from DECLs found in the instruction
sequence. Contrary to the existing `function_cfg_infot` this new class
does consider parameters symbols as locals.

The class `dfcc_instrumentt` instruments GOTO functions with write-set
checks.  It also instruments proof harness functions and sub-sequences
of GOTO instructions extracted from larger GOTO programs.

The class `dfcc_spec_functionst` rewrites assigns clause and frees
clause specification functions into functions that populate a given
write-set, and can be checked for side-effects against a second
write-set.

The class `dfcc_contract_functionst` allows to translate the assigns
clause (or frees clause) of a contract expressed in DSL style into
GOTO a function describing the same set of targets as the clause. It
also allows to generate a havoc function from an assigns clause, that
havocs the targets described by the clause in a type-directed way.

The class `dfcc_dsl_wrapper_programt` generates a sequence of GOTO
instruction from a contract expressed in the DSL syntax.  The
instructions either encode a checked call to the function under
verification (CHECK mode) or an abstraction derived from the contract
(REPLACE mode). In CHECK mode, it generates GOTO instructions that
assume preconditions, snapshots history variables, creates a write set
instance, call the function under verification asserts the
postconditions, and returns the call's return value. In REPLACE mode,
it generates GOTO instructions encoding the nondeterministic
abstraction defined by the contract: assert preconditions, create and
havoc assigns clause targets, create a nondet return value for the
call, nondeterministically free frees clause targets, assume
postconditions, return a value that satisfies the post
conditions. These instructions are meant to be added to the GOTO
function that serves as a wrapper for the function under verification
or being replaced by the contract.

The class `dfcc_contract_handlert` is an abstract interface for
generating GOTO instructions from contracts for CHECK or REPLACE
modes. It delegates the actual generation to
`dfcc_dsl_wrapper_programt`.

The class `dfcc_swap_and_wrapt`, given a `function_id`, a
`contract_id`, a contract `mode`, and a `dfcc_contract_handlert` for
the contract, swaps the body of the function to a function with a
mangled name and instruments the mangled function for dynamic frame
condition checking using `dfcc_instrument`. The body of the original
function is then replaced with instructions generated by the contract
handler for the required contract mode (CHECK or REPLACE). To handle
recursive functions in contract checking mode, the body of the wrapper
function is generated in such a way the first invocation results in
checking the contract against the mangled function and recursive
invocations of the wrapper behave like the contract abstraction
(induction hypothesis for the recursive call).

The `dfcct` class receives the proof harness function, the map of
functions and contracts to check and the map of functions to replace
with contracts.  It instruments the proof harness, performs
swap-and-wrap operations for each of the checked and replaced
functions. It then instruments all other functions for DFCC.

Regression Tests
---

Regression tests are found in `regression/contracts-dfcc`. These
tests are ported from `regression/contracts` and only contain
tests for function contracts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Bugs or features of importance to AWS CBMC users bug Code Contracts Function and loop contracts
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants