Skip to content

CONTRACTS: Dynamic frame condition checking #7242

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

remi-delmas-3000
Copy link
Collaborator

@remi-delmas-3000 remi-delmas-3000 commented Oct 13, 2022

Replaces #6887

Fixes #7197
Fixes #7198
Fixes #7206

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Implement the functionality described below.

Tests adapted from the existing contracts implementation by @nwetzler

Motivation

This new function contracts implementation uses a dynamic method for checking assigns and frees clauses:

  • it allows parametric assigns clauses and frees clauses in the contracts language;
  • it allows function calls in pre- and post-conditions and checks the absence of side-effects;
  • the checking method is dynamic and does not require inlining the program or unwinding loops a priori;
  • the checking method is compatible with deferred resolution of function pointers calls during symex;
  • the checks are implemented as C library (src/ansi-c/library/cprover_contracts.c), the instrumentation insert calls to the library in the program;
  • code duplication for contract checking and contract replacement has been reduced;
  • all CPROVER library functions are supported;
  • checking contracts on recursive functions is supported (by assuming preconditions and asserting post conditions on the top level call and asserting pre conditions and assuming post conditions on the recursive call)
  • the user and developer documentation for the contract system now lives in doxygen

Limitations

  • loop contracts are not yet ported to the new framework;
  • deferred function pointer resolution is not yet implemented, user-specified function pointer restriction is still required;
  • havocing during contract replacement still relies on pointer snapshots known to affect performance;
  • more work is performed by symbolic execution which can be felt on runtime;
  • source locations for the instrumented code are still very partial and messy;

Dynamic frame condition checking (DFCC) for function contracts

Let’s call f_top​ the top-level function against which the contract is being checked.

We assume f_top​ is itself called from a harness function, which may call other functions besides f_top​.

Let’s consider a function f that is either f_top​ or one of the functions called (directly or indirectly) by f_top​. The goal is to verify that f writes only to memory locations that are part of the frame of the call to f_top but allowed by the contract or that were allocated after entering f_top​ and are hence not part of the frame of the call. To do that, we add ghost code to the program to build representations the following sets of memory locations as the program runs:

  • (contract_assigns, contract_frees, allocated, deallocated)
    where:
  • contract_assigns is the set of memory locations specified in the assigns clause of the contract being checked
  • contract_frees is the set of pointers specified in the frees clause of the contract being checked
  • allocated is the set objects identifiers allocated on the stack or the heap since entering f_top
  • deallocated is the set of heap objects identifiers deallocated since entering f_top​ (allows verification that an object was freed as a post-condition of the contract)

The instrumentation transforms all user-defined goto functions and library functions:

ret_t f(<parameters>);

Into functions accepting an extra parameter that is a pointer to a dynamic representation of the sets introduced above:

ret_t f(<parameters>,  __CPROVER_contracts_write_set_ptr_t write_set);

Function bodies is instrumented so that

  • when the write_set pointer is NULL no checks are performed,
  • when it is not NULL the following checks are performed
    • Assignments LHS are checked for inclusion in contract_assigns or allocated,
    • DECL objects are recorded in allocated
    • Dynamically allocated objects are recorded in allocated
    • DEAD objects or objects assigned to __CPROVER_dead_object are removed from allocated
    • dynamic objects deallocated with __CPROVER_deallocate are checked for inclusion in contract_frees or allocated, and recorded in deallocated (so that if needed we can check in post conditions that the function indeed deallocated the expected memory).
    • The current write set is propagated to all functions calls or function pointer calls.

Since the write set is dynamically updated, a same function called at different call sites can see a different write set.

Write sets are initialised from contracts descriptions and injected in function calls that are checked against contracts, or used to perform inclusion checks against write sets obtained from callers for functions that are replaced by a contract.


@codecov
Copy link

codecov bot commented Oct 14, 2022

Codecov Report

Base: 78.03% // Head: 78.22% // Increases project coverage by +0.18% 🎉

Coverage data is based on head (1a0022c) compared to base (21e3260).
Patch has no changes to coverable lines.

❗ Current head 1a0022c differs from pull request most recent head e22a788. Consider uploading reports for the commit e22a788 to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7242      +/-   ##
===========================================
+ Coverage    78.03%   78.22%   +0.18%     
===========================================
  Files         1624     1642      +18     
  Lines       187405   189688    +2283     
===========================================
+ Hits        146249   148388    +2139     
- Misses       41156    41300     +144     
Impacted Files Coverage Δ
src/goto-instrument/contracts/contracts.cpp 95.03% <0.00%> (-0.04%) ⬇️
src/ansi-c/c_expr.h 100.00% <0.00%> (ø)
src/analyses/dirty.h 100.00% <0.00%> (ø)
src/goto-symex/goto_symex.cpp 98.64% <0.00%> (ø)
src/solvers/flattening/boolbv_width.cpp 73.80% <0.00%> (ø)
src/goto-instrument/contracts/contracts.h 100.00% <0.00%> (ø)
...rc/goto-instrument/goto_instrument_parse_options.h 100.00% <0.00%> (ø)
...ntracts/dynamic-frames/dfcc_contract_functions.cpp 83.79% <0.00%> (ø)
...instrument/contracts/dynamic-frames/dfcc_utils.cpp 95.70% <0.00%> (ø)
...ent/contracts/dynamic-frames/dfcc_spec_functions.h 100.00% <0.00%> (ø)
... and 31 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@tautschnig
Copy link
Collaborator

Why are we duplicating all (?) the existing contracts tests? I am wondering about:

  1. How are we going to address the maintenance burden caused by duplicated code? If we absolutely need new test specifications, then we could just refer to the source files in the other directory.
  2. Apart from the different command-line option (which could be addressed by a test profile/makefile rule), if the output is different in some way (which would justify the need for different test specifications) then what's the user story here?

@tautschnig
Copy link
Collaborator

Why are we duplicating all (?) the existing contracts tests? I am wondering about:

1. How are we going to address the maintenance burden caused by duplicated code? If we absolutely need new test specifications, then we could just refer to the source files in the other directory.

2. Apart from the different command-line option (which could be addressed by a test profile/makefile rule), if the output is different in some way (which would justify the need for different test specifications) then what's the user story here?

Follow-up comment as discussed out-of-band: we might merge this PR as-is, and then clean up afterwards, with the help of a script to be written.

Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a first set of comments, I'll review the remaining parts in my next available time slot.

// collect dirty locals
goto_functiont goto_function;
goto_function.body.copy_from(goto_program);

dirtyt is_dirty(goto_function);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: we should fix the API of dirtyt: I have no idea why we chose goto_functiont as the most appropriate granularity, agoto_programt should do.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #7244 .

@remi-delmas-3000 remi-delmas-3000 force-pushed the rebase-dfcc-fix-regression branch 3 times, most recently from cf8bc90 to 9afaa18 Compare October 14, 2022 21:07
/// The public method \ref add_to_dest calls the private methods to populate the
/// sections, and then adds the contents of these sections in order to the
/// given destination program.
class dfcc_dsl_wrapper_programt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will continue my next review from here.

Date: August 2022

\*******************************************************************/
#include "dfcc_is_freeable.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next review will continue from here.

Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with a first round of reviewing.

@tautschnig tautschnig removed their assignment Oct 20, 2022
@nwetzler nwetzler self-requested a review October 21, 2022 14:44
@remi-delmas-3000 remi-delmas-3000 force-pushed the rebase-dfcc-fix-regression branch from 9afaa18 to 44a112f Compare October 21, 2022 16:23
@remi-delmas-3000 remi-delmas-3000 force-pushed the rebase-dfcc-fix-regression branch 8 times, most recently from c877eb5 to 0b73b79 Compare October 25, 2022 21:05
Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some new comments, but also I noticed that a number of my earlier comments apparently still need addressing/haven't had a response. Perhaps GitHub was just hiding them?

Comment on lines +112 to +118
#pragma CPROVER check push
#pragma CPROVER check disable "pointer"
#pragma CPROVER check disable "pointer-primitive"
#pragma CPROVER check disable "unsigned-overflow"
#pragma CPROVER check disable "signed-overflow"
#pragma CPROVER check disable "undefined-shift"
#pragma CPROVER check disable "conversion"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is disabling these checks the right thing to do?

Copy link
Collaborator Author

@remi-delmas-3000 remi-delmas-3000 Oct 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other assertions in the body entail these checks.

Comment on lines 122 to 134
__CPROVER_contracts_car_t car = {0};
car.is_writable = ptr != 0;
car.size = size;
car.lb = ptr;
__CPROVER_assert(
size < __CPROVER_max_malloc_size,
"CAR size is less than __CPROVER_max_malloc_size");
__CPROVER_ssize_t offset = __CPROVER_POINTER_OFFSET(ptr);
__CPROVER_assert(
!(offset > 0) |
((__CPROVER_size_t)offset + size < __CPROVER_max_malloc_size),
"no offset bits overflow on CAR upper bound computation");
car.ub = (void *)((char *)ptr + size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about doing the initialisation in one step (_CPROVER_contracts_car_t car = {.is_writeable = ptr != 0, .size = size, .lb = ptr, .ub = (char *)ptr + size};)?

@remi-delmas-3000 remi-delmas-3000 force-pushed the rebase-dfcc-fix-regression branch from d5046b9 to 7c0fbd7 Compare October 27, 2022 15:11
Copy link

@nwetzler nwetzler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I love the new developer documentation and user documentation. I now have a much clearer picture of how this replaces the older contract instrumentation, and I feel like I might have a chance of supporting it in the future. The code looks really clean and I'm excited to use it.

Review comments are mostly focused on typos and documentation stuff. I tried to make suggestions where possible.

Cheers.

@@ -1003,6 +1003,12 @@ replace calls to \fIfun\fR with \fIfun\fR's contract
.TP
\fB\-\-enforce\-contract\fR \fIfun\fR
wrap \fIfun\fR with an assertion of its contract
.TP
\fB\-\-enforce\-contract\-rec\fR \fIfun\fR

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need --enforce-contract-rec in this PR? If so, why? I might have expected these to be two separate PRs.

Copy link
Collaborator Author

@remi-delmas-3000 remi-delmas-3000 Oct 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doable but pulling out support for recursive functions from this PR would take me quite some time again

@@ -0,0 +1,37 @@
# Program Transformation Overview {#contracts-dev-spec-transfo-params}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick, but is there a reason we can't have a 2 character longer name and get transform-params instead of transfo-params?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

They can have been generated from an assigns clause or written by the user.

To convert them to havoc functions, the first step is to inline their body to
remove all function calls to check the result to be loop free.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@remi-delmas-3000 remi-delmas-3000 force-pushed the rebase-dfcc-fix-regression branch from c5c55c2 to 445e377 Compare October 28, 2022 14:49
Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving as the remaining discussion points are about aesthetics.

@remi-delmas-3000 remi-delmas-3000 force-pushed the rebase-dfcc-fix-regression branch 2 times, most recently from 1a0022c to 13bffcc Compare October 31, 2022 14:48
- Fixes diffblue#7197
- Fixes diffblue#7198
- Fixes diffblue#7206

Overview
---

The new features brought by the system are:
- Allow to use C functions to specify reusable/parametric assigns
  clauses and frees clauses in the contracts language;
- Allow function calls in pre- and post-conditions and check the
  absence of side-effects;
- Checking contracts on recursive functions (by assuming preconditions
  and asserting post conditions on the top level call and asserting
  pre conditions and assuming post conditions on the recursive call)
- The frame checks are fully dynamic and does not require inlining the
  program or unwinding loops a priori;
- The method is compatible with deferred resolution of function
  pointers calls during symex;
- All CPROVER library functions are supported;

In addition:
- The checks are implemented as C library
  `src/ansi-c/library/cprover_contracts.c`, the instrumentation insert
  calls to the library in the program;
- Contract checking and contract replacement mostly share the same
  code;
- The whole system's user documentation and developer documentation is
  written in the doxygen format and found in
  `src/goto-instrument/contracts/doc/`.

Limitations:
- loop contracts are not yet ported to the new framework;
- deferred function pointer resolution is not yet implemented,
  user-specified function pointer restriction is still required;
- havocing during contract replacement still relies on snapshots known
  to affect performance;
- more work is performed by symbolic execution which can be felt on
  runtime;

Method
---

Let’s call `f_top`​ the top-level function against which the contract
is being checked.

We assume `f_top`​ is itself called from a `harness` function, which
may call other functions besides `f_top`​.

Let’s consider a function `f` that is either `f_top​` or one of the
functions called (directly or indirectly) by `f_top`​. The goal is to
verify that `f` writes only to memory locations that are part of the
frame of the call to `f_top` but allowed by the contract or that were
allocated after entering `f_top`​ and are hence not part of the frame
of the call. To do that, we add ghost code to the program to build
representations the following sets of memory locations as the program
runs:

(`contract_assigns`, `contract_frees`, `allocated`, `deallocated`)

where:

- `contract_assigns` is the set of memory locations specified in the
  *assigns clause* of the contract being checked
- `contract_frees` is the set of pointers specified in the *frees
  clause* of the contract being checked
- `allocated` is the set objects identifiers allocated on the stack or
  the heap since entering `f_top`​
- `deallocated` is the set of heap objects identifiers deallocated
  since entering `f_top`​ (allows verification that an object was freed
  as a post-condition of the contract)
The instrumentation transforms all user-defined goto functions and
library functions:

```c
ret_t f(<parameters>);
```

Into functions accepting an extra parameter that is a pointer to a
dynamic representation of the sets introduced above:

```c
ret_t f(<parameters>,  __CPROVER_contracts_write_set_ptr_t write_set);
```

Function bodies is instrumented so that when the write_set pointer is
NULL no checks are performed, when it is not NULL the following checks
are performed:
- Assignments LHS are checked for inclusion in `contract_assigns` or
  `allocated`,
- DECL objects are recorded in `allocated`
- Dynamically allocated objects are recorded in `allocated`
- DEAD objects or objects assigned to `__CPROVER_dead_object` are
  removed from `allocated`
- dynamic objects deallocated with `__CPROVER_deallocate` are checked
  for inclusion in `contract_frees` or `allocated`, and recorded in
  `deallocated` (so that if needed we can check in post conditions
  that the function indeed deallocated the expected memory).
- The current write set is propagated to all functions calls or
  function pointer calls.

Since the write set is dynamically updated, a same function called at
different call sites can see a different write set.

Write sets are initialised from contracts descriptions and injected in
function calls that are checked against contracts, or used to perform
inclusion checks against write sets obtained from callers for
functions that are replaced by a contract.

Implementation
---

The class `dfcc_utilst` provides methods to
manipulate function symbols in the symbol table:
- create fresh local symbols
- create parameter symbols
- create static symbols
- clone and rename function symbols
- add function parameters
- rename function parameters

The file `cprover_contracts.c` defines:
- C types representing write sets
- C functions to create/populate/release write sets
- C functions to do write sets inclusion checks
- C functions implementing built-in predicates
 `__CPROVER_is_freeable`, `__CPROVER_was_freed`,
 `__CPROVER_is_fresh`

The class `dfcc_libraryt` provides a programmatic interface to load
and use the library-defined types and functions and perform some
optimisations like inlining or unwinding loops in library functions.

The class `dfcc_contract_modet` is an enum representing the CHECK and
REPLACE modes for contracts.

The class `dfcc_is_fresht` Rewrites calls to `__CPROVER_is_fresh` into
calls to the library implementation `__CPROVER_contracts_is_fresh`.

The class `dfcc_is_freeablet` rewrites calls to
`__CPROVER_is_freeable` into calls to the library implementation
`__CPROVER_contracts_is_freeable` and rewrites calls to
`__CPROVER_was_freed` into calls to the library implementation
`__CPROVER_contracts_was_freed`.

The class `goto_program_cfg_infot` extends `cfg_infot` and computes
the set of program-locals from DECLs found in the instruction
sequence. Contrary to the existing `function_cfg_infot` this new class
does consider parameters symbols as locals.

The class `dfcc_instrumentt` instruments GOTO functions with write-set
checks.  It also instruments proof harness functions and sub-sequences
of GOTO instructions extracted from larger GOTO programs.

The class `dfcc_spec_functionst` rewrites assigns clause and frees
clause specification functions into functions that populate a given
write-set, and can be checked for side-effects against a second
write-set.

The class `dfcc_contract_functionst` allows to translate the assigns
clause (or frees clause) of a contract expressed in DSL style into
GOTO a function describing the same set of targets as the clause. It
also allows to generate a havoc function from an assigns clause, that
havocs the targets described by the clause in a type-directed way.

The class `dfcc_dsl_wrapper_programt` generates a sequence of GOTO
instruction from a contract expressed in the DSL syntax.  The
instructions either encode a checked call to the function under
verification (CHECK mode) or an abstraction derived from the contract
(REPLACE mode). In CHECK mode, it generates GOTO instructions that
assume preconditions, snapshots history variables, creates a write set
instance, call the function under verification asserts the
postconditions, and returns the call's return value. In REPLACE mode,
it generates GOTO instructions encoding the nondeterministic
abstraction defined by the contract: assert preconditions, create and
havoc assigns clause targets, create a nondet return value for the
call, nondeterministically free frees clause targets, assume
postconditions, return a value that satisfies the post
conditions. These instructions are meant to be added to the GOTO
function that serves as a wrapper for the function under verification
or being replaced by the contract.

The class `dfcc_contract_handlert` is an abstract interface for
generating GOTO instructions from contracts for CHECK or REPLACE
modes. It delegates the actual generation to
`dfcc_dsl_wrapper_programt`.

The class `dfcc_swap_and_wrapt`, given a `function_id`, a
`contract_id`, a contract `mode`, and a `dfcc_contract_handlert` for
the contract, swaps the body of the function to a function with a
mangled name and instruments the mangled function for dynamic frame
condition checking using `dfcc_instrument`. The body of the original
function is then replaced with instructions generated by the contract
handler for the required contract mode (CHECK or REPLACE). To handle
recursive functions in contract checking mode, the body of the wrapper
function is generated in such a way the first invocation results in
checking the contract against the mangled function and recursive
invocations of the wrapper behave like the contract abstraction
(induction hypothesis for the recursive call).

The `dfcct` class receives the proof harness function, the map of
functions and contracts to check and the map of functions to replace
with contracts.  It instruments the proof harness, performs
swap-and-wrap operations for each of the checked and replaced
functions. It then instruments all other functions for DFCC.

Regression Tests
---

Regression tests are found in `regression/contracts-dfcc`. These
tests are ported from `regression/contracts` and only contain
tests for function contracts.
@remi-delmas-3000 remi-delmas-3000 force-pushed the rebase-dfcc-fix-regression branch from 13bffcc to e22a788 Compare November 1, 2022 15:05
Copy link
Collaborator

@feliperodri feliperodri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find any blockers, but it'd be nice to maintain the previous git history (list of commits).

@remi-delmas-3000 remi-delmas-3000 merged commit 2cd4154 into diffblue:develop Nov 2, 2022
@diffblue diffblue deleted a comment from Roheel7 Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Bugs or features of importance to AWS CBMC users aws-high Code Contracts Function and loop contracts enhancement new feature
Projects
None yet
6 participants