Skip to content

CONTRACTS: class dfcc_cfg_infot: loops and identifier scoping for DFCC #7671

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

remi-delmas-3000
Copy link
Collaborator

This class extracts loop information from a goto program and:

  • tags goto instructions with loop markers to facilitate loop instrumentation in the presence of nested loops
  • presents loop contract clauses it in a structured way for loop contract instrumentation
  • computes which function-local and loop-local variables must be tracked explicitly in function and loop write sets

This PR is extracted from a larger PR #7541 and can only be tested once all features are merged.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@remi-delmas-3000 remi-delmas-3000 self-assigned this Apr 18, 2023
@remi-delmas-3000 remi-delmas-3000 added aws Bugs or features of importance to AWS CBMC users aws-high Code Contracts Function and loop contracts labels Apr 18, 2023
@remi-delmas-3000 remi-delmas-3000 force-pushed the contracts-dfcc-cfg-info branch from ba28654 to 4e65b80 Compare April 18, 2023 16:36
@codecov
Copy link

codecov bot commented Apr 18, 2023

Codecov Report

Patch coverage has no change and project coverage change: -0.01 ⚠️

Comparison is base (0e78362) 78.51% compared to head (569804b) 78.51%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7671      +/-   ##
===========================================
- Coverage    78.51%   78.51%   -0.01%     
===========================================
  Files         1674     1674              
  Lines       191946   191940       -6     
===========================================
- Hits        150715   150709       -6     
  Misses       41231    41231              
Impacted Files Coverage Δ
...t/contracts/dynamic-frames/dfcc_spec_functions.cpp 91.51% <ø> (-0.30%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@feliperodri feliperodri removed their assignment Apr 20, 2023
@remi-delmas-3000 remi-delmas-3000 force-pushed the contracts-dfcc-cfg-info branch 2 times, most recently from 4d8ba1c to 75b519a Compare April 20, 2023 19:36
@qinheping qinheping force-pushed the contracts-dfcc-cfg-info branch from 75b519a to 2071d5f Compare April 20, 2023 22:11
Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this first round I have just taken a look at the interface, not the implementation. I believe this needs to be changed.

Comment on lines 208 to 213
const irep_idt &function_id,
goto_functiont &goto_function,
const exprt &top_level_write_set,
const dfcc_loop_contract_modet loop_contract_mode,
goto_modelt &goto_model,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are taking a function_id, a goto_function, and a goto_model. How do these relate? I'd hope that the goto_function is the value where function_id is the key in goto_model's function map. If so, please omit the goto_function parameter and compute goto_function. Else you are just asking for inconsistency.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the parameter goto_function should be removed. I will fix it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are places in contract instrumentation where we instrument goto programs snippets that will eventually land in the function named function_id but are not there yet. These snippets are temporarily embedded in dummy goto functions because natural loop or may alias functions only accept goto_functions as inputs.
These goto functions are not found in the function table of the goto model. Fresh symbols generated when instrumenting these dummy goto functions have to use function_id as base name so it needs to be passed around, so does the goto model because it carries the symbol table which is needed when creating fresh symbols.

Making the suggested change will force us to rewrite large parts of the instrumentation code in dfcc_instrumentt and dfcc_wrapper_functiont, or to temporarily insert/remove dummy goto functions in the goto model just to be able to pass the goto model and the key instead of the goto model, the key and the goto function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, @remi-delmas-3000 is correct. And removing goto_function will break the tests. I will undo the fix.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to fix this. The DFCC code is becoming inscrutable because of undocumented hacks of this kind.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you look at dfcc_instrumentt in #7541 you'll see the different entry points for instrumentation, hopefully that makes things clearer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed goto_model and usesymbol_table instead to make it less confusing.

@tautschnig tautschnig removed their assignment Apr 21, 2023
@qinheping qinheping force-pushed the contracts-dfcc-cfg-info branch 3 times, most recently from bfc6f6c to 39a0a11 Compare April 21, 2023 21:12
{
// This is a complex expression. Compute the set of root objects expressions
// it can assign.
auto root_objects = dfcc_root_objects(lhs);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about you always do this? For a symbol expression, dfcc_root_objects will immediately return just that symbol. I think it would reduce the code complexity here, and also make behaviour more consistent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in the else branch, so it is not a symbol expression ... What do you mean by "more consistent" ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you could do auto root_objects = dfcc_root_objects(lhs); at the beginning of this function and then the if branch that currently exists at the top would be eliminated for root_objects would equally cover symbol expressions (you would have exactly one entry in root_idents below). I would then expect that all the use of tracked and local be consistent, which is not obvious from the code in the if case above: it doesn't even look at tracked, and I don't know why this is ok to do. Conversely, this else branch never does dfcc_is_cprover_symbol, and again I don't know why this is ok (not) to do.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've modified this a bit

@qinheping qinheping force-pushed the contracts-dfcc-cfg-info branch from 39a0a11 to 428d542 Compare April 25, 2023 07:02
@qinheping qinheping force-pushed the contracts-dfcc-cfg-info branch from 428d542 to 2ae2d4a Compare April 26, 2023 04:06
@qinheping qinheping requested a review from tautschnig April 26, 2023 15:30
Comment on lines 429 to 418
std::unordered_set<irep_idt> loop_tracked;
for(const auto &ident : loop_local)
{
if(dirty(ident))
{
loop_tracked.insert(ident);
}
else
{
// Check if this ident is touched by one of the inner loops
for(const auto inner_loop_id :
loop_nesting_graph.get_predecessors(loop_id))
{
if(is_assigned(ident, loop_info_map.at(inner_loop_id).assigns))
loop_tracked.insert(ident);
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This piece of code is now repeated twice (with the necessary local modifications) and should be factored out into a method or function.

Copy link
Collaborator Author

@remi-delmas-3000 remi-delmas-3000 Apr 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

factored

Comment on lines 449 to 516
exprt::operandst invariant_clauses = get_invariants(loop.latch);
exprt::operandst decreases_clauses = get_decreases(loop.latch);
exprt::operandst assigns_clauses = get_assigns(loop.latch);
std::set<exprt> assigns;
exprt invariant_expr = nil_exprt();

// Generate defaults for all clauses if some clause is defined
if(
!invariant_clauses.empty() || !decreases_clauses.empty() ||
!assigns_clauses.empty())
{
if(invariant_clauses.empty())
{
// use a default invariant if none given.
invariant_expr = true_exprt{};
// assigns clause is missing; we will try to automatic inference
log.warning() << "No invariant provided for loop " << function_id << "."
<< loop.latch->loop_number << " at "
<< loop.head->source_location()
<< ". Using 'true' as a sound default invariant. Please "
"provide an invariant the default is too weak."
<< messaget::eom;
}
else
{
// conjoin all invariant clauses
invariant_expr = conjunction(invariant_clauses);
}

// unpack assigns clause targets
if(!assigns_clauses.empty())
{
for(auto &operand : assigns_clauses)
{
assigns.insert(operand);
}
}
else
{
// infer assigns clause targets if none given
auto inferred = dfcc_infer_loop_assigns(
local_may_alias, loop.instructions, loop.head->source_location(), ns);
log.warning() << "No assigns clause provided for loop " << function_id
<< "." << loop.latch->loop_number << " at "
<< loop.head->source_location() << ". The inferred set {";
bool first = true;
for(const auto &expr : inferred)
{
if(!first)
{
log.status() << ", ";
}
first = false;
log.status() << format(expr);
}
log.status() << "} might be incomplete or imprecise, please provide an "
"assigns clause if the analysis fails."
<< messaget::eom;
assigns.swap(inferred);
}

if(decreases_clauses.empty())
{
log.warning() << "No decrease clause provided for loop " << function_id
<< "." << loop.latch->loop_number << " at "
<< loop.head->source_location()
<< ". Termination will not be checked." << messaget::eom;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this deserves factoring out into a method of its own. I think it would just need to take loop as argument, and could return assigns/invariant_expr/decreases_clauses via a struct that has those three members.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

factored

Comment on lines 596 to 552
{
const auto head = check_inner_loops_have_contracts(loop_nesting_graph);
if(head.has_value())
{
throw invalid_source_file_exceptiont(
"Found loop without contract nested in a loop with a "
"contract.\nPlease "
"provide a contract or unwind this loop before applying loop "
"contracts.",
head.value()->source_location());
}
}

auto topsorted = loop_nesting_graph.topsort();
for(const auto idx : topsorted)
{
topsorted_loops.push_back(idx);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all of this could be moved out of the if body. Else its hard to understand why just tag_loop_instructions has to live outside that body. (Alternatively, if my earlier suggestion of not tagging top-level instructions is implemented, you could safely move tag_loop_instructions inside this if.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea we had here is that unless they are actually used, we do not impose the check_inner_loops_have_contracts constraints.

Not tagging top-level instructions would have far reaching consequences outside of this PR. The tag has to exist even it trivial, as it represents a key in a total mapping from instructions to write sets, and is used by everything else to lookup the write set variables to use for instrumentation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kept logic as-is but added comment

@remi-delmas-3000 remi-delmas-3000 force-pushed the contracts-dfcc-cfg-info branch from 2ae2d4a to eaf9f6d Compare May 1, 2023 23:56
Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all the refactoring! I still have some questions/comments, but they seem rather minor now.

Comment on lines +230 to +243
// tag remaining instructions as top level
for(auto target = goto_program.instructions.begin();
target != goto_program.instructions.end();
target++)
{
// Skip instructions that are already tagged (belong to loops).
auto loop_id_opt = dfcc_get_loop_id(target);
if(loop_id_opt.has_value())
{
continue;
}
dfcc_set_loop_id(target, top_level_id);
dfcc_set_loop_top_level(target);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep wondering: why do we actually need to tag these? Can't we just leave them untagged?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the instrumentation passes that follow expect to find a tag on each instruction, and use it as a key to query the write set variable from the dfcc_cfg_info class.

  1. Tagging all instructions amounts to using size_t as key type and a special value to represent the top-level scope.
  2. Not tagging some instructions amounts to using optionalt<size_t> as key type, with none the for top level scope.

I think 1) allows for better sanity checks.

With 1), if you discover an instruction without tag then it is an error for sure. With 2), it could be because it is a top level instruction, or it could be a loop instruction that was skipped because of a bug somewhere.

{
messaget log(message_handler);

const auto loop = loop_nesting_graph[loop_id];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit pick: I believe this will create a copy - don't you want a reference instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


// initialise defaults
struct contract_clausest result;
result.invariant_expr = nil_exprt();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good candidate for a (default) constructor?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Comment on lines 402 to 414
bool first = true;
for(const auto &expr : inferred)
{
if(!first)
{
log.status() << ", ";
}
first = false;
log.status() << format(expr);
}
log.status() << "} might be incomplete or imprecise, please provide an "
"assigns clause if the analysis fails."
<< messaget::eom;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to use log.conditional_output(log.status(), [&](messaget::mstreamt &mstream) { ... < the above code > } to make clear that all of this is just about logging. (I usually recommend this for debug output to improve performance; here it's more about readability.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should actually be all printed as a warning.

Comment on lines +668 to +669
auto loop_id = loop_id_opt.value();
if(is_top_level_id(loop_id))
{
return top_level_write_set;
}
else
{
return loop_info_map.at(loop_id).addr_of_write_set_var;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does the above code still require use of one of those methods?

{
// This is a complex expression. Compute the set of root objects expressions
// it can assign.
auto root_objects = dfcc_root_objects(lhs);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you could do auto root_objects = dfcc_root_objects(lhs); at the beginning of this function and then the if branch that currently exists at the top would be eliminated for root_objects would equally cover symbol expressions (you would have exactly one entry in root_idents below). I would then expect that all the use of tracked and local be consistent, which is not obvious from the code in the if case above: it doesn't even look at tracked, and I don't know why this is ok to do. Conversely, this else branch never does dfcc_is_cprover_symbol, and again I don't know why this is ok (not) to do.

@remi-delmas-3000 remi-delmas-3000 force-pushed the contracts-dfcc-cfg-info branch from eaf9f6d to 3f2af0f Compare May 2, 2023 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Bugs or features of importance to AWS CBMC users aws-high Code Contracts Function and loop contracts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants