Skip to content

Add support for variables entering scope via a goto #8091

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Dec 22, 2023

Conversation

thomasspriggs
Copy link
Contributor

@thomasspriggs thomasspriggs commented Dec 4, 2023

This PR adds support for variables entering scope via a goto. This fixes the following github issue - #7997

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@thomasspriggs thomasspriggs force-pushed the tas/fix_goto_decls branch 2 times, most recently from ee4c7d7 to 747bb63 Compare December 5, 2023 19:55
Copy link

codecov bot commented Dec 5, 2023

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (8fee19a) 79.09% compared to head (49e8e53) 79.10%.

Files Patch % Lines
src/goto-programs/goto_convert_exceptions.cpp 81.25% 3 Missing ⚠️
src/goto-programs/scope_tree.cpp 88.46% 3 Missing ⚠️
src/goto-programs/goto_convert_class.h 71.42% 2 Missing ⚠️
src/goto-programs/goto_convert.cpp 98.82% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #8091   +/-   ##
========================================
  Coverage    79.09%   79.10%           
========================================
  Files         1695     1695           
  Lines       196514   196596   +82     
========================================
+ Hits        155436   155517   +81     
- Misses       41078    41079    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@thomasspriggs thomasspriggs changed the title Draft fix for issue #7997 Add support for variables entering scope via a goto Dec 5, 2023
@thomasspriggs thomasspriggs marked this pull request as ready for review December 5, 2023 22:46
@thomasspriggs thomasspriggs added aws Bugs or features of importance to AWS CBMC users aws-high and removed do not review labels Dec 6, 2023
Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not convinced that we need to insert a fresh program variable. What if we used dominator analysis instead, as follows: for any use of a variable, lift its declaration to a place so that the declaration dominates all uses. Thereafter, move the "DEAD" statement so that it post-dominates the declaration.

struct declaration_statet
{
goto_programt::targett instruction;
std::unordered_set<irep_idt, irep_id_hash> if_conditions_added;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what "if_conditions_added" means.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now renamed and documented this in the two commits I just pushed. Please take another look to see if the meaning is clear to you now.

// }
// to code which looks like -
// {
// __CPROVER_bool going_to::user_label;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not convinced that something that can be solved at compile time should introduce runtime cost.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel confident that the change proposed in this PR will result in an analysis which is correct. I am not certain that the the issue can be solved at compile time in all cases. As such I would propose that we fix first and that optimisation should be considered as a follow-up if there is indeed a notable runtime cost. Please also bear in mind that this transformation is only performed in the case of a goto jumping into a scope and not for other constructs.

If you look at the regression/cbmc/declaration-goto/re-enter_scope.c included in this PR then I would say that -

  1. The assert(i == 42) assertion should fail. The scope of the i and j variables does not intersect, therefore a C compiler should be free to use the same stack location for both variables. Therefore the i variable is not guaranteed to hold the same value on re-entering the scope.
  2. If we attempt to solve this problem in cbmc by moving the declaration of the i variable alone, then cbmc would report SUCCESS for assert(i == 42) when it should fail.

@thomasspriggs
Copy link
Contributor Author

I am not convinced that we need to insert a fresh program variable. What if we used dominator analysis instead, as follows: for any use of a variable, lift its declaration to a place so that the declaration dominates all uses. Thereafter, move the "DEAD" statement so that it post-dominates the declaration.

I think that approach may have otherwise unforeseen consequences in the presence of loops. If a goto into a loop causes a variable to come into scope, then the common ancestor would be outside of the loop. Moving the declaration there would result in a single variable being reused for all loop iterations, rather than a fresh variable being declared in each iteration. This could then lead to all loop iterations then seeing the same non-det value, rather than a fresh nondet value for each iteration.

Copy link
Contributor

@esteffin esteffin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor changes.

Also some architectural commits may be squashed to obtain a slightly cleaner history (the first 3 commits may be squashed, the one changing if_conditions_added can be lifted to when that function was added, ...).

const build_declaration_hops_inputst &inputs)
{
// In the case of a goto jumping into a scope, the declarations (but not the
// initialisations) need to be executed. This function performs a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ US english

Suggested change
// initialisations) need to be executed. This function performs a
// initializations) need to be executed. This function performs a

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our coding standards do not specify either US or British English. Given that the US spelling can be understood by users of British English and vice versa, it does not seem worthwhile to change this.

// is illegal for C++ non-pod types and impossible in Java in any case.
// This is however valid C.
// Variables *entering* scope on goto, is illegal for C++ non-pod types
// and impossible in Java. This is however valid C and should be taken
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ Actually in C it's illegal to jump entering the scope of a variable-length array or variably-modified typed variables, so the comment is wrong and should rephrased.

However in general it is possible, so I'm just raising as a :pick and not blocking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added this detail to the comment.

@thomasspriggs
Copy link
Contributor Author

Few minor changes.

Also some architectural commits may be squashed to obtain a slightly cleaner history (the first 3 commits may be squashed, the one changing if_conditions_added can be lifted to when that function was added, ...).

I have addressed your various comments.

The commits dealing with the rename were added in order to address Michael's comment. I did not wish to rebase whilst you were in the middle of reviewing as this can cause the github UI to lose your place.

The Initial commits are separate as I was concerned about the legibility of a commit which combines a file rename with changes in the file being renamed.

@kroening
Copy link
Member

kroening commented Dec 8, 2023

We'll need to split up goto conversion by language.

@remi-delmas-3000
Copy link
Collaborator

Hi, I was reading the C11 standard about storage durations and goto statements (6.2.4 Storage durations of objects), and how it interacts with gotos and variable length arrays:

 For such an object that does not have a variable length array type, its lifetime extends
from entry into the block with which it is associated until execution of that block ends in
any way. (Entering an enclosed block or calling a function suspends, but does not end,
execution of the current block.) If the block is entered recursively, a new instance of the
object is created each time. The initial value of the object is indeterminate. If an
initialization is specified for the object, it is performed each time the declaration or
compound literal is reached in the execution of the block; otherwise, the value becomes
indeterminate each time the declaration is reached

So I would think that you also need to handle cases where a goto jumps into a scope after the declaration, in that case the symbol must be declared but has a undefined value. Right now the fix seems to handle only the cas where a goto jumps over some declarations within a block (block in the C source sense).

At goto level things could be handled by moving all DECLs of a C block to the beginning of the first GOTO block encoding that C block. That would handle gotos jumping over C decls. For gotos jumping somehwere in the middle a C block from outside of it, you could insert a new GOTO basic block of DECLs that declares all locals of the corresponding C block.

{
  int x;
  if (x >0) goto LABEL; 
  {
      int y = 0;
      LABEL:
      y += 1;
      assert(y ==1);
  } 
}

would give this GOTO program :

0:
 DECL x;
 IF x>0 GOTO 2;

1:
 DECL y; 
 ASSIGN y := 0;
 GOTO 3; // skip over the redeclaration block

// re-declaration block for goto that jumps into the block
2: 
  DECL y;

3: 
  ASSIGN y := y + 1;
  ASSERT(y == 1);
  DEAD y; // end of scope for y 
  DEAD x; // end of scope for x

@thomasspriggs
Copy link
Contributor Author

@remi-delmas-3000 Before we arrived at the solution presented in this PR we tried an implementation which relied on multiple declarations for the same variable. This approach did not yield the results we were hoping for as the two declarations were treated as declarations for different variables despite them both using the same identifier. We did not see the the phi functions we expected during goto_symext::merge_goto. Attempts to change symex to work as we expect for goto like this could be tricky as there could be a risk of introducing issues around intended variable shadowing for example. As the idea you have presented also relies on multiple DECL statements for the same y variable, I think it would suffer from the same issues.

The solution presented in this PR is also expected to work for backwards gotos.

Example input with backwards goto.
#include <assert.h>
#include <stdbool.h>

int main()
{
  bool if_condition1;
  if(if_condition1)
  {
  	int x;
  	x = 0;
  my_label:
  	assert(x == 0);
  }
  bool if_condition2;
  if(if_condition2)
  	goto my_label;
  return 0;
}
Converted goto.
main /* main */
      // 0 file ../backwards_goto_example1.c line 12 function main
      DECL going_to::my_label : 𝔹
      // 1 file ../backwards_goto_example1.c line 12 function main
      ASSIGN going_to::my_label := false
      // 2 file ../backwards_goto_example1.c line 6 function main
      DECL main::1::if_condition1 : c_bool[8]
      // 3 file ../backwards_goto_example1.c line 7 function main
      IF ¬(main::1::if_condition1 ≠ 0) THEN GOTO 3
      // 4 file ../backwards_goto_example1.c line 9 function main
   1: DECL main::1::1::x : signedbv[32]
      // 5 file ../backwards_goto_example1.c line 9 function main
      IF going_to::my_label THEN GOTO 2
      // 6 file ../backwards_goto_example1.c line 10 function main
      ASSIGN main::1::1::x := 0
      // 7 file ../backwards_goto_example1.c line 12 function main
   2: ASSIGN going_to::my_label := false
      // 8 file ../backwards_goto_example1.c line 12 function main
      // Labels: my_label
      ASSERT main::1::1::x = 0 // assertion x == 0
      // 9 file ../backwards_goto_example1.c line 13 function main
      DEAD main::1::1::x
      // 10 file ../backwards_goto_example1.c line 14 function main
   3: DECL main::1::if_condition2 : c_bool[8]
      // 11 file ../backwards_goto_example1.c line 15 function main
      IF ¬(main::1::if_condition2 ≠ 0) THEN GOTO 4
      // 12 file ../backwards_goto_example1.c line 16 function main
      DEAD main::1::if_condition2
      // 13 file ../backwards_goto_example1.c line 16 function main
      ASSIGN going_to::my_label := true
      // 14 file ../backwards_goto_example1.c line 16 function main
      GOTO 1
      // 15 file ../backwards_goto_example1.c line 17 function main
   4: SET RETURN VALUE 0
      // 16 file ../backwards_goto_example1.c line 17 function main
      DEAD main::1::if_condition2
      // 17 file ../backwards_goto_example1.c line 17 function main
      DEAD main::1::if_condition1
      // 18 file ../backwards_goto_example1.c line 18 function main
      END_FUNCTION

Copy link
Member

@peterschrammel peterschrammel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summarising the discussion above:

  1. What's the performance impact of this PR? None, except when the program actually exhibits the situation that the PR fixes.
  2. What are alternative solutions? Moving decl and dead statements to pre/postdominator locations and inserting nondet assignments in the appropriate places.
  3. Should goto conversion be language specific? Control flow and scoping semantics needs to be added at some point, either in the construction of the symbol table, the conversion to the goto program or during symbolic execution. Currently, this is done in the goto conversion. We assume that all supported front ends produce symbol tables that are compatible with what goto conversion is doing. If the language semantics deviates then we patch up the goto program (as we do for Java). This PR doesn't seem to be a good enough reason to change that, in particular as C is the only language concerned.

@peterschrammel
Copy link
Member

@thomasspriggs, needs a rebase.

Because this will be needed later for entering a scope via a C front end
goto statement. The declaration is optional because it is only needed
for named front end declarations. It is not needed for anonymous symbols
which cannot be referenced by user code.
This data structure is going to be used for declarations as well as
destructors. So it needs to be renamed to match the expanded use case.
To match the new class names.
So it can be retrieved when producing the code for jumping into a
scope.
Jumping into scope will may cause these declarations to be executed.
Therefore we need to know what the declarations are for each scope in
order acheive this.
So that we can track what changes have been made to each declaration.
When a goto which jumps into a scope is converted, the declarations
for that scope need to be passed through. It needs to be the same
declaration in both execution paths, so that symex will correctly merge
the two different states when generating the phi functions.

The requirement for executing the same declaration whether entering the
scope through a regular route vs a goto into the middle of the block is
acheived by setting up a chain of hops where -
 * Original goto hops to first declaration.
 * After first declaration we hop to the next declaration if entering
   via a goto.
 * Afer the final declaration we hop to the originally specified label.

The identifiers in the existing `symex_should_filter_value_sets` test
need to be incremented, because the instance being checked is now the
second time the variable is added to the scope.
The names of the parameters of `unwind_destructor_stack` were the
opposite way round in the header from the `.cpp`. The order in the
`.cpp` is the actual order as it is currently implemented. So by
(un)swapping the order in the header the names are made consistent.
@esteffin esteffin merged commit ccec4dd into diffblue:develop Dec 22, 2023
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Feb 6, 2024
We must convert the condition of an if/then/else first to make sure any
declarations produced by converting the condition have a suitable
scope-tree node. Previously, gotos in either branch of the if-then-else
were led to believe that declarations resulting from condition were not
yet in scope. The feature from diffblue#8091 would, thus, result in a spurious
loop back to the declaration to put it in scope before following the
goto edge.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Feb 6, 2024
We must convert the condition of an if/then/else first to make sure any
declarations produced by converting the condition have a suitable
scope-tree node. Previously, gotos in either branch of the if-then-else
were led to believe that declarations resulting from condition were not
yet in scope. The feature from diffblue#8091 would, thus, result in a spurious
loop back to the declaration to put it in scope before following the
goto edge.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Feb 6, 2024
We must convert the condition of an if/then/else first to make sure any
declarations produced by converting the condition have a suitable
scope-tree node. Previously, gotos in either branch of the if-then-else
were led to believe that declarations resulting from condition were not
yet in scope. The feature from diffblue#8091 would, thus, result in a spurious
loop back to the declaration to put it in scope before following the
goto edge.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Mar 27, 2024
We must convert the condition of an if/then/else first to make sure any
declarations produced by converting the condition have a suitable
scope-tree node. Previously, gotos in either branch of the if-then-else
were led to believe that declarations resulting from condition were not
yet in scope. The feature from diffblue#8091 would, thus, result in a spurious
loop back to the declaration to put it in scope before following the
goto edge.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Apr 15, 2024
We must convert the condition of an if/then/else first to make sure any
declarations produced by converting the condition have a suitable
scope-tree node. Previously, gotos in either branch of the if-then-else
were led to believe that declarations resulting from condition were not
yet in scope. The feature from diffblue#8091 would, thus, result in a spurious
loop back to the declaration to put it in scope before following the
goto edge.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Apr 24, 2024
We must convert the condition of an if/then/else first to make sure any
declarations produced by converting the condition have a suitable
scope-tree node. Previously, gotos in either branch of the if-then-else
were led to believe that declarations resulting from condition were not
yet in scope. The feature from diffblue#8091 would, thus, result in a spurious
loop back to the declaration to put it in scope before following the
goto edge.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Apr 30, 2024
We must convert the condition of an if/then/else first to make sure any
declarations produced by converting the condition have a suitable
scope-tree node. Previously, gotos in either branch of the if-then-else
were led to believe that declarations resulting from condition were not
yet in scope. The feature from diffblue#8091 would, thus, result in a spurious
loop back to the declaration to put it in scope before following the
goto edge.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jun 13, 2024
We must convert the condition of an if/then/else first to make sure any
declarations produced by converting the condition have a suitable
scope-tree node. Previously, gotos in either branch of the if-then-else
were led to believe that declarations resulting from condition were not
yet in scope. The feature from diffblue#8091 would, thus, result in a spurious
loop back to the declaration to put it in scope before following the
goto edge.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jun 17, 2024
We must convert the condition of an if/then/else first to make sure any
declarations produced by converting the condition have a suitable
scope-tree node. Previously, gotos in either branch of the if-then-else
were led to believe that declarations resulting from condition were not
yet in scope. The feature from diffblue#8091 would, thus, result in a spurious
loop back to the declaration to put it in scope before following the
goto edge.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Bugs or features of importance to AWS CBMC users aws-high bugfix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants