Skip to content

goto trace contains ssa information it shouldn't #902

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
owen-mc-diffblue opened this issue May 10, 2017 · 16 comments
Open

goto trace contains ssa information it shouldn't #902

owen-mc-diffblue opened this issue May 10, 2017 · 16 comments
Labels

Comments

@owen-mc-diffblue
Copy link
Contributor

owen-mc-diffblue commented May 10, 2017

Steps to reproduce:

  1. Run cbmc on regression/cbmc/pointer-function-parameters-2/main.c with the flags "--function fun --cover branch".

Actual results:
In the output, the following lines appear.

Test suite:
a=((signed int **)NULL), tmp$1=((signed int *)NULL) + 1, tmp$2=0
a=&tmp$1!0, tmp$1=((signed int *)NULL), tmp$2=0
a=&tmp$1!0, tmp$1=&tmp$2!0, tmp$2=0
a=&tmp$1!0, tmp$1=&tmp$2!0, tmp$2=4

Expected results:
The variable names in the address-of expressions shouldn't have the "!0" suffixes.

Discussion:
The "!0" are used in the ssa phase to indicate the thread, but they should have been removed when the ssa-trace was transformed into goto-trace. I think that this isn't working for address-of expression on the right hand side.

@owen-mc-diffblue
Copy link
Contributor Author

@tautschnig I see you've edited some of the code in build_goto_trace() (the one with five arguments) in goto-symex/build_goto_trace.cpp, which is where I think the problem is. Do you understand it enough to decide if my hypothesis is correct, and if so how to fix it?

@tautschnig
Copy link
Collaborator

What I know very little about is how the test suite output and the goto trace relate - does the test output extract strings from the trace?

@owen-mc-diffblue
Copy link
Contributor Author

Yes, the only parameter the test output takes is the goto trace, so I'm fairly sure it does. This bug is also exhibited in the interpreter output, which is generated by reading the goto trace.

@owen-mc-diffblue
Copy link
Contributor Author

To be more precise, the lines of the test suite are generated here: https://github.com/diffblue/cbmc/blob/master/src/cbmc/bmc_cover.cpp#L460 . The sections just after the equals sign are output here: https://github.com/diffblue/cbmc/blob/master/src/cbmc/bmc_cover.cpp#L140 . So they are coming directly from the goto trace. from_expr() resolves to expr2c().

@owen-mc-diffblue
Copy link
Contributor Author

owen-mc-diffblue commented May 25, 2017

I dug into this a bit more. The step.io_arg.front() from https://github.com/diffblue/cbmc/blob/master/src/cbmc/bmc_cover.cpp#L140 which has "!0" in it is being set in https://github.com/diffblue/cbmc/blob/master/src/goto-symex/build_goto_trace.cpp#L281. When I run my Steps To Reproduce with a breakpoint on this line, on the fifth time that the breakpoint is hit tmp is the address of tmp$1!0. Following the prop_conv.get() I find it calls https://github.com/diffblue/cbmc/blob/master/src/solvers/flattening/boolbv_get.cpp#L32, which calls https://github.com/diffblue/cbmc/blob/master/src/solvers/flattening/bv_pointers.cpp#L628. https://github.com/diffblue/cbmc/blob/master/src/solvers/flattening/bv_pointers.cpp#L677 calls https://github.com/diffblue/cbmc/blob/master/src/solvers/flattening/pointer_logic.cpp#L130, which returns an exprt which is &tmp$1!0.

Without understanding this code very well, it seems to me that SSA_step has io_args and converted_io_args, and the latter has had the SSA information (L0 and so on) removed in https://github.com/diffblue/cbmc/blob/master/src/goto-symex/symex_target_equation.cpp#L876. I imagine the intention of this is to not put that SSA information in the goto trace. But line https://github.com/diffblue/cbmc/blob/master/src/goto-symex/build_goto_trace.cpp#L280 involves getting the address of a symbol, and it copies a symbol which still has SSA information, so the SSA information leaks into the goto trace and causes me problems later. It seems to get this symbol with SSA information from prop_conv.pointer_logic.

@tautschnig Does this make any sense? If so, any ideas how to fix the problem? Or any ideas who else to ask, if you don't know?

@tautschnig
Copy link
Collaborator

My apologies for not attending to this in a while.

For a start, it should be noted that it's not just test-input that contains the L0/L1 identifiers, but the same is true for the counter-example trace. I would suggest to discuss with @kroening whether this is a bug or a feature. I'd lean towards the latter as this information is essential to distinguish multiple objects that the address may be taken of.

If this is a bug of some sorts, or at least considered a bug in test-input generation, then take a look at what the full tree representation available at the time of generating the output is. For example. &tmp$2!0 is:

constant
  * type: pointer
      * #source_location:
        * file: ../regression/cbmc/pointer-function-parameters-2/main.c
        * line: 1
        * working_directory: /Users/tautschn/Desktop/AwsArgCBMC.git/src
      0: signedbv
          * width: 32
          * #c_type: signed_int
  * value: 0000001100000000000000000000000000000000000000000000000000000000
  0: address_of
      * type: pointer
          * #source_location:
            * file: ../regression/cbmc/pointer-function-parameters-2/main.c
            * line: 1
            * working_directory: /Users/tautschn/Desktop/AwsArgCBMC.git/src
          0: signedbv
              * width: 32
              * #c_type: signed_int
      0: symbol
          * type: signedbv
              * width: 32
              * #c_type: signed_int
          * identifier: tmp$2!0
          * expression: symbol
              * type: signedbv
                  * width: 32
                  * #c_type: signed_int
              * identifier: tmp$2
          * L0: 0
          * L1_object_identifier: tmp$2!0
          * #SSA_symbol: 1

That is, you have the full ssa_exprt at hand (as an operand to address_of). If removing the L0/L1 labeling is the right thing to do, then you'd want to do:

// we only expect constants here
exprt expr=to_constant_expr(step.io_args.front());
if(expr.has_operands() && expr.op0().id()==ID_address_of)
{
  exprt &op=to_address_of_expr(expr.op0()).op();
  op=to_ssa_expr(op).get_original_expr();
}
// use expr in the output

Hope this helps!

@martin-cs
Copy link
Collaborator

martin-cs commented May 31, 2017

When I run my Steps To Reproduce with a breakpoint on this line, on the fifth time that the
breakpoint is hit tmp is the address of tmp$1!0. Following the prop_conv.get() I find it calls
https://github.com/diffblue/cbmc/blob/master/src/solvers/flattening/boolbv_get.cpp#L32, which calls
https://github.com/diffblue/cbmc/blob/master/src/solvers/flattening/bv_pointers.cpp#L628.
https://github.com/diffblue/cbmc/blob/master/src/solvers/flattening/bv_pointers.cpp#L677 calls
https://github.com/diffblue/cbmc/blob/master/src/solvers/flattening/pointer_logic.cpp#L130, which
returns an exprt which is &tmp$1!0.

The calls into the back-end (solvers/flattening) look roughly as expected and I would expect the solver / back-end to handle things with the SSA name, rather than the original / program name.

Without understanding this code very well, it seems to me that SSA_step has io_args and
converted_io_args, and the latter has had the SSA information (L0 and so on) removed in
https://github.com/diffblue/cbmc/blob/master/src/goto-symex/symex_target_equation.cpp#L876. I
imagine the intention of this is to not put that SSA information in the goto trace.

This matches my expectation (FWIW). I believe the trace should exist at the level of goto-programs and be independent of how it was generated (i.e. these can be built from BMC SSA, IIRC symex also builds them using a different symex engine (see #749), IMPARA builds them).

But line https://github.com/diffblue/cbmc/blob/master/src/goto-symex/build_goto_trace.cpp#L280
involves getting the address of a symbol, and it copies a symbol which still has SSA information, so
the SSA information leaks into the goto trace and causes me problems later.

This seems plausible.

It seems to get this symbol with SSA information from prop_conv.pointer_logic.

Yes; the SSA naming has to be present within the formula that the bmc symex creates and so access to / from the variables in the formula should use the SSA naming. The removal should be done outside the solver when the model is being converted to a trace.

@martin-cs
Copy link
Collaborator

For a start, it should be noted that it's not just test-input that contains the L0/L1 identifiers, but the
same is true for the counter-example trace. I would suggest to discuss with @kroening whether this is > a bug or a feature. I'd lean towards the latter as this information is essential to distinguish multiple
objects that the address may be taken of.

I can see why dynamically created objects need to be numbered and why their numbering is more / different to the SSA numbering and should persist through the conversion to a goto trace but I can't see that the pointer variables would need this numbering.

@tautschnig
Copy link
Collaborator

I can see why dynamically created objects need to be numbered and why their numbering is more / different to the SSA numbering and should persist through the conversion to a goto trace but I can't see that the pointer variables would need this numbering.

It's completely up to the code generating the output to decide what is appropriate. The SSA information is persisted all the way, so (as indicated in my comment) people may choose to build the output in any way deemed most appropriate.

@owen-mc-diffblue
Copy link
Contributor Author

@tautschnig The code snippet you gave seems to not work on functions with parameters that are arrays

@tautschnig
Copy link
Collaborator

Would you have an example code snippet?

@tautschnig
Copy link
Collaborator

It seems my patch had a syntax error anyway. The below appears to work:

diff --git a/src/cbmc/bmc_cover.cpp b/src/cbmc/bmc_cover.cpp
index fd900dda1..911de85b4 100644
--- a/src/cbmc/bmc_cover.cpp
+++ b/src/cbmc/bmc_cover.cpp
@@ -137,7 +137,19 @@ public:
         test+=id2string(step.io_id)+"=";

         if(step.io_args.size()==1)
-          test+=from_expr(bmc.ns, "", step.io_args.front());
+        {
+          // we only expect constants here
+          exprt expr=to_constant_expr(step.io_args.front());
+          if(expr.has_operands() && expr.op0().id()==ID_address_of)
+          {
+            exprt *op=&(to_address_of_expr(expr.op0()).object());
+            while(op->id()==ID_member || op->id()==ID_index)
+              op=&(op->op0());
+            *op=to_ssa_expr(*op).get_original_expr();
+          }
+          // use expr in the output
+          test+=from_expr(bmc.ns, "", expr);
+        }
       }
     }
     return test;

@owen-mc-diffblue
Copy link
Contributor Author

Thanks @tautschnig , that works. I'm going to make a PR to test-gen-support to merge this, but in a slightly different place so it actually affects the goto trace.

@owen-mc-diffblue
Copy link
Contributor Author

I've thought of a better way of showing what I think the problem is. Make a file with this as its contents:

void fun(int *x, int y)
{
  if(y == 2)
  {
    assert(*x == 1);
  }
}

Run cbmc on it with these flags: --function fun --show-goto-functions. In the output you'll fine tmp$1 but not tmp$1!0. Now run cbmc on it with these flags: --function fun --trace. You should get output a lot like this:



Trace for fun.assertion.1:

State 21 file pointer_param.c line 1 thread 0
----------------------------------------------------
  INPUT x: &tmp$1!0 (0000001000000000000000000000000000000000000000000000000000000000)

State 22 file pointer_param.c line 1 thread 0
----------------------------------------------------
  INPUT tmp$1: 0 (00000000000000000000000000000000)

State 25 file pointer_param.c line 1 thread 0
----------------------------------------------------
  INPUT y: 2 (00000000000000000000000000000010)

State 28 file pointer_param.c line 1 thread 0
----------------------------------------------------
  x=&tmp$1!0 (0000001000000000000000000000000000000000000000000000000000000000)

State 29 file pointer_param.c line 1 thread 0
----------------------------------------------------
  y=2 (00000000000000000000000000000010)

Violated property:
  file pointer_param.c line 5 function fun
  assertion *x == 1
  *x == 1


** 1 of 1 failed (1 iteration)
VERIFICATION FAILED

To me this doesn't make sense. In state 28 we're assigning x=&tmp$1!0, but a variable of that name doesn't exist. There is a variable with the name tmp$1 - it's one of the inputs.

Maybe the problem here is just that the inputs get the ssa ids removed, in which case we could perhaps make a list of them and only amend variable names in address-of expressions when they are one of the inputs.

@tautschnig
Copy link
Collaborator

To me this doesn't make sense. In state 28 we're assigning x=&tmp$1!0, but a variable of that name doesn't exist. There is a variable with the name tmp$1 - it's one of the inputs.

I may be repeating myself: please discuss this with @kroening. As far as I am aware it is a conscious decision to output in the way it is done (i.e., with L0/L1 SSA indices as only then an object can be uniquely identified), and thus it might mainly be a matter of providing suitable documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants