-
Notifications
You must be signed in to change notification settings - Fork 273
Remove invariant on full_lhs_value
in printing goto_tracet
#4153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove invariant on full_lhs_value
in printing goto_tracet
#4153
Conversation
This invariant in the printing is preventing other, otherwise usefull output from being printed, in the case where a goto trace has been produced which contains a `full_lhs_value` set to `nil` in an assignment step.
Do we understand under which conditions this is nil? |
No. I don't understand exactly why we have a |
It's generally discouraged to just blindly remove |
Is printing really the appropriate place for these invariants at all? By this point the bulk of the time consuming evaluation has been completed and I think it is useful to be able to get the output even if it isn't necessarily 100% valid. Could the invariants in trace printing be moved into a separate "validate trace" process instead? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do dig deeper as others have already suggested.
@@ -132,9 +132,6 @@ void convert_decl( | |||
} | |||
else | |||
{ | |||
DATA_INVARIANT( | |||
step.full_lhs_value.is_not_nil(), | |||
"full_lhs_value in assignment must not be nil"); | |||
full_lhs_value = json(step.full_lhs_value, ns, ID_unknown); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already worth investigating why we even end up in this branch. That means that the symbol on the LHS is not in the namespace, which should only be happening if it is a symbol that symex has introduced. Even then that's got a justification for why the full_lhs_value
should be nil. We've probably got some bug in goto-symex in that case, but I have no idea which one that is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I correct in my understanding of this comment that this invariant should hold for both the branch where the symbol is found and when it is not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that should be the case, yes - and that observation of yours might be the key to resolving this: either it is an invariant, or it should go away completely. For debugging purposes I'd suggest copying this into symex_target_equationt
to figure out where the invariant is first violated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added in symex_target_equationt::assignment
the following invariants:
DATA_INVARIANT(
ssa_full_lhs.is_not_nil(),
"full_lhs_value in assignment must not be nil");
DATA_INVARIANT(
original_full_lhs.is_not_nil(),
"full_lhs_value in assignment must not be nil");
DATA_INVARIANT(
ssa_rhs.is_not_nil(),
"full_lhs_value in assignment must not be nil");
And none of them trigger on the failing case (i.e. still see unknown in the trace). Is that what you meant? I can look at the assignment calls to see what these values are, but don't really know what I'd be looking for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thk123 Many thanks, that's good to know. Thus things must be going wrong in goto-symex/build_goto_trace.cpp
line 344 I'd say. Would you mind inserting an appropriate invariant in there? I think we end up in boolbvt::bv_get_unbounded_array
, which actually does return nil_exprt
on a couple of branches (but I'm not sure that's a good idea). Actually there are quite a few occurrences of nil_exprt
in solvers/flattening/boolbv_get.cpp
. I'm not sure anyone of them is a good idea (and the hard-coded 100
in line 361 isn't good either). Could you work with @kroening to resolve this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it comes down to:
boolbvt::bv_get_unbounded_array
Which is failing to evaluate the size expr. The reason for this failure is the expression is (rewritten for readability):
if (&symex_dynamic::dynamic_object1 == constant(18000000000000))
4;
else
symex::invalid_object.length;
Note the constant exprt on the rhs of the if is:
constant
* type: pointer
* width: 64
* value: 18000000000000
0: address_of
0: symbol
* identifier: symex_dynamic::dynamic_object1
* expression: symbol
* identifier: symex_dynamic::dynamic_object1
* #SSA_symbol: 1
* L1_object_identifier: symex_dynamic::dynamic_object1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So fixing the underling problem seems hard - the safe_pointers is not able to see through the GOTO:
ASSUME !((struct java::array[reference] *)a == null) // Null pointer check
ASSERT ((struct java::array[reference] *)a)->length >= 0 // Array size should be >= 0
ASSUME ((struct java::array[reference] *)a)->length >= 0 // Array size should be >= 0
newarray_tmp1 = ALLOCATE(struct java::array[float], 16ul, false);
*newarray_tmp1 = { [email protected]={ .@class_identifier="java::array[float]" },
.length=0, .data=null };
newarray_tmp1->length = ((struct java::array[reference] *)a)->length;
Obviously this can be fixed by maybe making safe pointers smarter or the front end put the null check nearer, but shouldn't fundamentally a trace still be produced without a fix here? Should bv_get_unbounded_array
stumble on, looking up the array and disregarding the size? I don't think I have enough insight into the expected behaviour to realistically do this unfortunately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have created #4197 to address the limitations in bv_get_unbounded_array
. Input, and in particular tests would be very much appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
symbol on the LHS is not in the namespace, which should only be happening if it is a symbol that symex has introduced.
Even that shouldn't happen if the correct namespace was used (which at least the new goto-checker implementations should be doing).
No, this should really be checked by SSA validation (the code triggered by |
I'm going to have a dig see if I can produce a minimal example here. It is entirely possible that turning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: eb5b1c4).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/100433562
So the essence of the problem is the symbol doesn't exist, because it is a symbol introduced by symex, and we print this trace using original goto model, rather than This looks a little fiddly to fix on our end, so let me make an moral argument for why this fix is ok 😆 Printing a trace is a useful operation both for providing users how there program is broken, but also in debugging code that uses traces. In the latter case, you always want it to print, even (And indeed especially) if the trace is nonsense. Therefore I believe it should always have a bash at printing it. A validate trace step before printing might make sense for the first case. As an aside, if I move this invariant to above the lookup: --- src/goto-programs/json_goto_trace.cpp (date 1549634233000)
+++ src/goto-programs/json_goto_trace.cpp (date 1549981900000)
@@ -118,6 +118,10 @@
const symbolt *symbol;
irep_idt base_name, display_name;
+ DATA_INVARIANT(
+ step.full_lhs_value.is_not_nil(),
+ "full_lhs_value in assignment must not be nil");
+
if(!ns.lookup(identifier, symbol))
{
base_name = symbol->base_name;
@@ -132,9 +136,6 @@
}
else
{
- DATA_INVARIANT(
- step.full_lhs_value.is_not_nil(),
- "full_lhs_value in assignment must not be nil");
full_lhs_value = json(step.full_lhs_value, ns, ID_unknown);
}
Then I can reproduce this invariant on JBMC. Notice the only transformation this nil expr goes through is |
For example: public static void minimal(Object[] a) {
CProver.assume(a != null && a.length > 1);
boolean[] assigned = new boolean[a.length];
assigned[0] = true;
// Cause NPE
Object o = null;
o.toString();
} (Null Pointer Exception added to the end to ensure a trace is produced, there's probably a nicer way - it isn't required). Running this through jbmc: $ jbmc TestClass.class --function "TestClass.minimal:([Ljava/lang/Object;)V" --trace --json-ui Produces a trace that contains: {
"assignmentType": "variable",
"hidden": false,
"internal": true,
"lhs": "byte_extract_little_endian(dynamic_9_array[0l], 0l, signed char)",
"sourceLocation": {
"bytecodeIndex": "17",
"file": "TestClass.java",
"function": "java::TestClass.minimal:([Ljava/lang/Object;)V",
"line": "31"
},
"stepType": "assignment",
"thread": 0,
"value": {
"name": "unknown"
}
}, Note the unknown value is exactly the case where the I also note in this example, the boolean array is cast to a byte array (becuase java bytecode using |
Thank you @thk123 for further debugging. There may be some Java-related problem, but I don't think it's restricted to the Java front-end. Two observations:
in the trace. This isn't appropriate output as it humans wouldn't be able to make sense of this. To me this confirms that the invariant is appropriate, and that we are failing it because of a bug, not because the invariant shouldn't be there. |
PS the cast to byte array, though probably wrong, is not ultimately responsible here - changing the array to a float array does not resolve the issue. |
I would prefer #4372 over this one. |
@thomasspriggs Should this be closed now that #4372 has been merged? |
I am happy to close this. @thk123 Can you confirm that you are also happy for this PR to be closed? |
Yup - we now need to flip the printing back on since the invariant should have gone away... |
This invariant in the printing is preventing other, otherwise usefull
output from being printed, in the case where a goto trace has been
produced which contains a
full_lhs_value
set tonil
in an assignmentstep.