-
Notifications
You must be signed in to change notification settings - Fork 273
Reorganise C parameter creation #767
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reorganise C parameter creation #767
Conversation
Note that this patch makes one test fail: cbmc/unsigned__int128. I'll need help working out how to solve this problem. I paste below a description of the problem that I put on slack. -- I am confused about the test cbmc/unsigned__int128. It works on master, but not with my patch. Here is my understanding of the test. The C function If you look at the goto generated by CBMC on master, it sets the second argument to the function to be a null pointer, because it can't cope with pointers or arrays yet. I'm slightly surprised that it correctly checks some arithmetic involving My patch makes CBMC generate code so it can cope with C function parameters which are pointers (or pointers to pointers, ...). I haven't explicitly done anything for arrays, but actually it thinks the array parameter is a pointer (I'm not sure how I can distinguish them), so it generates extra code for it which makes a nondet boolean and conditions on that to say that the argument is either null or it points to a temporary uint128_t variable (which it makes). Note that it creates a single uint128_t - in an ideal world it would create an array of 7 of them. Now the test fails, on all the lines involving |
338ba47
to
74f8eaa
Compare
I'd recommend changing the unsiged___int128 test to have unquestionable semantics:
Note the change to KNOWNBUG: the test actually does fail with current master when changed in this way. So there's an unrelated bug to be fixed. |
8742644
to
74f8eaa
Compare
@owen-jones-diffblue Brilliant, thank you for debugging and actually fixing the test!!! Would you mind, however, placing this in a separate PR? I believe this should be merged ASAP (and it's really unrelated to your additional work). |
Yes, I'd just come to the same conclusion. I've made a separate PR. #771 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very nice piece of work, but I'd like to see some clarification on a number of questions.
src/ansi-c/ansi_c_entry_point.cpp
Outdated
|
||
result[i]=symbol_expr; | ||
i++; | ||
const code_typet::parametert &p = parameters[param_number]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit picking: no whitespace around =
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/ansi-c/ansi_c_entry_point.cpp
Outdated
init_code, | ||
allow_null, | ||
symbol_table, | ||
p.source_location(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function has a large number of parameters, and seemingly at least 2 can be derived from the first one. Can this be simplified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to keep it similar to object_factory() in src/java_bytecode/java_object_factory.cpp so that they can be easily combined in future. If it is preferable then I can instead simplify it and reduce the number of parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes a lot of sense. I'm wondering whether the Java code should be refactored instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into refactoring this aspect of the java code. object_factory() is used in two places in java_entry_point(): once in a loop over all the parameters of the function we are looking at, and once in a loop over all symbols in the symbol table which are static plus some other conditions, or which are string literals. Without knowing the code better it's not clear to me that I can reduce the number of parameters in a way which works in both places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for checking!
src/ansi-c/ansi_c_entry_point.cpp
Outdated
|
||
main_arguments[param_number]=c_nondet_symbol_factory( | ||
p, | ||
param_number, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the position of the parameter a required piece of information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is only used to construct a fallback base name which is used if the parameter doesn't have one. That is what the code did before I refactored it. Other options include passing in the fallback base name or trying to establish that the parameter will always have a base name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok; I'm inclined to suggest that the name should be passed in (effectively as constructed in that function) as the problem can only arise at the top level, recursive calls would not suffer from that problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/ansi-c/ansi_c_entry_point.cpp
Outdated
input_recorder.record_input( | ||
p_symbol.base_name, | ||
main_arguments[param_number]); | ||
input_recorder.declare_recorded_inputs(init_code); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input recorder is passed as an argument to c_nondet_symbol_factory
, but then there still is a need for an extra manual step. I think this should be simplified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I have moved this entirely within c_nondet_symbol_factory().
src/ansi-c/ansi_c_entry_point.cpp
Outdated
@@ -505,8 +481,11 @@ bool ansi_c_entry_point( | |||
else | |||
{ | |||
// produce nondet arguments | |||
call_main.arguments()= | |||
build_function_environment(parameters, init_code, symbol_table); | |||
call_main.arguments()=build_function_environment( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit pick: Preferably wrap lines after the =
(applies elsewhere in your patch too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added an argument (message_handler, to make it more like the java code), so it no longer fits into one line if I wrap after the =
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, apologies for being unclear: also wrap after the =
(and then indent the arguments by two additional spaces).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
code_ifthenelset null_check; | ||
exprt null_return= | ||
zero_initializer(returns_null_sym.type, loc, ns, message_handler); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to get away with from_integer(0, returns_null_sym.type)
, at which point you might not even need a message_handlert
anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
type, | ||
prefix); | ||
|
||
code_declt decl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code declt decl(tmp_symbol.symbol_expr());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
symbol_exprt symbol_expr(identifier, type); | ||
|
||
code_declt decl; | ||
decl.symbol()=symbol_expr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code_declt decl(symbol_expr);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
message_handler, | ||
input_recorder); | ||
state.gen_nondet_init(symbol_expr, loc); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why just those to type classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied the java code. The pointer case seems to cover arrays and structs as well, though this patch doesn't properly deal with them.
src/ansi-c/c_nondet_symbol_factory.h
Outdated
symbol_tablet &symbol_table, | ||
const source_locationt &loc, | ||
const typet &type, | ||
const std::string &prefix="tmp"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above appear to be helper functions that ought to be just marked static and not be exported via the header file either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
a8f7b4b
to
e4fe932
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main ask is about the high-level comment - I don't think I have yet understood the code being generated very well. It looks overly complicated to me, but I am likely missing something.
src/ansi-c/ansi_c_entry_point.cpp
Outdated
i++; | ||
const code_typet::parametert &p=parameters[param_number]; | ||
|
||
bool allow_null=true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason that this is placed in a local variable rather than passing true
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I guess it'll always be true here. I was thinking that when we add arrays and structs we'd need to have some logic to choose it, but I guess not at this level.
if(!assume_non_null) | ||
{ | ||
// Will generate the following code: | ||
// IF NONDET(_Bool) != FALSE THEN GOTO <set_null_label_str> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the detailed comments. Could you please add the overall code in one place to understand what's intended to happen here at high level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll move them all into one place, rather than have them in one place and spread out.
new_symbol.base_name=base_name; | ||
new_symbol.type=type; | ||
|
||
symbol_table.move(new_symbol); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use symbol_table.add and make sure this symbol is actually new (add has a return value).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I also wrote the function comments for symbol_table.add() and symbol_table.move(), which I'll put in a separate PR.
e4fe932
to
350a7be
Compare
// <code generated by gen_nondet_init()> | ||
// GOTO <init_done_label_str> | ||
// <set_null_label_str>: <expr> = ((<pointer_type>)NULL); | ||
// <init_done_label_str>: SKIP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for adding those comments! It seems the code here really mixes up two concepts (I understand that you've just copied this from the Java version): the code is being generated as if in the GOTO intermediate level, but it's being done using high-level language constructs. There does not seem to be any need for labels or gotos here. Just generate a code_ifthenelse
with a then_case
and an else_case
. This avoids the static variable and synthetic labels, and will thus make the code much more concise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done this. I had to make the auxiliary variables static, otherwise they were going out of scope at the end of the else clause, being declared dead, and CBMC didn't produce the right test suite.
350a7be
to
55afbae
Compare
I haven't made any changes to the java code since the last round - I'll do it once the C code has been reviewed. I included a commit to turn off the failing test that is fixed in #771 so that the CI can produce the correct results. This commit should be removed before this PR is merged. Using code_ifthenelset properly makes things a lot simpler. Unfortunately it introduced issues of scope, so I've had to make all the intermediate variables have is_static_lifetime true so they don't go out of scope at the end of the else clause. This means that they aren't declared in the output of show-goto-functions, which I feel is a bit less clear, but seems to be the desired behaviour for is_static_lifetime (=global?) variables. Since the declarations are not doing anything I've removed them. Java already uses is_static_lifetime=true intermediate variables, I believe. One alternative would be to collect all the variable declarations in a separate code block and output them before all the other code for that parameter, in a similar way that we collect the INPUT() lines to the end. I made a helper function, concatenate_code_blocks(), which I feel should be a method on code_blockt. |
Cool, I'll take a look.
I've taken another look at #771, and merged it to avoid further overhead. You might thus remove your commit (after rebasing).
Ok, I'll take a look. I'm pretty sure that the declarations should just be placed ahead of any if/else anyway.
I fully agree. |
|
||
\*******************************************************************/ | ||
|
||
static symbolt &c_new_tmp_symbol( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can return it as a const symbolt &
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
{ | ||
target_block.add(to_code(operand)); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As suggested in your comment, this should really go in util/std_code.h, as a member function of code_blockt
. It would then be worthwhile using reserve
to do the re-allocation of memory just once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// INPUT("tmp$<temporary_counter>", tmp$<temporary_counter>); | ||
|
||
symbolt &aux_symbol=c_new_tmp_symbol(symbol_table, loc, allocate_type); | ||
aux_symbol.is_static_lifetime=static_lifetime; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe static lifetime should actually be a parameter to c_new_tmp_symbol
? setting it in there once and that re-setting it here seems questionable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, setting it in two places was a mistake. Well, hopefully I'll stop using static lifetime tmp symbols, but if I do continue using them I'll make it an option, and definitely only do it in one place.
else | ||
{ | ||
// Add the following code to init_code: | ||
// IF !(NONDET(_Bool) == FALSE) THEN GOTO <label1> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment isn't exactly what the code now does, please update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure exactly what you mean. Do you mean that changing the condition of the if-then-else clause means that this line changes? I ended up having to back out that change because it wouldn't run with just --cover branch, without --show-goto-functions.
const exprt null_return=from_integer(0, c_bool_typet(1)); | ||
|
||
code_ifthenelset null_check; | ||
null_check.cond()=equal_exprt(nondet_bool, null_return); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This equality test on nondet is redundant - it's sufficient to just do null_check.cond()=c_get_nondet_bool(c_bool_typet(1));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I've been thinking that the code that was being generated here was a bit clunky, but I didn't see the obvious fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that generated an error when running without --show-goto-functions. I ran on pointer-function-parameters/main.c with the flags --function fun --cover branch and I got this output:
Passing problem to propositional reduction
converting SSA
######### lhs: symbol
* type: bool
* identifier: goto_symex::\guard#1
* expression: symbol
* type: bool
* identifier: goto_symex::\guard
* L2: 1
* L1_object_identifier: goto_symex::\guard
* #SSA_symbol: 1
######### rhs: typecast
* type: c_bool
* width: 1
0: nondet_symbol
* type: bool
* identifier: symex::nondet0
* #source_location: nil
equality without matching types
/home/owen/workspace/cbmc/src/cbmc/cbmc exited with code 10
Going back to using an equal_exprt got rid of the problem.
// pointer to a pointer temporary variables are created in else clauses, | ||
// and thus go out of scope before the function is called. | ||
exprt allocated= | ||
allocate_object(non_null_inst, expr, subtype, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you make this call return two code blocks -- a declaration block and an assignment block, then you should be able to get away without static-lifetime objects. This would be highly preferable for the concurrency encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little bit more complicated than that. Because when this function recursively calls itself, even if we put the variable declaration before the if-else statement it can end up inside another else clause, and thus going out of scope before we call the entry function. I can however do it by collecting all the variable declarations separately from the other code and then putting them in the right order at the very end, a bit like what we're currently doing with input_recorder.
55afbae
to
358fb48
Compare
I've updated the PR addressing all comments. I realised I could deal with putting all the variable declarations first and putting all the INPUT statements last at the same time by keeping a list of pointers to the symbols that had been created and generating all the code at the very end. That removes all the code for input_recordert, which makes the code a lot shorter. As before, I haven't updated the java code, but I will do so once it's clear that the C code is taking the right approach. |
|
||
code_ifthenelset null_check; | ||
null_check.cond()= | ||
equal_exprt(c_get_nondet_bool(c_bool_typet(1)), false_expr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To address the equality-without-matching-types issue you should probably just do
null_check.cond()=side_effect_expr_nondett(bool_typet());
The issue is the c_bool_typet
placed in here. Apologies for not spotting that any earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Make it deal with pointer parameters to C functions. We follow the java code to prepare for one day merging them. The C code doesn't yet deal with structs or arrays, which the java code does.
Following the review of the equivalent C code, these changes were made: set replaced with unordered set; helper functions declared static and not exported in header; zero_initializer() replaced with from_integer() and all uses of message_handlert removed;
358fb48
to
f777e29
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! You might want to add the remaining cleanup for the Java code, if feasible.
Yes, I definitely want to clean up the java, especially the simpler use of if_then_elset. |
I created #853 to do the final refactoring I'd intended. |
Make it match the java code more closely to prepare for (a) one day
merging them and (b) dealing with pointer parameters.