-
Notifications
You must be signed in to change notification settings - Fork 274
Fix bug that prevented --malloc-may-fail from working #5500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bug that prevented --malloc-may-fail from working #5500
Conversation
@@ -186,17 +186,15 @@ void ansi_c_internal_additions(std::string &code) | |||
CPROVER_PREFIX "size_t size, " CPROVER_PREFIX "bool zero);\n" | |||
"const void *" CPROVER_PREFIX "alloca_object = 0;\n" | |||
|
|||
"int " CPROVER_PREFIX "malloc_failure_mode="+ | |||
std::to_string(config.ansi_c.malloc_failure_mode)+";\n" | |||
"extern int " CPROVER_PREFIX "malloc_failure_mode;\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the malloc-X options be removed from goto-cc now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peterschrammel we never added them to goto-cc
. What we did add to goto-cc
was --object-bits
, which could be removed with a similar approach (or by doing the check the object bits are used for in a different way entirely the way @danpoe was suggesting).
src/cbmc/cbmc_parse_options.cpp
Outdated
/// Some variables have different initial values based on what flags are being passed to cbmc; | ||
/// since the _start function might've been generated by goto-cc before getting passed over here | ||
/// we need to add the initialisation of these variables here as a post-processing step | ||
static void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this code should be in this class - better move it into a separate file in goto-programs or goto-instrument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
31ffef4
to
33c02db
Compare
@@ -879,6 +880,8 @@ bool cbmc_parse_optionst::process_goto_program( | |||
link_to_library( | |||
goto_model, log.get_message_handler(), cprover_c_library_factory); | |||
|
|||
add_malloc_may_fail_variable_initializations(goto_model); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is now part of the correct set up for goto programs can you either A. add this to all such set-up code ( goto-analyzer, goto-diff, goto-instrument at least) or B. add it to one of the existing methods -- I would not oppose it being part of link_to_library
as it is part of the handling of the library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It definitely can’t go into link_to_library
and shouldn’t go into any of the instrumentation tools (i.e. it can’t go into link_to_library
because it shouldn’t go into the instrumentation tools). The reason this is put in this location is specifically to avoid instrumentation code setting these variables, they should only be set by cbmc. Arguably jbmc and goto-analyzer too, but definitely not goto-instrument etc.
The thing with goto-analyzer
though is also that because the stdlib models are cbmc models, I don't think they'd even make sense for goto-analyzer (and I don't think they're loaded). Similar story with jbmc, I don't think it loads the C models (that would make more sense like to verify C code interacting with java code I guess, but I don't know if anyone's actually doing that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To summarise above, I don’t really agree from what I understand of the situation, but maybe I’m missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They shouldn't just be cbmc
models; goto-analyzer
will need them as well. They may not work that well at the moment but let's not make them any more broken than they currently are. In jbmc
I don't know what their models are written in, Java or C. It would be good if it was possible for it to work.
My concern is avoiding divergence in how the tools set up and configure goto programs. This has been a source of subtle bugs in the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martin-cs Again, I could put this code in goto-analyzer but I still don’t quite understand what good it’ll do. The models library as a matter of fact is currently not being used for goto-analyzer (and as far as I know this has been the case for at least 5 years), so if I add this code it’ll naturally go untested. So the problem is that then I’m adding code to goto-analyzer that is vestigial to begin with (i.e. literally a no-op in all circumstances). While I do understand that we want to keep goto transforms consistent between different tools in this particular case I don’t see the value in it. Right now goto-analyzer doesn’t even understand things like __CPROVER_allocate
etc (or rather, I don’t think any of the domains do). For me the problem here is I’m adding code to places that I know does nothing right now, and that I can’t be sure will do the right thing whenever someone changes the surrounding context in such a way that would allow it to do something. Whenever someone in the future would read that goto-analyzer code and asks "what is this bit of the code for", the true answer would be "it does absolutely nothing because the preconditions for it doing anything are not there". If it ever starts doing anything we’ll have no way of knowing it’s doing the right thing because we didn’t (and couldn’t) test it until then.
I think we’re both concerned about maintainability here, but we seem to disagree on which way round is better for maintainability.
@peterschrammel @piotr-grabalski @NlightNFotis @thomasspriggs would you mind chipping in? Very much willing to be overruled here, but I want to be sure we’re doing the right thing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it should be called by goto-analyzer too.
IMO, this shouldn't be implemented through the models library at all, but by a pass over the goto model - but that may follow afterwards. This PR is for urgently fixing a major bug in CBMC when used with goto-cc. I.e. priority is getting the bug fixed today. Tomorrow we can look how to improve the solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason I’d want it to be in the models library is because IMHO whether or not this can return NULL is part of the model, and you should be able to figure out what behaviour the models library has from looking at the library code. Instrumenting logic means distributing the logic between the models library source files and different parts of the C++ code base, but like you said we should discuss how to move forward with this when we’ve actually fixed the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hannes-steffenhagen-diffblue would you be OK adding a comments (and commented out code?) in the other locations saying why it shouldn't / isn't / may be needed? Would be happy to approve that and get it merged in an hour or so when the CI passes?
@peterschrammel I feel the library solution has a number of advantages; should we talk about this at some point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 There is kind of a design decision to be made, as to whether these checks should be part of the model or to be an instrumentation pass. I liked the idea of this being an instrumentation pass.
⛏️ If we are sticking with them being part of the implementation of the model, then I think the approach outlined in this PR isn't good in the long term either. The reason being that this is separating correctly configuring the model/library from adding it. This leaves things in an invalid state in between the two stages, where it is added but not correctly configured. However I am willing to approve as is because this is the simplest way to keep the existing user interface and fix the issue in a short period of time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomasspriggs agreed -- we do need to discuss this. Adding the models and configuring them are two different things and yes, we do need to think about it. Do you want to open an issue to discuss this?
a71264b
to
2ffc26f
Compare
We have code in our stdlib models that depends on two global variables, __CPROVER_malloc_may_fail and __CPROVER_malloc_failure_mode. Previously these were set based on the ansi-c config in builtin_additions by generating C code containing definitions for these variables. The problem with that is that if you compile an executable using goto-cc we also load the builtin_additions, so this executable already contains definitions for these variables which can't be overridden. This works around this issue by instead declaring these variables as extern in builtin_additions, and setting their values dynamically as last-minute instrumentation at the end of __CPROVER_initialize. Giving these no/nondet initial values is important because otherwise preprocessing like constant folding on goto-binaries before cbmc actually runs could remove branches we can, in fact, take with different values set by --malloc-fail-* flags. An alternative would be to instead dynamically change the code for the models functions as instrumentation, but this has a couple of problems; For instance this means we might change the behaviour of custom stdlib models (which we allow via --no-library) or custom goto instrumentation in unexpected ways. I believe this way is the least likely to cause any friction like that.
These constant_propagation tests expect specific numbers of assignments. Since we changed around how we initialize the variables for the --malloc-may-fail feature these numbers have changed.
2ffc26f
to
e0eee9d
Compare
Codecov Report
@@ Coverage Diff @@
## develop #5500 +/- ##
========================================
Coverage 68.25% 68.25%
========================================
Files 1180 1181 +1
Lines 97722 97729 +7
========================================
+ Hits 66698 66705 +7
Misses 31024 31024
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
|
||
initialize_function.body.insert_before( | ||
initialize_function_end, | ||
goto_programt::make_assignment(make_integer_assignment( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏️ I think I would have replaced the existing assignment in cprover_initialise, rather than adding extra assignments. Your approach will make the resulting code in cprover_initialise
more difficult to read, when it is printed for debugging purposes. I think that replacing the existing assignment would also have reduced the need to modify the regression tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomasspriggs I don’t understand how having an extra assignment is more difficult to read/debug than adding extra logic to functions at runtime
@@ -879,6 +880,8 @@ bool cbmc_parse_optionst::process_goto_program( | |||
link_to_library( | |||
goto_model, log.get_message_handler(), cprover_c_library_factory); | |||
|
|||
add_malloc_may_fail_variable_initializations(goto_model); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 There is kind of a design decision to be made, as to whether these checks should be part of the model or to be an instrumentation pass. I liked the idea of this being an instrumentation pass.
⛏️ If we are sticking with them being part of the implementation of the model, then I think the approach outlined in this PR isn't good in the long term either. The reason being that this is separating correctly configuring the model/library from adding it. This leaves things in an invalid state in between the two stages, where it is added but not correctly configured. However I am willing to approve as is because this is the simplest way to keep the existing user interface and fix the issue in a short period of time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
If possible, a comment in goto-instrument
saying "Don't do this" and something for goto-diff
( @peterschrammel : what is goto-diff
's status) would be great but those could be in another PR if you need to get this moving ASAP.
@@ -726,6 +726,10 @@ int jbmc_parse_optionst::get_goto_program( | |||
|
|||
goto_modelt &goto_model = dynamic_cast<goto_modelt &>(*goto_model_ptr); | |||
|
|||
// if we added the ansi-c library models here this should also call | |||
// add_malloc_may_fail_variable_initializations(goto_model); | |||
// here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 I am not sure this comment is helpful. JBMC uses its own models rather than the ansi-c models. Also the java language uses new
, not malloc. This is parsed from the class file into ID_java_new
/ ID_java_new_array
, which is then lowered directly into ID_allocate
. This bypasses the use of a malloc function entirely. See jbmc/src/java_bytecode/remove_java_new.cpp
for details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomasspriggs there are valid reasons why you might want to add the ansi-c library here; e.g. if there was JNI support and you wanted to link your java models to C models. Not a thing as of now as far as I know, but it's not inconceivable that his would be a concern in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that make a little more sense, if you were thinking about JNI support. AFAIK JNI support isn't planned for the near future. I also suspect that it would be tricky to implement given the complexities of lazy loading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that’s why I didn’t want to put it here in the first place! But if you wanted to load the C library, you’d have to add this code and the flags as well, which I believe was @martin-cs's concern and why he asked for these comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomasspriggs : Yes, as @hannes-steffenhagen-diffblue observed JNI is one thing. Another is that there are a lot of implicit assumptions in the modelling and instrumentation that the C library is present. It is likely that future modelling of more sophisticated things may well make use of / depend on the C library and I would hate to see the intermediate representation diverge because of this.
@hannes-steffenhagen-diffblue : yes, my concern is in divergence of how we handle things and the "well it works if you use X but not if you use Y first, then X". This way the divergence is clear and documented in the relevant place (ideally it would be in goto-instrument
and goto-diff
too).
Also add some explanation for why it’s commented out
e14aa7c
to
aea6b8e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding my approval to the others. Though I agree it looks like some broader/longer term discussion is also needed.
We have code in our stdlib models that depends on two global variables,
__CPROVER_malloc_may_fail and __CPROVER_malloc_failure_mode. Previously
these were set based on the ansi-c config in builtin_additions by
generating C code containing definitions for these variables.
The problem with that is that if you compile an executable using goto-cc
we also load the builtin_additions, so this executable already contains
definitions for these variables which can't be overridden.
This works around this issue by instead declaring these variables as
extern in builtin_additions, and setting their values dynamically as
last-minute instrumentation at the end of __CPROVER_initialize. Giving
these no/nondet initial values is important because otherwise
preprocessing like constant folding on goto-binaries before cbmc
actually runs could remove branches we can, in fact, take with different
values set by --malloc-fail-* flags.
An alternative would be to instead dynamically change the code for the
models functions as instrumentation, but this has a couple of problems;
For instance this means we might change the behaviour of custom stdlib
models (which we allow via --no-library) or custom goto instrumentation
in unexpected ways. I believe this way is the least likely to cause any
friction like that.