Skip to content

Fix bug that prevented --malloc-may-fail from working #5500

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

hannes-steffenhagen-diffblue
Copy link
Contributor

We have code in our stdlib models that depends on two global variables,
__CPROVER_malloc_may_fail and __CPROVER_malloc_failure_mode. Previously
these were set based on the ansi-c config in builtin_additions by
generating C code containing definitions for these variables.

The problem with that is that if you compile an executable using goto-cc
we also load the builtin_additions, so this executable already contains
definitions for these variables which can't be overridden.

This works around this issue by instead declaring these variables as
extern in builtin_additions, and setting their values dynamically as
last-minute instrumentation at the end of __CPROVER_initialize. Giving
these no/nondet initial values is important because otherwise
preprocessing like constant folding on goto-binaries before cbmc
actually runs could remove branches we can, in fact, take with different
values set by --malloc-fail-* flags.

An alternative would be to instead dynamically change the code for the
models functions as instrumentation, but this has a couple of problems;
For instance this means we might change the behaviour of custom stdlib
models (which we allow via --no-library) or custom goto instrumentation
in unexpected ways. I believe this way is the least likely to cause any
friction like that.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@@ -186,17 +186,15 @@ void ansi_c_internal_additions(std::string &code)
CPROVER_PREFIX "size_t size, " CPROVER_PREFIX "bool zero);\n"
"const void *" CPROVER_PREFIX "alloca_object = 0;\n"

"int " CPROVER_PREFIX "malloc_failure_mode="+
std::to_string(config.ansi_c.malloc_failure_mode)+";\n"
"extern int " CPROVER_PREFIX "malloc_failure_mode;\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the malloc-X options be removed from goto-cc now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterschrammel we never added them to goto-cc. What we did add to goto-cc was --object-bits, which could be removed with a similar approach (or by doing the check the object bits are used for in a different way entirely the way @danpoe was suggesting).

/// Some variables have different initial values based on what flags are being passed to cbmc;
/// since the _start function might've been generated by goto-cc before getting passed over here
/// we need to add the initialisation of these variables here as a post-processing step
static void
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this code should be in this class - better move it into a separate file in goto-programs or goto-instrument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -879,6 +880,8 @@ bool cbmc_parse_optionst::process_goto_program(
link_to_library(
goto_model, log.get_message_handler(), cprover_c_library_factory);

add_malloc_may_fail_variable_initializations(goto_model);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is now part of the correct set up for goto programs can you either A. add this to all such set-up code ( goto-analyzer, goto-diff, goto-instrument at least) or B. add it to one of the existing methods -- I would not oppose it being part of link_to_library as it is part of the handling of the library.

Copy link
Contributor Author

@hannes-steffenhagen-diffblue hannes-steffenhagen-diffblue Sep 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely can’t go into link_to_library and shouldn’t go into any of the instrumentation tools (i.e. it can’t go into link_to_library because it shouldn’t go into the instrumentation tools). The reason this is put in this location is specifically to avoid instrumentation code setting these variables, they should only be set by cbmc. Arguably jbmc and goto-analyzer too, but definitely not goto-instrument etc.

The thing with goto-analyzer though is also that because the stdlib models are cbmc models, I don't think they'd even make sense for goto-analyzer (and I don't think they're loaded). Similar story with jbmc, I don't think it loads the C models (that would make more sense like to verify C code interacting with java code I guess, but I don't know if anyone's actually doing that).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To summarise above, I don’t really agree from what I understand of the situation, but maybe I’m missing something.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They shouldn't just be cbmc models; goto-analyzer will need them as well. They may not work that well at the moment but let's not make them any more broken than they currently are. In jbmc I don't know what their models are written in, Java or C. It would be good if it was possible for it to work.

My concern is avoiding divergence in how the tools set up and configure goto programs. This has been a source of subtle bugs in the past.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-cs Again, I could put this code in goto-analyzer but I still don’t quite understand what good it’ll do. The models library as a matter of fact is currently not being used for goto-analyzer (and as far as I know this has been the case for at least 5 years), so if I add this code it’ll naturally go untested. So the problem is that then I’m adding code to goto-analyzer that is vestigial to begin with (i.e. literally a no-op in all circumstances). While I do understand that we want to keep goto transforms consistent between different tools in this particular case I don’t see the value in it. Right now goto-analyzer doesn’t even understand things like __CPROVER_allocate etc (or rather, I don’t think any of the domains do). For me the problem here is I’m adding code to places that I know does nothing right now, and that I can’t be sure will do the right thing whenever someone changes the surrounding context in such a way that would allow it to do something. Whenever someone in the future would read that goto-analyzer code and asks "what is this bit of the code for", the true answer would be "it does absolutely nothing because the preconditions for it doing anything are not there". If it ever starts doing anything we’ll have no way of knowing it’s doing the right thing because we didn’t (and couldn’t) test it until then.

I think we’re both concerned about maintainability here, but we seem to disagree on which way round is better for maintainability.

@peterschrammel @piotr-grabalski @NlightNFotis @thomasspriggs would you mind chipping in? Very much willing to be overruled here, but I want to be sure we’re doing the right thing here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it should be called by goto-analyzer too.
IMO, this shouldn't be implemented through the models library at all, but by a pass over the goto model - but that may follow afterwards. This PR is for urgently fixing a major bug in CBMC when used with goto-cc. I.e. priority is getting the bug fixed today. Tomorrow we can look how to improve the solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason I’d want it to be in the models library is because IMHO whether or not this can return NULL is part of the model, and you should be able to figure out what behaviour the models library has from looking at the library code. Instrumenting logic means distributing the logic between the models library source files and different parts of the C++ code base, but like you said we should discuss how to move forward with this when we’ve actually fixed the problem.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hannes-steffenhagen-diffblue would you be OK adding a comments (and commented out code?) in the other locations saying why it shouldn't / isn't / may be needed? Would be happy to approve that and get it merged in an hour or so when the CI passes?

@peterschrammel I feel the library solution has a number of advantages; should we talk about this at some point?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 There is kind of a design decision to be made, as to whether these checks should be part of the model or to be an instrumentation pass. I liked the idea of this being an instrumentation pass.

⛏️ If we are sticking with them being part of the implementation of the model, then I think the approach outlined in this PR isn't good in the long term either. The reason being that this is separating correctly configuring the model/library from adding it. This leaves things in an invalid state in between the two stages, where it is added but not correctly configured. However I am willing to approve as is because this is the simplest way to keep the existing user interface and fix the issue in a short period of time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomasspriggs agreed -- we do need to discuss this. Adding the models and configuring them are two different things and yes, we do need to think about it. Do you want to open an issue to discuss this?

We have code in our stdlib models that depends on two global variables,
__CPROVER_malloc_may_fail and __CPROVER_malloc_failure_mode. Previously
these were set based on the ansi-c config in builtin_additions by
generating C code containing definitions for these variables.

The problem with that is that if you compile an executable using goto-cc
we also load the builtin_additions, so this executable already contains
definitions for these variables which can't be overridden.

This works around this issue by instead declaring these variables as
extern in builtin_additions, and setting their values dynamically as
last-minute instrumentation at the end of __CPROVER_initialize.  Giving
these no/nondet initial values is important because otherwise
preprocessing like constant folding on goto-binaries before cbmc
actually runs could remove branches we can, in fact, take with different
values set by --malloc-fail-* flags.

An alternative would be to instead dynamically change the code for the
models functions as instrumentation, but this has a couple of problems;
For instance this means we might change the behaviour of custom stdlib
models (which we allow via --no-library) or custom goto instrumentation
in unexpected ways. I believe this way is the least likely to cause any
friction like that.
These constant_propagation tests expect specific numbers of assignments.
Since we changed around how we initialize the variables for the
--malloc-may-fail feature these numbers have changed.
@codecov
Copy link

codecov bot commented Sep 21, 2020

Codecov Report

Merging #5500 into develop will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #5500   +/-   ##
========================================
  Coverage    68.25%   68.25%           
========================================
  Files         1180     1181    +1     
  Lines        97722    97729    +7     
========================================
+ Hits         66698    66705    +7     
  Misses       31024    31024           
Flag Coverage Δ
#cproversmt2 42.82% <100.00%> (+<0.01%) ⬆️
#regression 65.42% <100.00%> (+<0.01%) ⬆️
#unit 32.23% <36.36%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
jbmc/src/jbmc/jbmc_parse_options.cpp 73.73% <ø> (ø)
src/ansi-c/ansi_c_internal_additions.cpp 94.02% <ø> (-0.34%) ⬇️
src/goto-analyzer/goto_analyzer_parse_options.cpp 69.78% <ø> (ø)
src/cbmc/cbmc_parse_options.cpp 77.15% <100.00%> (+0.05%) ⬆️
...s/add_malloc_may_fail_variable_initializations.cpp 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a6feed2...aea6b8e. Read the comment docs.


initialize_function.body.insert_before(
initialize_function_end,
goto_programt::make_assignment(make_integer_assignment(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ I think I would have replaced the existing assignment in cprover_initialise, rather than adding extra assignments. Your approach will make the resulting code in cprover_initialise more difficult to read, when it is printed for debugging purposes. I think that replacing the existing assignment would also have reduced the need to modify the regression tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomasspriggs I don’t understand how having an extra assignment is more difficult to read/debug than adding extra logic to functions at runtime

@@ -879,6 +880,8 @@ bool cbmc_parse_optionst::process_goto_program(
link_to_library(
goto_model, log.get_message_handler(), cprover_c_library_factory);

add_malloc_may_fail_variable_initializations(goto_model);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 There is kind of a design decision to be made, as to whether these checks should be part of the model or to be an instrumentation pass. I liked the idea of this being an instrumentation pass.

⛏️ If we are sticking with them being part of the implementation of the model, then I think the approach outlined in this PR isn't good in the long term either. The reason being that this is separating correctly configuring the model/library from adding it. This leaves things in an invalid state in between the two stages, where it is added but not correctly configured. However I am willing to approve as is because this is the simplest way to keep the existing user interface and fix the issue in a short period of time.

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

If possible, a comment in goto-instrument saying "Don't do this" and something for goto-diff ( @peterschrammel : what is goto-diff's status) would be great but those could be in another PR if you need to get this moving ASAP.

@@ -726,6 +726,10 @@ int jbmc_parse_optionst::get_goto_program(

goto_modelt &goto_model = dynamic_cast<goto_modelt &>(*goto_model_ptr);

// if we added the ansi-c library models here this should also call
// add_malloc_may_fail_variable_initializations(goto_model);
// here
Copy link
Contributor

@thomasspriggs thomasspriggs Sep 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 I am not sure this comment is helpful. JBMC uses its own models rather than the ansi-c models. Also the java language uses new, not malloc. This is parsed from the class file into ID_java_new / ID_java_new_array, which is then lowered directly into ID_allocate. This bypasses the use of a malloc function entirely. See jbmc/src/java_bytecode/remove_java_new.cpp for details.

Copy link
Contributor Author

@hannes-steffenhagen-diffblue hannes-steffenhagen-diffblue Sep 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomasspriggs there are valid reasons why you might want to add the ansi-c library here; e.g. if there was JNI support and you wanted to link your java models to C models. Not a thing as of now as far as I know, but it's not inconceivable that his would be a concern in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that make a little more sense, if you were thinking about JNI support. AFAIK JNI support isn't planned for the near future. I also suspect that it would be tricky to implement given the complexities of lazy loading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that’s why I didn’t want to put it here in the first place! But if you wanted to load the C library, you’d have to add this code and the flags as well, which I believe was @martin-cs's concern and why he asked for these comments.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomasspriggs : Yes, as @hannes-steffenhagen-diffblue observed JNI is one thing. Another is that there are a lot of implicit assumptions in the modelling and instrumentation that the C library is present. It is likely that future modelling of more sophisticated things may well make use of / depend on the C library and I would hate to see the intermediate representation diverge because of this.

@hannes-steffenhagen-diffblue : yes, my concern is in divergence of how we handle things and the "well it works if you use X but not if you use Y first, then X". This way the divergence is clear and documented in the relevant place (ideally it would be in goto-instrument and goto-diff too).

Also add some explanation for why it’s commented out
Copy link
Contributor

@chrisr-diffblue chrisr-diffblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding my approval to the others. Though I agree it looks like some broader/longer term discussion is also needed.

@hannes-steffenhagen-diffblue hannes-steffenhagen-diffblue merged commit 2c7decc into diffblue:develop Sep 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

goto-cc incompatible with cbmc's --malloc-may-fail and --malloc-fail-null
5 participants