-
Notifications
You must be signed in to change notification settings - Fork 274
Add function harness generator and generator factory #4014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4ecbacc
to
1f82265
Compare
1f82265
to
b3a0a62
Compare
b3a0a62
to
04340d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pending clarification w.r.t. goto_model
modification placement it looks good.
3cd08e3
to
aa84827
Compare
Ok, now I'm happy. |
2145f40
to
6acb67e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: 6acb67e).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/99443610
85e9529
to
edb34f6
Compare
"harness function already in the symbol table " + | ||
id2string(harness_function_name), | ||
"--" GOTO_HARNESS_GENERATOR_HARNESS_FUNCTION_NAME_OPT}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above two checks (function_found and harness_function_found) look like the kind of checks that most harness generators will want to do... I appreciate that these checks can't be done before the goto_model has been constructed, so this is probably the right location - but in the interest of reducing code duplication, could they be lifted out into some sort of helper functions/members that could then be shared between harness generators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chrisr-diffblue harness function found yes, function found not necessarily. For example, if I understand @danpoe correctly his harness will actually be able to resume a program "mid function" and will presumably require synthesising and entirely new function to call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, not every harness generator will want to do every check - but seems like having helper functions, one for each check, would at least avoid the code duplication in the cases where multiple generators want the same type of check.
|
||
/// Function harness generator that for a specified goto-function | ||
/// generates a harness that sets up its arguments and calls it. | ||
class function_harness_generatort : public goto_harness_generatort |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure this is the best name for this harness generator. I know it's just an 'example' generator, but its name doesn't really make it clear what sort of harness it generates (given that pretty much all harness generators are going to be generating functions...). Maybe something like function_call_harness_generatort
might be a bit better, or trampoline_harness_generatort
maybe. Also, I think the documentation comment needs to be a bit clearer and more explicit - e.g. "generates a harness that will call the specified goto-function (the function under test), setting it's arguments to ..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chrisr-diffblue I wouldn't call it an 'example' generator, this is what we intend to to put on the other functionality like setting up structs, arrays etc on. The example-yness of it only comes from us not doing all that work as part of this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thats reasonable. Probably still needs the doc comment clarified though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand why a factory is most appropriate design pattern here, but that's not a big deal and overall this looks ok to me.
GOTO_HARNESS_FACTORY_OPTIONS \ | ||
FUNCTION_HARNESS_GENERATOR_OPTIONS \ | ||
// end GOTO_HARNESS_OPTIONS | ||
// clang-format on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For an unknown reason clang-format
seems to ignore the clang-format off
here (and also in the other parse_options.h file). Please experiment locally, maybe you need an extra blank line somewhere.
src/goto-harness/Makefile
Outdated
@@ -1,10 +1,16 @@ | |||
SRC = \ | |||
goto_harness_main.cpp \ | |||
goto_harness_parse_options.cpp \ | |||
goto_harness_generator_factory.cpp \ | |||
function_harness_generator.cpp \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer lexicographic ordering.
#include <util/std_expr.h> | ||
#include <util/ui_message.h> | ||
|
||
#include "function_harness_generator.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be the first include (cf. CODING_STANDARD.md
).
|
||
function_harness_generatort::function_harness_generatort( | ||
ui_message_handlert &message_handler) | ||
: goto_harness_generatort{}, pImpl(util_make_unique<implt>()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No CaMeLcaSE please.
{ | ||
public: | ||
explicit function_harness_generatort(ui_message_handlert &message_handler); | ||
~function_harness_generatort() override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is required because we use the pimpl idiom here; default_delete
requires its type parameter to be a complete type, so we can't instantiate the default destructor of unique_ptr
here; Instead we do this in the makefile, where the full implt
definition is visible.
goto_harness_parse_optionst::goto_harness_configt | ||
goto_harness_parse_optionst::handle_common_options() | ||
{ | ||
goto_harness_configt config{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the above changes this will shadow the global config
object, please rename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid shadowing the global config
object.
"required option not set", | ||
"--" GOTO_HARNESS_GENERATOR_HARNESS_FUNCTION_NAME_OPT}; | ||
} | ||
config.harness_function_name = irep_idt{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need the irep_idt
here.
|
||
class goto_harness_parse_optionst : public parse_options_baset | ||
class goto_harness_parse_optionst : public parse_options_baset, public messaget |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If avoidable, I'd ask to use a class member instead of inheriting from messaget
. (I know that's done elsewhere, but it's poor design. The relationship is "has a logger", not "is a logger.")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 sounds good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, nice design of the command line handling
/// doesn't apply to this generator. Should only set options rather than | ||
/// immediately performing work | ||
virtual void | ||
handle_option(const std::string &option, const std::string &value) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add a method finalise_option_handling()
that is invoked after all the handle_option()
calls? I'm thinking of the case when there are required options or dependencies between options (such as if --a
is present --b
needs to be present as well). Then those checks could be done there instead of in generate()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make sense to do that yes, we'll have to think about it
// 2. Initialise the harness generator (with some config) that will handle | ||
// the mutation of the goto-model. | ||
// 3. Write the end result of that process to the output binary | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a brief description of the harness generation here? In particular, to say that a new function is created that represents the harness, and that one needs to make it the entry point (via cbmc --function
) in order to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point; Arguably this information should be added to the user manual as well
{ | ||
if(common_options.find(option) == common_options.end()) | ||
{ | ||
factory_options.insert({option, cmdline.get_value(option.c_str())}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good if we could also support repeated options and catch the error case when an option was given multiple times that should only be given once.
Maybe we could introduce an overloaded method handle_option()
that takes a list of values (and only that one is called by the framework). In goto_harness_generatort
we could provide a default implementation, something like that:
void handle_option(
const std::string &option,
const std::list<std::string> &values)
{
if(values.size() != 1)
{
throw invalid_command_line_argument_exceptiont(...);
}
handle_option(option, values.front());
}
Then in the normal use case one still has to implement only the handle_option()
method that takes a string as the second argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I misunderstood how get_value()
works with repeated arguments :(
Have to reconsider this part of the API as well 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're doing something somewhat similar to what you suggested now; handle_option
is now always taking a list of values. Instead of an overload like you suggested we've added helper functions require_exactly_one_value
, which returns the only value of a single value list or throws if it is not given a single value list, and assert_no_values
which asserts that it is given an empty list (assert because an option which doesn't have values can't get values as per the cmdline API).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: 2b2893d).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/99625474
2b2893d
to
bd38fc4
Compare
@danpoe @tautschnig We have now implemented all of the changes requested. They are small changes in each commit for easier review. Once you give it the greenlight, we are going to squash them into one commit and merge them. |
The changes look good to me. |
regression/goto-harness/chain.sh
Outdated
script_dir=$(dirname $0) | ||
|
||
$goto_harness "${name}.gb" "${name}-mod.gb" --harness-function-name $entry_point ${args} | ||
$cbmc $script_dir/dummy.c --function $entry_point "${name}-mod.gb" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the role of dummy.c
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually addressing a problem we had where without it, the language mode is not set to C
automatically in cbmc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does --c99
or equivalent not cover that? Though this sounds like the output from goto-harness might not be including everything expected in the goto-binary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular, regression/goto-cc-cbmc/chain.sh
suggests that just doing a goto-cc
, followed by a cbmc
on the resulting goto-binary should work. Does it? And if it does, why does putting goto-harness in the middle of that pipeline then cause it to fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chrisr-diffblue just that much works. What doesn't work with it is specifying an entry point via --function
, which goto-instrument
doesn't need to do. Unfortunately afaict setting something like --c99
doesn't work either; I can't remember the exact spot in the source at the moment, but we investigated this and cbmc considers only the names of input source files when deciding what language to use for the support function.
EDIT: To clarify, this is not specific to goto-harness
. The same problem appears with unmodified binaries produced by goto-cc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that still the case, given that the config
is now set from the symbol table? If it is, then there is some other bug to be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should happen is that any tool trying to generate the entry function looks at the mode
of the symbol specified via --function
. @hannes-steffenhagen-diffblue can you point at the code that you figured CBMC uses to decide the language?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tautschnig Sorry for taking so long to get back to you on this, I've added a PR that explains and fixes the issue here: #4104
|
||
std::string | ||
goto_harness_generatort::require_exactly_one_value(const std::string &option, | ||
const std::list<std::string> &values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The formatting here looks a bit odd.
} | ||
|
||
void goto_harness_generatort::assert_no_values(const std::string &option, | ||
const std::list<std::string> &values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Odd formatting.
|
||
// TODO: What does this actually do? Do we need to configure the command | ||
// line options for this to work? | ||
config.set(cmdline); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just sets up the defaults (and would interpret options such as --32
). I'm not sure that's really required here, but it's probably a good idea to have a config
in a sane state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had a problem with binaries generated by goto_cc
. The problem is that while we are trying to call config.set_from_symbol_table(goto_model.symbol_table);
from a binary that was compiled with goto-cc
, we hit an UNREACHABLE
assertion in configt::ansi_ct::set_arch_spec_x86_64()
. It has flavourt::NONE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yes, so you need to set the platform-specific defaults via this call.
goto_harness_parse_optionst::goto_harness_configt | ||
goto_harness_parse_optionst::handle_common_options() | ||
{ | ||
goto_harness_configt config{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid shadowing the global config
object.
3ab55ec
to
96d51d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫
This PR failed Diffblue compatibility checks (cbmc commit: 96d51d8).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/99677408
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.
Common spurious failures:
- the cbmc commit has disappeared in the mean time (e.g. in a force-push)
- the author is not in the list of contributors (e.g. first-time contributors).
The incompatibility may have been introduced by an earlier PR. In that case merging this
PR should be avoided unless it fixes the current incompatibility.
Previously, when we compiled for instance a C file with no `main()` with goto-cc it would have no __CPROVER__start. CBMC would use the following logic to determine the entry point: 1. If `--function` is not passed, use existing `__CPROVER__start` -- in our case, we don't have an existing `__CPROVER__start`, so this doesn't work for us 2. If `--function` _is_ passed, and a `__CPROVER__start` exists: Generate a new entry point calling the passed entry function, based on the mode of the existing `__CPROVER__start` (I am not sure why this mode rather than the mode of `--function`) 3. Otherwise, iterate over the non-binary input files, determine the language to use based on The file extension for each source file, and try to generate an appropriate `__CPROVER__start` based on that The situation of having a single goto-binary with no `__CPROVER__start` as the input file doesn't fit any of these cases, so regardless of whether `--function` is passed CBMC is going to fail, complaining that it can't find or generate an entry point. This adds a fourth case to the list: 4. If `--function` is passed and there is no `__CPROVER__start`, select the language based on the mode of the function parameter and use it to generate `__CPROVER__start` This commit doesn't add tests for this change, but diffblue#4014 adds tests that, in the absence of this change, require hack-y workarounds (such as passing an empty file with a `.c` extension to force C mode for case 3 in the list above) to work.
Previously, when we compiled for instance a C file with no `main()` with goto-cc it would have no __CPROVER__start. CBMC would use the following logic to determine the entry point: 1. If `--function` is not passed, use existing `__CPROVER__start` -- in our case, we don't have an existing `__CPROVER__start`, so this doesn't work for us 2. If `--function` _is_ passed, and a `__CPROVER__start` exists: Generate a new entry point calling the passed entry function, based on the mode of the existing `__CPROVER__start` (I am not sure why this mode rather than the mode of `--function`) 3. Otherwise, iterate over the non-binary input files, determine the language to use based on The file extension for each source file, and try to generate an appropriate `__CPROVER__start` based on that The situation of having a single goto-binary with no `__CPROVER__start` as the input file doesn't fit any of these cases, so regardless of whether `--function` is passed CBMC is going to fail, complaining that it can't find or generate an entry point. This adds a fourth case to the list: 4. If `--function` is passed and there is no `__CPROVER__start`, select the language based on the mode of the function parameter and use it to generate `__CPROVER__start` This commit doesn't add tests for this change, but diffblue#4014 adds tests that, in the absence of this change, require hack-y workarounds (such as passing an empty file with a `.c` extension to force C mode for case 3 in the list above) to work.
Previously, when we compiled for instance a C file with no `main()` with goto-cc it would have no __CPROVER__start. CBMC would use the following logic to determine the entry point: 1. If `--function` is not passed, use existing `__CPROVER__start` -- in our case, we don't have an existing `__CPROVER__start`, so this doesn't work for us 2. If `--function` _is_ passed, and a `__CPROVER__start` exists: Generate a new entry point calling the passed entry function, based on the mode of the existing `__CPROVER__start` (I am not sure why this mode rather than the mode of `--function`) 3. Otherwise, iterate over the non-binary input files, determine the language to use based on The file extension for each source file, and try to generate an appropriate `__CPROVER__start` based on that The situation of having a single goto-binary with no `__CPROVER__start` as the input file doesn't fit any of these cases, so regardless of whether `--function` is passed CBMC is going to fail, complaining that it can't find or generate an entry point. This adds a fourth case to the list: 4. If `--function` is passed and there is no `__CPROVER__start`, select the language based on the mode of the function parameter and use it to generate `__CPROVER__start` This commit doesn't add tests for this change, but diffblue#4014 adds tests that, in the absence of this change, require hack-y workarounds (such as passing an empty file with a `.c` extension to force C mode for case 3 in the list above) to work.
@tautschnig |
Previously, when we compiled for instance a C file with no `main()` with goto-cc it would have no __CPROVER__start. CBMC would use the following logic to determine the entry point: 1. If `--function` is not passed, use existing `__CPROVER__start` -- in our case, we don't have an existing `__CPROVER__start`, so this doesn't work for us 2. If `--function` _is_ passed, and a `__CPROVER__start` exists: Generate a new entry point calling the passed entry function, based on the mode of the existing `__CPROVER__start` (I am not sure why this mode rather than the mode of `--function`) 3. Otherwise, iterate over the non-binary input files, determine the language to use based on The file extension for each source file, and try to generate an appropriate `__CPROVER__start` based on that The situation of having a single goto-binary with no `__CPROVER__start` as the input file doesn't fit any of these cases, so regardless of whether `--function` is passed CBMC is going to fail, complaining that it can't find or generate an entry point. This adds a fourth case to the list: 4. If `--function` is passed and there is no `__CPROVER__start`, select the language based on the mode of the function parameter and use it to generate `__CPROVER__start` This commit doesn't add tests for this change, but diffblue#4014 adds tests that, in the absence of this change, require hack-y workarounds (such as passing an empty file with a `.c` extension to force C mode for case 3 in the list above) to work.
This adds a framework for adding various new harness generator types and adds tests. It also adds a harness generator that calls a given goto function. Co-authored-by: Hannes Steffenhagen <[email protected]> Co-authored-by: Fotis Koutoulakis <[email protected]>
96d51d8
to
7b55287
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: 7b55287).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/100129308
This adds a framework for adding various new harness generator types and adds tests. It also adds a harness generator that calls a given goto function.
This is WIP, as it contains some changes that we are waiting to be merged (outstanding one is #3997). Once it gets merged, we can rebase on top of develop and then the changes on
util/
will be gone.This is work that we did jointly @hannes-steffenhagen-diffblue as part of the work on the goto-harness feature.