Skip to content

Add function harness generator and generator factory #4014

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 8, 2019

Conversation

NlightNFotis
Copy link
Contributor

This adds a framework for adding various new harness generator types and adds tests. It also adds a harness generator that calls a given goto function.

This is WIP, as it contains some changes that we are waiting to be merged (outstanding one is #3997). Once it gets merged, we can rebase on top of develop and then the changes on util/ will be gone.

This is work that we did jointly @hannes-steffenhagen-diffblue as part of the work on the goto-harness feature.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Copy link
Contributor

@xbauch xbauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending clarification w.r.t. goto_model modification placement it looks good.

@xbauch xbauch removed their assignment Feb 1, 2019
@NlightNFotis NlightNFotis force-pushed the new_chain_sh branch 3 times, most recently from 3cd08e3 to aa84827 Compare February 1, 2019 14:49
@NlightNFotis NlightNFotis changed the title Add function harness generator and generator factory [depends-on: #3997] Add function harness generator and generator factory Feb 1, 2019
@xbauch
Copy link
Contributor

xbauch commented Feb 1, 2019

Ok, now I'm happy.

@NlightNFotis NlightNFotis force-pushed the new_chain_sh branch 3 times, most recently from 2145f40 to 6acb67e Compare February 1, 2019 15:20
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 6acb67e).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/99443610

@NlightNFotis NlightNFotis force-pushed the new_chain_sh branch 2 times, most recently from 85e9529 to edb34f6 Compare February 1, 2019 17:14
"harness function already in the symbol table " +
id2string(harness_function_name),
"--" GOTO_HARNESS_GENERATOR_HARNESS_FUNCTION_NAME_OPT};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above two checks (function_found and harness_function_found) look like the kind of checks that most harness generators will want to do... I appreciate that these checks can't be done before the goto_model has been constructed, so this is probably the right location - but in the interest of reducing code duplication, could they be lifted out into some sort of helper functions/members that could then be shared between harness generators?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisr-diffblue harness function found yes, function found not necessarily. For example, if I understand @danpoe correctly his harness will actually be able to resume a program "mid function" and will presumably require synthesising and entirely new function to call

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, not every harness generator will want to do every check - but seems like having helper functions, one for each check, would at least avoid the code duplication in the cases where multiple generators want the same type of check.


/// Function harness generator that for a specified goto-function
/// generates a harness that sets up its arguments and calls it.
class function_harness_generatort : public goto_harness_generatort
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure this is the best name for this harness generator. I know it's just an 'example' generator, but its name doesn't really make it clear what sort of harness it generates (given that pretty much all harness generators are going to be generating functions...). Maybe something like function_call_harness_generatort might be a bit better, or trampoline_harness_generatort maybe. Also, I think the documentation comment needs to be a bit clearer and more explicit - e.g. "generates a harness that will call the specified goto-function (the function under test), setting it's arguments to ..."

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisr-diffblue I wouldn't call it an 'example' generator, this is what we intend to to put on the other functionality like setting up structs, arrays etc on. The example-yness of it only comes from us not doing all that work as part of this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thats reasonable. Probably still needs the doc comment clarified though.

Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand why a factory is most appropriate design pattern here, but that's not a big deal and overall this looks ok to me.

GOTO_HARNESS_FACTORY_OPTIONS \
FUNCTION_HARNESS_GENERATOR_OPTIONS \
// end GOTO_HARNESS_OPTIONS
// clang-format on
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an unknown reason clang-format seems to ignore the clang-format off here (and also in the other parse_options.h file). Please experiment locally, maybe you need an extra blank line somewhere.

@@ -1,10 +1,16 @@
SRC = \
goto_harness_main.cpp \
goto_harness_parse_options.cpp \
goto_harness_generator_factory.cpp \
function_harness_generator.cpp \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer lexicographic ordering.

#include <util/std_expr.h>
#include <util/ui_message.h>

#include "function_harness_generator.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be the first include (cf. CODING_STANDARD.md).


function_harness_generatort::function_harness_generatort(
ui_message_handlert &message_handler)
: goto_harness_generatort{}, pImpl(util_make_unique<implt>())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No CaMeLcaSE please.

{
public:
explicit function_harness_generatort(ui_message_handlert &message_handler);
~function_harness_generatort() override;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems unnecessary.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is required because we use the pimpl idiom here; default_delete requires its type parameter to be a complete type, so we can't instantiate the default destructor of unique_ptr here; Instead we do this in the makefile, where the full implt definition is visible.

goto_harness_parse_optionst::goto_harness_configt
goto_harness_parse_optionst::handle_common_options()
{
goto_harness_configt config{};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the above changes this will shadow the global config object, please rename.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid shadowing the global config object.

"required option not set",
"--" GOTO_HARNESS_GENERATOR_HARNESS_FUNCTION_NAME_OPT};
}
config.harness_function_name = irep_idt{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need the irep_idt here.


class goto_harness_parse_optionst : public parse_options_baset
class goto_harness_parse_optionst : public parse_options_baset, public messaget
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If avoidable, I'd ask to use a class member instead of inheriting from messaget. (I know that's done elsewhere, but it's poor design. The relationship is "has a logger", not "is a logger.")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 sounds good to me

Copy link
Contributor

@danpoe danpoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, nice design of the command line handling

/// doesn't apply to this generator. Should only set options rather than
/// immediately performing work
virtual void
handle_option(const std::string &option, const std::string &value) = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add a method finalise_option_handling() that is invoked after all the handle_option() calls? I'm thinking of the case when there are required options or dependencies between options (such as if --a is present --b needs to be present as well). Then those checks could be done there instead of in generate().

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make sense to do that yes, we'll have to think about it

// 2. Initialise the harness generator (with some config) that will handle
// the mutation of the goto-model.
// 3. Write the end result of that process to the output binary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a brief description of the harness generation here? In particular, to say that a new function is created that represents the harness, and that one needs to make it the entry point (via cbmc --function) in order to use it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point; Arguably this information should be added to the user manual as well

{
if(common_options.find(option) == common_options.end())
{
factory_options.insert({option, cmdline.get_value(option.c_str())});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good if we could also support repeated options and catch the error case when an option was given multiple times that should only be given once.

Maybe we could introduce an overloaded method handle_option() that takes a list of values (and only that one is called by the framework). In goto_harness_generatort we could provide a default implementation, something like that:

void handle_option(
  const std::string &option,
  const std::list<std::string> &values)
{
  if(values.size() != 1)
  {
    throw invalid_command_line_argument_exceptiont(...);
  }

  handle_option(option, values.front());
}

Then in the normal use case one still has to implement only the handle_option() method that takes a string as the second argument.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I misunderstood how get_value() works with repeated arguments :(

Have to reconsider this part of the API as well 👍

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're doing something somewhat similar to what you suggested now; handle_option is now always taking a list of values. Instead of an overload like you suggested we've added helper functions require_exactly_one_value, which returns the only value of a single value list or throws if it is not given a single value list, and assert_no_values which asserts that it is given an empty list (assert because an option which doesn't have values can't get values as per the cmdline API).

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 2b2893d).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/99625474

@NlightNFotis
Copy link
Contributor Author

NlightNFotis commented Feb 4, 2019

@danpoe @tautschnig We have now implemented all of the changes requested. They are small changes in each commit for easier review. Once you give it the greenlight, we are going to squash them into one commit and merge them.

@danpoe
Copy link
Contributor

danpoe commented Feb 4, 2019

The changes look good to me.

script_dir=$(dirname $0)

$goto_harness "${name}.gb" "${name}-mod.gb" --harness-function-name $entry_point ${args}
$cbmc $script_dir/dummy.c --function $entry_point "${name}-mod.gb"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the role of dummy.c?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually addressing a problem we had where without it, the language mode is not set to C automatically in cbmc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does --c99 or equivalent not cover that? Though this sounds like the output from goto-harness might not be including everything expected in the goto-binary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, regression/goto-cc-cbmc/chain.sh suggests that just doing a goto-cc, followed by a cbmc on the resulting goto-binary should work. Does it? And if it does, why does putting goto-harness in the middle of that pipeline then cause it to fail?

Copy link
Contributor

@hannes-steffenhagen-diffblue hannes-steffenhagen-diffblue Feb 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisr-diffblue just that much works. What doesn't work with it is specifying an entry point via --function, which goto-instrument doesn't need to do. Unfortunately afaict setting something like --c99 doesn't work either; I can't remember the exact spot in the source at the moment, but we investigated this and cbmc considers only the names of input source files when deciding what language to use for the support function.

EDIT: To clarify, this is not specific to goto-harness. The same problem appears with unmodified binaries produced by goto-cc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that still the case, given that the config is now set from the symbol table? If it is, then there is some other bug to be fixed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should happen is that any tool trying to generate the entry function looks at the mode of the symbol specified via --function. @hannes-steffenhagen-diffblue can you point at the code that you figured CBMC uses to decide the language?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tautschnig Sorry for taking so long to get back to you on this, I've added a PR that explains and fixes the issue here: #4104


std::string
goto_harness_generatort::require_exactly_one_value(const std::string &option,
const std::list<std::string> &values)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatting here looks a bit odd.

}

void goto_harness_generatort::assert_no_values(const std::string &option,
const std::list<std::string> &values)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd formatting.


// TODO: What does this actually do? Do we need to configure the command
// line options for this to work?
config.set(cmdline);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just sets up the defaults (and would interpret options such as --32). I'm not sure that's really required here, but it's probably a good idea to have a config in a sane state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a problem with binaries generated by goto_cc. The problem is that while we are trying to call config.set_from_symbol_table(goto_model.symbol_table); from a binary that was compiled with goto-cc, we hit an UNREACHABLE assertion in configt::ansi_ct::set_arch_spec_x86_64(). It has flavourt::NONE.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yes, so you need to set the platform-specific defaults via this call.

goto_harness_parse_optionst::goto_harness_configt
goto_harness_parse_optionst::handle_common_options()
{
goto_harness_configt config{};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid shadowing the global config object.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫
This PR failed Diffblue compatibility checks (cbmc commit: 96d51d8).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/99677408
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.

Common spurious failures:

  • the cbmc commit has disappeared in the mean time (e.g. in a force-push)
  • the author is not in the list of contributors (e.g. first-time contributors).

The incompatibility may have been introduced by an earlier PR. In that case merging this
PR should be avoided unless it fixes the current incompatibility.

hannes-steffenhagen-diffblue added a commit to hannes-steffenhagen-diffblue/cbmc that referenced this pull request Feb 6, 2019
Previously, when we compiled for instance a C file with no `main()` with goto-cc
it would have no __CPROVER__start. CBMC would use the following logic to
determine the entry point:

1. If `--function` is not passed, use existing `__CPROVER__start`
   -- in our case, we don't have an existing `__CPROVER__start`, so this
      doesn't work for us
2. If `--function` _is_ passed, and a `__CPROVER__start` exists: Generate a new
   entry point calling the passed entry function, based on the mode of the
   existing `__CPROVER__start` (I am not sure why this mode rather than the mode
   of `--function`)
3. Otherwise, iterate over the non-binary input files, determine the language to
   use based on The file extension for each source file, and try to generate an
   appropriate `__CPROVER__start` based on that

The situation of having a single goto-binary with no `__CPROVER__start` as the
input file doesn't fit any of these cases, so regardless of whether `--function`
is passed CBMC is going to fail, complaining that it can't find or generate an
entry point. This adds a fourth case to the list:

4. If `--function` is passed and there is no `__CPROVER__start`, select the
   language based on the mode of the function parameter and use it to generate
   `__CPROVER__start`

This commit doesn't add tests for this change, but
diffblue#4014 adds tests that, in the absence of
this change, require hack-y workarounds (such as passing an empty file with a
`.c` extension to force C mode for case 3 in the list above) to work.
hannes-steffenhagen-diffblue added a commit to hannes-steffenhagen-diffblue/cbmc that referenced this pull request Feb 6, 2019
Previously, when we compiled for instance a C file with no `main()` with goto-cc
it would have no __CPROVER__start. CBMC would use the following logic to
determine the entry point:

1. If `--function` is not passed, use existing `__CPROVER__start`
   -- in our case, we don't have an existing `__CPROVER__start`, so this
      doesn't work for us
2. If `--function` _is_ passed, and a `__CPROVER__start` exists: Generate a new
   entry point calling the passed entry function, based on the mode of the
   existing `__CPROVER__start` (I am not sure why this mode rather than the mode
   of `--function`)
3. Otherwise, iterate over the non-binary input files, determine the language to
   use based on The file extension for each source file, and try to generate an
   appropriate `__CPROVER__start` based on that

The situation of having a single goto-binary with no `__CPROVER__start` as the
input file doesn't fit any of these cases, so regardless of whether `--function`
is passed CBMC is going to fail, complaining that it can't find or generate an
entry point. This adds a fourth case to the list:

4. If `--function` is passed and there is no `__CPROVER__start`, select the
   language based on the mode of the function parameter and use it to generate
   `__CPROVER__start`

This commit doesn't add tests for this change, but
diffblue#4014 adds tests that, in the absence of
this change, require hack-y workarounds (such as passing an empty file with a
`.c` extension to force C mode for case 3 in the list above) to work.
hannes-steffenhagen-diffblue added a commit to hannes-steffenhagen-diffblue/cbmc that referenced this pull request Feb 7, 2019
Previously, when we compiled for instance a C file with no `main()` with goto-cc
it would have no __CPROVER__start. CBMC would use the following logic to
determine the entry point:

1. If `--function` is not passed, use existing `__CPROVER__start`
   -- in our case, we don't have an existing `__CPROVER__start`, so this
      doesn't work for us
2. If `--function` _is_ passed, and a `__CPROVER__start` exists: Generate a new
   entry point calling the passed entry function, based on the mode of the
   existing `__CPROVER__start` (I am not sure why this mode rather than the mode
   of `--function`)
3. Otherwise, iterate over the non-binary input files, determine the language to
   use based on The file extension for each source file, and try to generate an
   appropriate `__CPROVER__start` based on that

The situation of having a single goto-binary with no `__CPROVER__start` as the
input file doesn't fit any of these cases, so regardless of whether `--function`
is passed CBMC is going to fail, complaining that it can't find or generate an
entry point. This adds a fourth case to the list:

4. If `--function` is passed and there is no `__CPROVER__start`, select the
   language based on the mode of the function parameter and use it to generate
   `__CPROVER__start`

This commit doesn't add tests for this change, but
diffblue#4014 adds tests that, in the absence of
this change, require hack-y workarounds (such as passing an empty file with a
`.c` extension to force C mode for case 3 in the list above) to work.
@hannes-steffenhagen-diffblue
Copy link
Contributor

@tautschnig
I'm going to wait until CI passes on #4104, then we'll rebase this on top of that and remove the workaround hack in the regression test, and then finally go ahead with merging this if there are no further requested changes.

hannes-steffenhagen-diffblue added a commit to hannes-steffenhagen-diffblue/cbmc that referenced this pull request Feb 7, 2019
Previously, when we compiled for instance a C file with no `main()` with goto-cc
it would have no __CPROVER__start. CBMC would use the following logic to
determine the entry point:

1. If `--function` is not passed, use existing `__CPROVER__start`
   -- in our case, we don't have an existing `__CPROVER__start`, so this
      doesn't work for us
2. If `--function` _is_ passed, and a `__CPROVER__start` exists: Generate a new
   entry point calling the passed entry function, based on the mode of the
   existing `__CPROVER__start` (I am not sure why this mode rather than the mode
   of `--function`)
3. Otherwise, iterate over the non-binary input files, determine the language to
   use based on The file extension for each source file, and try to generate an
   appropriate `__CPROVER__start` based on that

The situation of having a single goto-binary with no `__CPROVER__start` as the
input file doesn't fit any of these cases, so regardless of whether `--function`
is passed CBMC is going to fail, complaining that it can't find or generate an
entry point. This adds a fourth case to the list:

4. If `--function` is passed and there is no `__CPROVER__start`, select the
   language based on the mode of the function parameter and use it to generate
   `__CPROVER__start`

This commit doesn't add tests for this change, but
diffblue#4014 adds tests that, in the absence of
this change, require hack-y workarounds (such as passing an empty file with a
`.c` extension to force C mode for case 3 in the list above) to work.
This adds a framework for adding various new harness
generator types and adds tests. It also adds a harness
generator that calls a given goto function.

Co-authored-by: Hannes Steffenhagen <[email protected]>
Co-authored-by: Fotis Koutoulakis <[email protected]>
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 7b55287).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/100129308

@NlightNFotis NlightNFotis merged commit c7b962f into diffblue:develop Feb 8, 2019
@NlightNFotis NlightNFotis deleted the new_chain_sh branch February 8, 2019 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants