Skip to content

Add CSmith GitHub action #5700

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 12, 2021
Merged

Add CSmith GitHub action #5700

merged 1 commit into from
Jan 12, 2021

Conversation

tautschnig
Copy link
Collaborator

Run 10 randomly generated CSmith tests as a GitHub action to perform an
integration test involving CBMC and goto-instrument --dump-c. Each test
is first compiled and run using GCC to obtain a checksum, which is then
included as an assertion for CBMC to verify.

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig requested a review from a team as a code owner December 27, 2020 08:52
@codecov
Copy link

codecov bot commented Dec 27, 2020

Codecov Report

Merging #5700 (9b56cce) into develop (deba4ae) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #5700   +/-   ##
========================================
  Coverage    69.54%   69.54%           
========================================
  Files         1243     1243           
  Lines       100700   100700           
========================================
  Hits         70036    70036           
  Misses       30664    30664           
Flag Coverage Δ
cproversmt2 43.29% <ø> (ø)
regression 66.44% <ø> (ø)
unit 32.22% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update deba4ae...9b56cce. Read the comment docs.

@hannes-steffenhagen-diffblue
Copy link
Contributor

This seems like a good idea but the script could use some more explanation for what is going on there.

@tautschnig
Copy link
Collaborator Author

This seems like a good idea but the script could use some more explanation for what is going on there.

Indeed. I've added comments!

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I /really/ like the concept but ... non-deterministic things make me nervous. I would be happier if the scripts/csmith.sh took a random seed and csmith.yml set it to the hash of the diff or something similar that could be locally reproduced.

I'm also a little unsure about what you feel should happen if you find issues of this form. Is it the job of the person who submitted the PR to fix them? Raise issues?

@hannes-steffenhagen-diffblue
Copy link
Contributor

@martin-cs I'd say failure for these is not PR raiser responsibility, but if something does come up we should definitely take note.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO "cbmc matches csmith test behaviour" and "goto-cc --> dump-c should yield program with same behaviour" are two different tests and probably shouldn't be in the same script, but otherwise I think this is a good addition for finding potential problems hopefully before users run into them.

Copy link
Contributor

@NlightNFotis NlightNFotis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

Run 10 randomly generated CSmith tests as a GitHub action to perform an
integration test involving CBMC and goto-instrument --dump-c. Each test
is first compiled and run using GCC to obtain a checksum, which is then
included as an assertion for CBMC to verify.
@tautschnig
Copy link
Collaborator Author

I /really/ like the concept but ... non-deterministic things make me nervous. I would be happier if the scripts/csmith.sh took a random seed and csmith.yml set it to the hash of the diff or something similar that could be locally reproduced.

Very good point - I've now modified the script to:

  1. Generate a seed (using date +%s) and print that seed.
  2. Add the ability to re-run with a particular seed.

I'm also a little unsure about what you feel should happen if you find issues of this form. Is it the job of the person who submitted the PR to fix them? Raise issues?

I've done multiple thousands of runs of CSmith in recent weeks, and am thus fairly confident that the likelihood of a PR tripping this up by a bug outside the changes introduced in said PR are fairly low. Thus I'd argue it is the job of the person who submitted the PR to get this back in working order.

@tautschnig
Copy link
Collaborator Author

IMHO "cbmc matches csmith test behaviour" and "goto-cc --> dump-c should yield program with same behaviour" are two different tests and probably shouldn't be in the same script, but otherwise I think this is a good addition for finding potential problems hopefully before users run into them.

Yes, having two separate jobs might be nicer, but I'm a bit hesitant to have almost-the-same-script twice in the repository. So I'd for now stick with the current solution. If we find that we regularly only run into bugs in one or the other part then that'll provide an incentive to improve this.

@martin-cs
Copy link
Collaborator

@tautschnig : sounds great; let's do it!

@tautschnig tautschnig merged commit 5cac14f into diffblue:develop Jan 12, 2021
@tautschnig tautschnig deleted the csmith branch January 12, 2021 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants