Introduce a mechanism to refine expected behavior of regression tests by solver. #7292


Closed
jimgrundy opened this issue Nov 2, 2022 · 9 comments
Labels: aws, aws-high, bug

Comments

@jimgrundy
Collaborator

This PR introduced some new tests for unions in structs: #7278
Those tests pass with some solver configurations (built-in) but not others (Z3).
We have a mechanism to indicate which tests are expected to pass and which to fail, but we don't have a mechanism to indicate expected pass/fail status by solver.

The lack of such a mechanism is a hindrance for SMT back-end development because it discourages checking in tests that don't pass with all solvers. If we do check in such tests, they will fail in some configurations, and every subsequent PR author is doomed to investigate the failure and establish that it isn't related to their change. Or, worse, people stop paying attention to failures that don't block check-in.

A mechanism needs to be added to handle this elegantly. If that can't be done soon then this change should be backed out until it is.

@feliperodri added the bug and aws labels on Nov 2, 2022
@tautschnig
Collaborator

We have a mechanism to indicate which tests are expected to pass and which to fail, but we don't have a mechanism to indicate expected pass/fail status by solver.

I am not sure I can agree with this statement: don't the tags that we have:

  • "broken-smt-backend" -- broken for all solvers
  • "broken-cprover-smt-backend" -- broken just with the built-in solver
  • "broken-z3-smt-backend" -- broken with Z3, but not the built-in solver

do the trick? What else are we missing?
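For context, a minimal sketch of where these tags live: each regression test directory contains a test.desc file whose first line gives the test level (CORE, THOROUGH, FUTURE or KNOWNBUG) followed by optional labels such as the ones above. The test below is hypothetical; the layout follows the format described in the regression README.

```
CORE broken-z3-smt-backend
main.c
--smt2
^EXIT=0$
^SIGNAL=0$
^VERIFICATION SUCCESSFUL$
--
^warning: ignoring
```

Read as: a core test that every configuration runs, except that a Z3-backed run is expected to fail it, so a Z3-specific job can filter it out by label.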

tautschnig added a commit to tautschnig/cbmc that referenced this issue Nov 3, 2022
The existing tags permit more precise tracking to pinpoint where the issues actually are.

Fixes: diffblue#7292
@TGWDB
Contributor

TGWDB commented Nov 3, 2022

We have a mechanism to indicate which tests are expected to pass and which to fail, but we don't have a mechanism to indicate expected pass/fail status by solver.

I am not sure I can agree with this statement: don't the tags that we have:

  • "broken-smt-backend" -- broken for all solvers
  • "broken-cprover-smt-backend" -- broken just with the built-in solver
  • "broken-z3-smt-backend" -- broken with Z3, but not the built-in solver

do the trick? What else are we missing?

We also have CI with cvc, so that's one more solver for the potential list.

That said, this arose from issues with the KNOWNBUG run that does not (I'm pretty sure?) test all the solvers we have in CI.

Just how fine-grained do we need to be in CI here? We seem to (as per the quoted message) have pretty good fine-tuning, but it is not always understood or used.

@jimgrundy
Collaborator Author

So, after discussion with @tautschnig I think what I want is the following:

  • These particular tests to be tagged, using the set of tags we have, to indicate accurately in which configurations they are expected to pass and in which they are expected to fail (see the sketch after this list for how a solver-specific CI job would consume such tags).
  • We conduct a review of the tests in the "core" set (run on PRs) and tag them all as "required" (you can't merge your PR unless the outcome matches the expectations indicated by the tags), perhaps with a couple of well-motivated exceptions (csmith, coverage). This will force the next person who checks in a test to tag it accurately.
  • We write some documentation about what tags are available for tests and what they mean, and place it somewhere prominent (it should get rendered into a developer section of our web site) so that a person who encounters this knows what they need to do.
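To make the first point concrete, here is a hedged sketch of how a Z3-specific CI job could consume such tags. The -C (core) and -X (exclude label) flags are as documented for regression/test.pl; the exact command line of any real CI job is an assumption, not a quote from the workflow files.

```sh
# Run the core regression tests against the SMT2/Z3 back-end, skipping
# tests labelled as broken for all SMT solvers or for Z3 specifically.
./test.pl -c "cbmc --smt2" -C \
  -X broken-smt-backend \
  -X broken-z3-smt-backend
```

The built-in solver job would instead exclude broken-smt-backend and broken-cprover-smt-backend, so each configuration runs exactly the tests it is expected to pass.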

@jimgrundy
Collaborator Author

Bumping this to "high" after discussion this morning to get it assigned to the right person/people. We may need @tautschnig to do the documentation, but perhaps Diffblue can handle the other aspects?

@tautschnig
Collaborator

tautschnig commented Nov 15, 2022

* These particular tests to be tagged, using the set of tags we have, to indicate accurately in which configurations they are expected to pass and in which they are expected to fail.

This was completed in #7290.

* We conduct a review of the tests in the "core" set (run on PRs) and tag them all as "required" (you can't merge your PR unless the outcome matches the expectations indicated by the tags), perhaps with a couple of well-motivated exceptions (csmith, coverage). This will force the next person who checks in a test to tag it accurately.

This requires admin permissions for the repository. @TGWDB could you please co-ordinate this activity?

* We write some documentation about what tags are available for tests and what they mean, and place it somewhere prominent (it should get rendered into a developer section of our web site) so that a person who encounters this knows what they need to do.

It turns out we already have this information in regression/readme.md. We need to figure out which file people might actually look at, and I'd lean towards just renaming this file to README.md so that it doesn't get lost as easily among all the directories. Update: done in #7347.

@NlightNFotis
Contributor

@tautschnig Is there something outstanding here, or are we good to close this ticket?

@tautschnig
Collaborator

@tautschnig Is there something outstanding here, or are we good to close this ticket?

I think the piece left open is:

We conduct a review of the tests in the "core" set (run on PRs) and tag them all as "required" (you can't merge your PR unless the outcome matches the expectations indicated by the tags), perhaps with a couple of well-motivated exceptions (csmith, coverage). This will force the next person who checks in a test to tag it accurately.

This requires someone with admin permissions to change which of the CI jobs are marked "Required."
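For reference, a minimal sketch of what that admin-only change looks like from the command line, assuming the GitHub CLI and classic branch protection on the develop branch. Note that this endpoint replaces the list of required contexts wholesale, so the complete set of required jobs would need to be supplied:

```sh
# Mark CI jobs as required status checks on develop (admin rights needed).
# PATCH replaces the existing contexts, so list every required job here.
gh api --method PATCH \
  repos/diffblue/cbmc/branches/develop/protection/required_status_checks \
  -F strict=true \
  -f "contexts[]=check-ubuntu-20_04-cmake-gcc-KNOWNBUG" \
  -f "contexts[]=check-ubuntu-20_04-make-clang-smt-z3"
```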

@NlightNFotis
Contributor

@tautschnig We have now updated the list of CI jobs that must succeed before a PR can be merged; the following extra jobs are now required:

  • compileXen
  • compileLinux
  • check-ubuntu-20_04-cmake-gcc-KNOWNBUG
  • check-ubuntu-20_04-cmake-gcc-THOROUGH
  • check-ubuntu-20_04-make-clang-smt-z3
  • check-doxygen
  • windows-msi-package
  • ubuntu-18_04-package
  • check-docker-image

These are in addition to the ones we marked as required before our last meeting.

Is there anything we're missing here, and if not, can we mark this as now done?

@tautschnig
Collaborator

Thank you @NlightNFotis! I think "Publish CBMC documentation / publish" could also be marked as such, but otherwise I think we're done here. Feel free to close!

@TGWDB closed this as completed on Dec 15, 2022