Skip to content

Regression testing infrastructure for the SMT backend #5982

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
SaswatPadhi opened this issue Mar 28, 2021 · 11 comments
Closed

Regression testing infrastructure for the SMT backend #5982

SaswatPadhi opened this issue Mar 28, 2021 · 11 comments
Labels
aws Bugs or features of importance to AWS CBMC users RFC Request for comment SMT Backend Interface Tests

Comments

@SaswatPadhi
Copy link
Contributor

SaswatPadhi commented Mar 28, 2021

This is based on a recent discussion I had with @tautschnig. I wanted to open this RFC and get some feedback before introducing a new tag in #5973.

Currently we seem to have 3 tags for our regression tests that relevant to the SMT backend:

  • smt-backend: Tests that require the SMT backend, and don't work with the SAT backend
  • broken-smt-backend: Tests that pass with the SAT backend, but don't work with the SMT backend
  • thorough-smt-backend: Tests that pass with the SAT backend, but take a long time with the SMT backend

While working on #5973, I came across a regression test that works with both Z3 and CVC4, but doesn't work with CPROVER SMT2 because it lacks support for quantifiers. So I think it would make sense to add a few new tags that are solver-specific:

  • broken-cprover-smt2-smt-backend and thorough-cprover-smt2-smt-backend
  • broken-z3-smt-backend and thorough-z3-smt-backend
  • broken-cvc4-smt-backend and thorough-cvc4-smt-backend

The -smt-backend suffix for these tags would be useful for grep-ing all regression tests related to the SMT-backend in any way. The tags can also be combined, e.g.

  • smt-backend broken-cprover-smt2-smt-backend would indicate that the test requires SMT backend, but doesn't work CPROVER SMT2 solver
  • thorough-smt-backend broken-cvc4-smt-backend would indicate that the test runs slower with the SMT backend, and doesn't work at all with CVC4 backend

I also suggest running the same version of these solvers across all platforms on our CI. That way, each time we bump up the version of solver X on our CI, we can revisit all of the broken-X-smt-backend and thorough-X-smt-backend tests and check if any of them are now fixed.

How many solvers we want our CI to be running is a separate discussion, but I think we should definitely at least be running Z3, and preferably both Z3 and CVC4 (in future).

@peterschrammel
Copy link
Member

Sounds good to me.

Just too comments:

  • Isn't broken-smt-backend currenty used for "doesn't work with CPROVER SMT2" (i.e. has nothing to do with whether it works with other SMT solvers such as Z3 or CVC4)?
  • I'd use broken-cprover-smt-backend instead of broken-cprover-smt2-smt-backend.

@SaswatPadhi
Copy link
Contributor Author

Thanks Peter.

I wasn't sure about broken-smt-backend. I thought these tests triggered the SMT_TODO so don't work on any solvers yet.

If the tag is only about CPROVER SMT2 then I think once we have Z3 running on the CI, we should run all of these tests al retag them.

Should we use broken-smt-backend tag in future only for those tests that are broken on all solvers, because of unimplemented / buggy translation?

@martin-cs
Copy link
Collaborator

Assorted thoughts :

  • If we are targetting specific solvers I would strongly recommend that the set is at least cprover-smt2, bitwuzla, cvc4 and z3 as those are the ones with known users. OpenSMT and MathSAT would be good too but I don't know about their active user-base. Ideally we should avoid solver-specific things where possible.
  • SMT-LIB 3.0 is due out this summer which might help simplify things.
  • I think the ideal would be that every regression runs with every solver. We are not going to achieve that completely or easily but maybe that gives us an idea of what the tags should do. Maybe we should just have a tag to exclude each back-end (internal and all of the SMT solvers) and not bother with further subdivision. If we know why or want to distinguish between them, then put it in the comments of the test.

@SaswatPadhi
Copy link
Contributor Author

Hi @martin-cs,

I just wanted to clarify the suggestion that I was making -- I wasn't proposing implementing more solver-specific encodings, but I was suggesting that we fix the existing flags that we have: --cprover-smt2, --z3, --cvc4.

I also agree with that the emitted SMT encoding should be as solver-agnostic as possible. But that said, I think solver specific "optimizations" should be used for the supported solvers. For example, we can use quantifiers to initialize arrays in the "default" SMT encoding (solver-agnostic), but if Z3 can do it without quantifiers then we should take advantage of that.

But before adding support for the new solvers (Bitwuzla, OpenSMT, MathSAT), we should make sure that existing flags work as expected.

SMTLIB 3.0 would standardize some of the solver-specific extensions, so that would definitely make things better. However, solver adoption of the new standard would likely take a while...

@martin-cs
Copy link
Collaborator

I would suggest that the flags are the other way around. By default everything should work with all back-ends and then use flags to disable those that don't. So --no-SAT, --no-Z3, --no-CVC4.

Bitwuzla and MathSAT are both already supported and have been for years. Bitwuzla had commercial users last I heard. MathSAT and OpenSMT had research groups using them. I think including at least Bitwuzla in the testing is a good idea because it tests out the diversity of solvers and will avoid us getting into situations where our SMT-LIB back-end is a CVC4 & Z3 back-end.

@SaswatPadhi
Copy link
Contributor Author

I would suggest that the flags are the other way around. By default everything should work with all back-ends and then use flags to disable those that don't. So --no-SAT, --no-Z3, --no-CVC4.

I am unclear about this part. Running CBMC without any flags (i.e. default configuration) should be solver-agnostic as you said, then doesn't that imply --no-Z3 and --no-CVC4? I am not sure how the SMT encoding would differ on giving a --no-Z3 or --no-CVC4 flag.

@martin-cs
Copy link
Collaborator

Apologies; s/flags/tags/.

I think we should have a way of blocking regression test from being run with specific solvers (including the internal back-end). The default for regression tests that it is run with all solvers.

@SaswatPadhi
Copy link
Contributor Author

Thanks, got it.

I agree on this; I was suggesting the same with the broken-X-backend tags.

SaswatPadhi added a commit to padhi-forks/cbmc that referenced this issue Mar 30, 2021
As per the proposal described in diffblue#5982, this PR introduces new tags that
can be used to label tests that are known to be broken with specific solvers.
It also updates the Makefile targets to use these solver-specific tags.
SaswatPadhi added a commit to padhi-forks/cbmc that referenced this issue Mar 30, 2021
As per the proposal described in diffblue#5982, this PR introduces new tags that
can be used to label tests that are known to be broken with specific solvers.
It also updates the Makefile targets to use these solver-specific tags.
SaswatPadhi added a commit to padhi-forks/cbmc that referenced this issue Mar 31, 2021
As per the proposal described in diffblue#5982, this PR introduces new tags that
can be used to label tests that are known to be broken with specific solvers.
It also updates the Makefile targets to use these solver-specific tags.
SaswatPadhi added a commit to padhi-forks/cbmc that referenced this issue Mar 31, 2021
As per the proposal described in diffblue#5982, this PR introduces new tags that
can be used to label tests that are known to be broken with specific solvers.
It also updates the Makefile targets to use these solver-specific tags.
jezhiggins pushed a commit to jezhiggins/cbmc that referenced this issue Apr 23, 2021
As per the proposal described in diffblue#5982, this PR introduces new tags that
can be used to label tests that are known to be broken with specific solvers.
It also updates the Makefile targets to use these solver-specific tags.
@TGWDB TGWDB added the aws Bugs or features of importance to AWS CBMC users label Apr 29, 2021
@thomasspriggs
Copy link
Contributor

Apologies; s/flags/tags/.

I think we should have a way of blocking regression test from being run with specific solvers (including the internal back-end). The default for regression tests that it is run with all solvers.

It is worth noting that the only external SMT2 solver we currently install in CI is Z3 and as far as I know it is only being used on a small handful of tests at the moment. Expanding the selection of SMT solvers will require CI updates in order to make them available on CI runs. I am not against adding more solvers, just noting that it will require some work to make them available and maintain these dependencies. Running all tests across all possible solvers sounds like a great ideal, but I am a little concerned that it could cause a problematic increase in the amount of CI run-time we need.

@SaswatPadhi
Copy link
Contributor Author

BTW, I opened this draft PR: #6000 to run all of the regressions tests that pass with CPROVER SMT2 also with Z3. Apparently there are some tests that pass with CPROVER SMT2 backend, but not Z3.

@SaswatPadhi
Copy link
Contributor Author

I guess we can close this issue now, since:

  1. I have committed some new SMT_solver-specific tags in Added new solver-specific tags for SMT regression tests #5985.
  2. The SMS regression tests discussion would be part of the larger SMT backend rewrite effort being tracked @ [RFC] SMT Plan #6134.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Bugs or features of importance to AWS CBMC users RFC Request for comment SMT Backend Interface Tests
Projects
None yet
Development

No branches or pull requests

5 participants