-
Notifications
You must be signed in to change notification settings - Fork 274
Regression testing infrastructure for the SMT backend #5982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Sounds good to me. Just too comments:
|
Thanks Peter. I wasn't sure about If the tag is only about CPROVER SMT2 then I think once we have Z3 running on the CI, we should run all of these tests al retag them. Should we use |
Assorted thoughts :
|
Hi @martin-cs, I just wanted to clarify the suggestion that I was making -- I wasn't proposing implementing more solver-specific encodings, but I was suggesting that we fix the existing flags that we have: I also agree with that the emitted SMT encoding should be as solver-agnostic as possible. But that said, I think solver specific "optimizations" should be used for the supported solvers. For example, we can use quantifiers to initialize arrays in the "default" SMT encoding (solver-agnostic), but if Z3 can do it without quantifiers then we should take advantage of that. But before adding support for the new solvers (Bitwuzla, OpenSMT, MathSAT), we should make sure that existing flags work as expected. SMTLIB 3.0 would standardize some of the solver-specific extensions, so that would definitely make things better. However, solver adoption of the new standard would likely take a while... |
I would suggest that the flags are the other way around. By default everything should work with all back-ends and then use flags to disable those that don't. So Bitwuzla and MathSAT are both already supported and have been for years. Bitwuzla had commercial users last I heard. MathSAT and OpenSMT had research groups using them. I think including at least Bitwuzla in the testing is a good idea because it tests out the diversity of solvers and will avoid us getting into situations where our SMT-LIB back-end is a CVC4 & Z3 back-end. |
I am unclear about this part. Running CBMC without any flags (i.e. default configuration) should be solver-agnostic as you said, then doesn't that imply |
Apologies; s/flags/tags/. I think we should have a way of blocking regression test from being run with specific solvers (including the internal back-end). The default for regression tests that it is run with all solvers. |
Thanks, got it. I agree on this; I was suggesting the same with the |
As per the proposal described in diffblue#5982, this PR introduces new tags that can be used to label tests that are known to be broken with specific solvers. It also updates the Makefile targets to use these solver-specific tags.
As per the proposal described in diffblue#5982, this PR introduces new tags that can be used to label tests that are known to be broken with specific solvers. It also updates the Makefile targets to use these solver-specific tags.
As per the proposal described in diffblue#5982, this PR introduces new tags that can be used to label tests that are known to be broken with specific solvers. It also updates the Makefile targets to use these solver-specific tags.
As per the proposal described in diffblue#5982, this PR introduces new tags that can be used to label tests that are known to be broken with specific solvers. It also updates the Makefile targets to use these solver-specific tags.
As per the proposal described in diffblue#5982, this PR introduces new tags that can be used to label tests that are known to be broken with specific solvers. It also updates the Makefile targets to use these solver-specific tags.
It is worth noting that the only external SMT2 solver we currently install in CI is Z3 and as far as I know it is only being used on a small handful of tests at the moment. Expanding the selection of SMT solvers will require CI updates in order to make them available on CI runs. I am not against adding more solvers, just noting that it will require some work to make them available and maintain these dependencies. Running all tests across all possible solvers sounds like a great ideal, but I am a little concerned that it could cause a problematic increase in the amount of CI run-time we need. |
BTW, I opened this draft PR: #6000 to run all of the regressions tests that pass with CPROVER SMT2 also with Z3. Apparently there are some tests that pass with CPROVER SMT2 backend, but not Z3. |
I guess we can close this issue now, since:
|
Uh oh!
There was an error while loading. Please reload this page.
This is based on a recent discussion I had with @tautschnig. I wanted to open this RFC and get some feedback before introducing a new tag in #5973.
Currently we seem to have 3 tags for our regression tests that relevant to the SMT backend:
smt-backend
: Tests that require the SMT backend, and don't work with the SAT backendbroken-smt-backend
: Tests that pass with the SAT backend, but don't work with the SMT backendthorough-smt-backend
: Tests that pass with the SAT backend, but take a long time with the SMT backendWhile working on #5973, I came across a regression test that works with both Z3 and CVC4, but doesn't work with CPROVER SMT2 because it lacks support for quantifiers. So I think it would make sense to add a few new tags that are solver-specific:
broken-cprover-smt2-smt-backend
andthorough-cprover-smt2-smt-backend
broken-z3-smt-backend
andthorough-z3-smt-backend
broken-cvc4-smt-backend
andthorough-cvc4-smt-backend
The
-smt-backend
suffix for these tags would be useful forgrep
-ing all regression tests related to the SMT-backend in any way. The tags can also be combined, e.g.smt-backend broken-cprover-smt2-smt-backend
would indicate that the test requires SMT backend, but doesn't work CPROVER SMT2 solverthorough-smt-backend broken-cvc4-smt-backend
would indicate that the test runs slower with the SMT backend, and doesn't work at all with CVC4 backendI also suggest running the same version of these solvers across all platforms on our CI. That way, each time we bump up the version of solver
X
on our CI, we can revisit all of thebroken-X-smt-backend
andthorough-X-smt-backend
tests and check if any of them are now fixed.How many solvers we want our CI to be running is a separate discussion, but I think we should definitely at least be running Z3, and preferably both Z3 and CVC4 (in future).
The text was updated successfully, but these errors were encountered: