Regression testing infrastructure for the SMT backend #5982

SaswatPadhi · 2021-03-28T01:18:32Z

This is based on a recent discussion I had with @tautschnig. I wanted to open this RFC and get some feedback before introducing a new tag in #5973.

Currently we seem to have 3 tags for our regression tests that relevant to the SMT backend:

smt-backend: Tests that require the SMT backend, and don't work with the SAT backend
broken-smt-backend: Tests that pass with the SAT backend, but don't work with the SMT backend
thorough-smt-backend: Tests that pass with the SAT backend, but take a long time with the SMT backend

While working on #5973, I came across a regression test that works with both Z3 and CVC4, but doesn't work with CPROVER SMT2 because it lacks support for quantifiers. So I think it would make sense to add a few new tags that are solver-specific:

broken-cprover-smt2-smt-backend and thorough-cprover-smt2-smt-backend
broken-z3-smt-backend and thorough-z3-smt-backend
broken-cvc4-smt-backend and thorough-cvc4-smt-backend

The -smt-backend suffix for these tags would be useful for grep-ing all regression tests related to the SMT-backend in any way. The tags can also be combined, e.g.

smt-backend broken-cprover-smt2-smt-backend would indicate that the test requires SMT backend, but doesn't work CPROVER SMT2 solver
thorough-smt-backend broken-cvc4-smt-backend would indicate that the test runs slower with the SMT backend, and doesn't work at all with CVC4 backend

I also suggest running the same version of these solvers across all platforms on our CI. That way, each time we bump up the version of solver X on our CI, we can revisit all of the broken-X-smt-backend and thorough-X-smt-backend tests and check if any of them are now fixed.

How many solvers we want our CI to be running is a separate discussion, but I think we should definitely at least be running Z3, and preferably both Z3 and CVC4 (in future).

The text was updated successfully, but these errors were encountered:

peterschrammel · 2021-03-28T14:07:57Z

Sounds good to me.

Just too comments:

Isn't broken-smt-backend currenty used for "doesn't work with CPROVER SMT2" (i.e. has nothing to do with whether it works with other SMT solvers such as Z3 or CVC4)?
I'd use broken-cprover-smt-backend instead of broken-cprover-smt2-smt-backend.

SaswatPadhi · 2021-03-28T16:22:25Z

Thanks Peter.

I wasn't sure about broken-smt-backend. I thought these tests triggered the SMT_TODO so don't work on any solvers yet.

If the tag is only about CPROVER SMT2 then I think once we have Z3 running on the CI, we should run all of these tests al retag them.

Should we use broken-smt-backend tag in future only for those tests that are broken on all solvers, because of unimplemented / buggy translation?

martin-cs · 2021-03-28T17:03:48Z

Assorted thoughts :

If we are targetting specific solvers I would strongly recommend that the set is at least cprover-smt2, bitwuzla, cvc4 and z3 as those are the ones with known users. OpenSMT and MathSAT would be good too but I don't know about their active user-base. Ideally we should avoid solver-specific things where possible.
SMT-LIB 3.0 is due out this summer which might help simplify things.
I think the ideal would be that every regression runs with every solver. We are not going to achieve that completely or easily but maybe that gives us an idea of what the tags should do. Maybe we should just have a tag to exclude each back-end (internal and all of the SMT solvers) and not bother with further subdivision. If we know why or want to distinguish between them, then put it in the comments of the test.

SaswatPadhi · 2021-03-29T06:56:53Z

Hi @martin-cs,

I just wanted to clarify the suggestion that I was making -- I wasn't proposing implementing more solver-specific encodings, but I was suggesting that we fix the existing flags that we have: --cprover-smt2, --z3, --cvc4.

I also agree with that the emitted SMT encoding should be as solver-agnostic as possible. But that said, I think solver specific "optimizations" should be used for the supported solvers. For example, we can use quantifiers to initialize arrays in the "default" SMT encoding (solver-agnostic), but if Z3 can do it without quantifiers then we should take advantage of that.

But before adding support for the new solvers (Bitwuzla, OpenSMT, MathSAT), we should make sure that existing flags work as expected.

SMTLIB 3.0 would standardize some of the solver-specific extensions, so that would definitely make things better. However, solver adoption of the new standard would likely take a while...

martin-cs · 2021-03-29T10:47:36Z

I would suggest that the flags are the other way around. By default everything should work with all back-ends and then use flags to disable those that don't. So --no-SAT, --no-Z3, --no-CVC4.

Bitwuzla and MathSAT are both already supported and have been for years. Bitwuzla had commercial users last I heard. MathSAT and OpenSMT had research groups using them. I think including at least Bitwuzla in the testing is a good idea because it tests out the diversity of solvers and will avoid us getting into situations where our SMT-LIB back-end is a CVC4 & Z3 back-end.

SaswatPadhi · 2021-03-29T14:02:02Z

I would suggest that the flags are the other way around. By default everything should work with all back-ends and then use flags to disable those that don't. So --no-SAT, --no-Z3, --no-CVC4.

I am unclear about this part. Running CBMC without any flags (i.e. default configuration) should be solver-agnostic as you said, then doesn't that imply --no-Z3 and --no-CVC4? I am not sure how the SMT encoding would differ on giving a --no-Z3 or --no-CVC4 flag.

martin-cs · 2021-03-29T15:20:04Z

Apologies; s/flags/tags/.

I think we should have a way of blocking regression test from being run with specific solvers (including the internal back-end). The default for regression tests that it is run with all solvers.

SaswatPadhi · 2021-03-29T15:27:11Z

Thanks, got it.

I agree on this; I was suggesting the same with the broken-X-backend tags.

As per the proposal described in diffblue#5982, this PR introduces new tags that can be used to label tests that are known to be broken with specific solvers. It also updates the Makefile targets to use these solver-specific tags.

thomasspriggs · 2021-04-30T09:29:24Z

Apologies; s/flags/tags/.

I think we should have a way of blocking regression test from being run with specific solvers (including the internal back-end). The default for regression tests that it is run with all solvers.

It is worth noting that the only external SMT2 solver we currently install in CI is Z3 and as far as I know it is only being used on a small handful of tests at the moment. Expanding the selection of SMT solvers will require CI updates in order to make them available on CI runs. I am not against adding more solvers, just noting that it will require some work to make them available and maintain these dependencies. Running all tests across all possible solvers sounds like a great ideal, but I am a little concerned that it could cause a problematic increase in the amount of CI run-time we need.

SaswatPadhi · 2021-04-30T19:52:55Z

BTW, I opened this draft PR: #6000 to run all of the regressions tests that pass with CPROVER SMT2 also with Z3. Apparently there are some tests that pass with CPROVER SMT2 backend, but not Z3.

SaswatPadhi · 2021-08-20T20:59:53Z

I guess we can close this issue now, since:

I have committed some new SMT_solver-specific tags in Added new solver-specific tags for SMT regression tests #5985.
The SMS regression tests discussion would be part of the larger SMT backend rewrite effort being tracked @ [RFC] SMT Plan #6134.

SaswatPadhi added SMT Backend Interface Tests RFC Request for comment labels Mar 28, 2021

SaswatPadhi mentioned this issue Mar 30, 2021

Added new solver-specific tags for SMT regression tests #5985

Merged

2 tasks

TGWDB added the aws Bugs or features of importance to AWS CBMC users label Apr 29, 2021

SaswatPadhi closed this as completed Aug 20, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Regression testing infrastructure for the SMT backend #5982

Regression testing infrastructure for the SMT backend #5982

SaswatPadhi commented Mar 28, 2021 •

edited

Loading

peterschrammel commented Mar 28, 2021

Uh oh!

SaswatPadhi commented Mar 28, 2021

Uh oh!

martin-cs commented Mar 28, 2021

Uh oh!

SaswatPadhi commented Mar 29, 2021

Uh oh!

martin-cs commented Mar 29, 2021

Uh oh!

SaswatPadhi commented Mar 29, 2021

Uh oh!

martin-cs commented Mar 29, 2021

Uh oh!

SaswatPadhi commented Mar 29, 2021

Uh oh!

thomasspriggs commented Apr 30, 2021

Uh oh!

SaswatPadhi commented Apr 30, 2021

Uh oh!

SaswatPadhi commented Aug 20, 2021

Uh oh!

Regression testing infrastructure for the SMT backend #5982

Regression testing infrastructure for the SMT backend #5982

Comments

SaswatPadhi commented Mar 28, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

peterschrammel commented Mar 28, 2021

Uh oh!

SaswatPadhi commented Mar 28, 2021

Uh oh!

martin-cs commented Mar 28, 2021

Uh oh!

SaswatPadhi commented Mar 29, 2021

Uh oh!

martin-cs commented Mar 29, 2021

Uh oh!

SaswatPadhi commented Mar 29, 2021

Uh oh!

martin-cs commented Mar 29, 2021

Uh oh!

SaswatPadhi commented Mar 29, 2021

Uh oh!

thomasspriggs commented Apr 30, 2021

Uh oh!

SaswatPadhi commented Apr 30, 2021

Uh oh!

SaswatPadhi commented Aug 20, 2021

Uh oh!

SaswatPadhi commented Mar 28, 2021 •

edited

Loading