Initial data structures for incremental smt2 solving #6165

thomasspriggs · 2021-06-10T14:16:07Z

Initial data structures for implementation of incremental smt2 solving.

Standard PR checklist:

Each commit message has a non-empty body, explaining why the change was made.
Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
My commit message includes data points confirming performance improvements (if claimed).
My PR is restricted to a single feature or bugfix.
White-space or formatting changes outside the feature-related changed lines are in commits of their own.

codecov · 2021-06-10T15:44:11Z

Codecov Report

Merging #6165 (8bc61d6) into develop (db1a8b0) will increase coverage by 0.02%.
The diff coverage is 84.10%.

@@             Coverage Diff             @@
##           develop    #6165      +/-   ##
===========================================
+ Coverage    75.47%   75.49%   +0.02%     
===========================================
  Files         1459     1474      +15     
  Lines       161447   161839     +392     
===========================================
+ Hits        121852   122182     +330     
- Misses       39595    39657      +62

Impacted Files	Coverage Δ
src/solvers/smt2_incremental/smt_commands.def	`0.00% <0.00%> (ø)`
src/solvers/smt2_incremental/smt_logics.h	`0.00% <0.00%> (ø)`
src/solvers/smt2_incremental/smt_options.def	`0.00% <0.00%> (ø)`
src/solvers/smt2_incremental/smt_options.cpp	`43.75% <43.75%> (ø)`
src/solvers/smt2_incremental/smt_commands.cpp	`80.51% <80.51%> (ø)`
src/solvers/smt2_incremental/smt_sorts.cpp	`81.25% <81.25%> (ø)`
src/solvers/smt2_incremental/smt_terms.cpp	`85.00% <85.00%> (ø)`
unit/solvers/smt2_incremental/smt_sorts.cpp	`88.88% <88.88%> (ø)`
unit/solvers/smt2_incremental/smt_terms.cpp	`91.78% <91.78%> (ø)`
src/solvers/smt2_incremental/smt_commands.h	`100.00% <100.00%> (ø)`
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 06c563a...8bc61d6. Read the comment docs.

TGWDB

Also no unit tests for the smt_commands?

src/solvers/smt2_incremental/smt_terms.cpp

unit/solvers/smt2_incremental/smt_terms.cpp

thomasspriggs · 2021-06-10T16:25:25Z

Also no unit tests for the smt_commands?

Its on the TODO list in the PR description.

TGWDB · 2021-06-11T08:18:20Z

For bookkeeping, note that this is part of the SMT backend work with broad explanation and RFC in #6134

martin-cs

I feel like I must be missing / not understanding a requirement here. The commands; yes, they are good and would be useful. But the rest -- why not just use exprt and typet? The underlying representation is the same and there are already a load of utility functions. Also, in the vast majority of cases, the semantics are the same as well. I can understand that it might be nice to use the C++ type system to mark which are values used in this subsystem or having had some conversions / checks applied but that can be done in a much more lightweight fashion.

martin-cs · 2021-06-11T10:42:10Z

src/solvers/smt2_incremental/smt_sorts.h

+
+class smt_sort_const_downcast_visitort;
+
+class smt_sortt : protected irept


Why define these separately when you could use typet? The type system is essentially identical. I can see a sort of value in having a class smt_sortt : public typet and a smt_sortt to_smt_sort(typet &t) function so you can use the C++ type safety to enforce conversion but this really seems like overkill.

martin-cs · 2021-06-11T10:44:45Z

src/solvers/smt2_incremental/smt_commands.h

+#include <solvers/smt2_incremental/smt_terms.h>
+#include <util/irep.h>
+
+class smt_commandt : protected irept


As a sanity test for this, would it be worth integrating these into smt2_solver / smt2_parser? They should make the separation between parsing and solving much clearer.

Yes. Using the new structures for the solving will help sanity test them. I have already found several minor issues by working on solving/converting to string locally. However I feel it is better to get a first version merged and then refine the structures as we work on the solving, rather than perfect the first version before merging the first PR. Integrating the new structures into the parsing would be a nice-to-have. But we are currently thinking of working on exprt -> new structures and new structures -> string before parsing.

martin-cs · 2021-06-11T10:47:02Z

src/solvers/smt2_incremental/smt_sorts.h

+
+  /// \brief Class for adding the ability to up and down cast smt_sortt to and
+  ///   from irept. These casts are required by other irept derived classes in
+  ///   order to store instances of smt_sortt inside them.


Why not just publicly inherit from irept?

Because I want to restrict the ability to cast to and from irept, to the inside of the storage classes. I see the fact that each of these inherit from irept to be an implementation detail. Ideally, there should be no need to upcast to irept as part of the algorithms which live outside the storage classes. Keeping variables declared as smt_termt or smt_sortt rather than irept where possible will give us more specific compile time type information.

Wouldn't your rationale then suggest using an irept member instead of inheriting?

Wouldn't your rationale then suggest using an irept member instead of inheriting?

Yes, I could have done that. I could also have implemented this without using irept at all and handled lifetime using std::unique_ptr or some such. By using irept with inheritance this code retains a greater semblance to other parts of the code base. This should aid cbmc developers familiar with other parts of the code base to transition to maintaining or improving this part and it should aid with reusing the existing pretty, operator== and hashing algorithms defined for irept. Otherwise we would need to write more new code to define these pieces of functionality for the new data structures.

martin-cs · 2021-06-11T10:48:15Z

src/solvers/smt2_incremental/smt_sorts.def

@@ -0,0 +1,2 @@
+SORT_ID(bool)


I honestly don't see the value of using this over ID_bool used in bool_typet.

Please bear in mind that this file is not only being used to define the IDs of the smt_sortt classes. It is also being used to automate writing some of the more repetitive parts of the visitor pattern. 🤔 I should probably add some doxygen to this file explaining how it is used. What would be the value in rationing the number of different IDs used rather than keeping them separate?

martin-cs · 2021-06-11T10:49:52Z

src/solvers/smt2_incremental/smt_terms.h

+class smt_sortt;
+class smt_term_const_downcast_visitort;
+
+class smt_termt : protected irept, private smt_sortt::storert<smt_termt>


Why not just user exprt or if you want to do some kind of conversion, then have a simple wrapper around exprt that marks the converted values?

Because this gives us more compile-time checking of the soundness of the conversion and better guarantees about the separation of concerns. This applies both in terms of the conversion from exprt to smt_termt and the conversion from smt_termt to string or other representation understood by the solver. We are only currently planning a conversion to SMT2 strings, but by keeping the conversion to strings separate it would theoretically allow for more easily adding an additional conversion algorithm such as to smt-switch data structures to be added on top as an additional run time option. Without the new data structures, the conversion from exprt to smt_termt could "leak" unconverted exprt classes which can only be caught at run time. The way the visitor is defined will also mean that missing conversions from smt_termt to string would also cause a compile error. There has to be at least a placeholder method in place with the UNIMPLEMENTED macro.

Defining the data structure will give us a clean separation between the conversion from exprt to smt_termt and smt_termt to string. It will allow for separately unit testing the two conversions. A new data structure only for use with smt2 has the same advantages that having separate data structures from the language front ends such as for Java and C vs the internal goto_program representation would have.

I think you can achieve much of the compile time checking at a much lower cost of having something like...

class smt_sortt : public typet {}; class smt_termt : public exprtt {}; class smt_translator { public : virtual smt_sortt type_to_sort(const typet &t) = 0; virtual smt_termt expr_to_term(const exprt &e) = 0; }

If this is the only way of producing these two types and the rest of the code is const-clean and the critical functions take smt_sortt and smt_termt then it will be very rare to have failed-to-convert bug.

I would suggest an object because these transformations tend to wind up being stateful no matter how good everyone's intent is. They will likely also need to be stackable in some sense. As soon as you have a couple of options on how to translate each theory / type (IIRC this was on your list of goals?) then you will have a combinatorial explosion of normal forms, so it is not so easy to use types* to enforce normal forms.

Using the visitor to do the actual conversion or caching converted things is smart idea and it is good to see it early integrated into the design here.

*. I could be wrong, if I am I would love to know because the handling of goto-programs has exactly this issue.

I am not in favour of publicly inheriting from typet, for the following reasons -

It would support down casting from typet to smt_sortt and require checks to be written for these casts.

It would add public methods to smt_sortt such as subtype() and source_location() which don't necessarily make sense in the context of a sort.

It would add public access to member functions from irept, which would allow direct manipulation of the irept data structures through weakly typed member functions such as move_to_sub.

It would add public access to irept's is_nil member function, whereas I would prefer to make the classes in this hierarchy non-nullable/nillable. That way optionalt can be added on top of these classes only in the places where nullability is a desirable property.

On the point of the translator - Yes, we will need functionality to convert from exprt to termt. The choice of whether to use member functions or free functions with configuration/state struct parameters to do the conversions is mostly a stylistic choice in my opinion, rather than a crucial difference. This is unless we really need virtual inheritance and multiple different translators. I have found free functions to be a little easier to unit test in some circumstances, which is a point in their favour. I do think it would improve readability to store configuration and state separately assuming we end up needing both though. That way the configuration can be kept const and the state mutable.

src/solvers/smt2_incremental/smt_terms.h

src/solvers/smt2_incremental/smt_commands.def

NlightNFotis

Some questions from me.

NlightNFotis · 2021-06-16T14:54:48Z

unit/solvers/smt2_incremental/smt_sorts.cpp

+{
+  const smt_bit_vector_sortt bit_vec_sort{32};
+  smt_sortt sort{bit_vec_sort};
+  sort = bit_vec_sort;


What are these two test cases testing for exactly?

It tests that the copy constructor of the smt_sortt class can be called with a smt_bit_vector_sortt typed parameter and the same for the assignment operator. Effectively testing the upcast which is carried out in these two operations is accessible. I added these two tests because the same test with the irept constructor should fail due to the access restrictions. Writing a test that it does not work with irept, could be done using some template meta programming to write a "does not compile" function, but probably would not be straight forward code to for us to read or write.

NlightNFotis · 2021-06-16T15:05:48Z

unit/solvers/smt2_incremental/smt_sorts.cpp

+
+TEST_CASE("Visiting smt_bool_sortt.", "[core][smt2_incremental]")
+{
+  class : public smt_sort_const_downcast_visitort


⛏️ Given that the visitor is nearly the same in both of the following test cases, does it make sense for it to be pulled out somewhere?

Yes. I usually err on the side of more duplication in unit tests, in order to make it more obvious why the properties the test is testing should hold. The duplication isn't too bad for the sorts at the moment as there are only a couple of cases. But it will be much worse for the terms and commands and it will be worse if we expand this with more sorts. Therefore I'll come up with a way to factor it out for the terms and commands and then back port the solution to the sorts.

NlightNFotis · 2021-06-16T15:08:48Z

unit/solvers/smt2_incremental/smt_to_string.cpp

+
+TEST_CASE("Test smt_sortt to string conversion", "[core][smt2_incremental]")
+{
+  CHECK(smt_to_string(smt_bool_sortt{}) == "Bool");


👍🏻 I really like how clean these are because of the visitor - I feel like it was a good design choice.

NlightNFotis · 2021-06-16T15:12:30Z

src/solvers/smt2_incremental/smt_sorts.cpp

@@ -10,6 +10,16 @@
 #include <solvers/smt2_incremental/smt_sorts.def>
 #undef SORT_ID

+bool smt_sortt::operator==(const smt_sortt &other) const
+{
+  return irept::operator==(other);


Question: since this just calls the base class' method, with no other change to it, does it make sense to kill this method here and just using irept::operator==; in the header file?

The irept::operator== member function takes an irept as its parameter / right hand side. So yes we could add using irept::operator==;, but the existing tests which use the smt_sortt::operator== would not compile because the implicit upcast to irept is banned from outside of the data structure's member functions due to the use of protected inheritance. Also I don't think it would make sense to allow comparisons such as smt_bool_sortt{} == irept{}. 1) because sorts should not be stored as irept outside of the irep based data structures and 2) because if we know that one side is not a sort at compile time and that the other side is a sort, then we know that the result would always be false.

thomasspriggs · 2021-06-21T08:48:58Z

Note: Some of the make build jobs are currently failing CI because the makefile miscounts the tests which are using TEMPLATE_TEST_CASE

TGWDB · 2021-07-14T10:40:59Z

src/solvers/smt2_incremental/internal_ast.txt

+@startuml
+' This diagram is written using PlantUML. It can be visualised locally or using
+' the PlantUML website - http://www.plantuml.com/plantuml/uml/.
+' On Linux the visualisation tool can be installed using -
+'   `sudo apt-get install plantuml`
+' Running the `plantuml` binary from the directory where this file exists will
+' give the option to view the file.
+


This file is great to have (and also to be able to generate). I just have the general concern that this is not generated or linked from the source code? How easy will it be for the two to lose synchronisation?

I think there is a doxygen plugin which can generate a graph. However I remember these being less than ideal. I suggest we either keep the plantUML and live with the reality that it will likely get out of date, or I remove it entirely and put any useful information from the graph nodes into the doxygen above the corresponding classes.

TGWDB · 2021-07-14T12:46:22Z

src/solvers/smt2_incremental/smt_logics.cpp

+smt_logic_quantifier_free_bit_vectorst::smt_logic_quantifier_free_bit_vectorst()
+  : smt_logict{ID_smt_logic_quantifier_free_bit_vectors}


There is some tension here since the code here (and later also marked) uses quantifier_free_bit_vectors but I think the actual logic is likely to be QF_AUFBV (based on what is used now). Note that this is not QF_BV which corresponds to quantifier free bit vector theory. (QF_AUFBV also allows arrays and free sort and function symbols, more details on logics here).

Yes. quantifier_free_bit_vectors would be just QF_BV. QF_AUFBV is what we were using in the old back end up until recently. I should fix the full names to match the abbreviations which get written out to the solver.

TGWDB · 2021-07-14T12:46:47Z

src/solvers/smt2_incremental/smt_logics.def

+///  * The constant `irep_idt`s used to identify each of the logic classes.
+///  * The member functions of the `smt_logic_const_downcast_visitort` class.
+///  * The type of option checks required to implement `smt_logict::accept`.
+LOGIC_ID(quantifier_free_bit_vectors)


logic tension

TGWDB · 2021-07-14T12:47:01Z

src/solvers/smt2_incremental/smt_logics.h

+class smt_logic_quantifier_free_bit_vectorst : public smt_logict
+{
+public:
+  smt_logic_quantifier_free_bit_vectorst();
+};


logic tension

src/solvers/smt2_incremental/smt_terms.cpp

TGWDB · 2021-07-14T13:05:20Z

unit/solvers/smt2_incremental/smt_terms.cpp

+  term = smt_bool_literal_termt{false};
+}
+
+TEST_CASE("Test smt_termt.pretty is accessible.", "[core][smt2_incremental]")


Technically this is testing both that the pretty function is accessible and the output. I guess a change to the base pretty function could invalidate this test without any changes to the smt_termt code.

TGWDB · 2021-07-14T13:07:17Z

unit/solvers/smt2_incremental/smt_terms.cpp

+  CHECK_THROWS(smt_identifier_termt{"foo\\ bar", smt_bool_sortt{}});
+}
+
+TEST_CASE("smt_identifier_termt upcasts.", "[core][smt2_incremental]")


No actual CHECK inside this test case, I guess it's a style choice?

Yeah, I wrote these upcast tests when I was initially working on the accessibility, at that time I just wanted to be sure that the upcasts compiled. However now that the rest of the tests are in place and full of upcasts, I am not sure that this has value. Therefore I might just remove all the upcast tests.

I have now removed these tests, as I was unconvinced of the value they added.

So that instances of `smt_sortt` can be stored in other `irept` derived classes without enabling upcasting of `smt_sortt` to `irept` from outside of the heirarchy of `irept` based classes.

So that it can be debugged and further features added separately from the `makefile`. Note that the check for `[.]` was already broken in the makefile because `[.]` is inserted using the XFAIL macro, but the files are grepped before macro preprocessing.

So that we can use catch's `TEMPLATE_TEST_CASE` without the test count failing when using make to run the tests. When using `TEMPLATE_TEST_CASE` the number of tests added to catch's count of test depends on the number of type arguments supplied. Therefore this update is needed to count the arguments to get the correct count.

Because this will be used to implement term to string/output stream conversion. Use of a visitor will allow this printing code to be separated from the data structure.

So that instaces of smt_termt can be stored inside other non-term ireps.

This class diagram served as the notes for implementing the initial set of classes. It is committed to the repository for future reference.

chrisr-diffblue

Looks sensible to me.

thomasspriggs added SMT Backend Interface work in progress labels Jun 10, 2021

TGWDB reviewed Jun 10, 2021

View reviewed changes

thomasspriggs force-pushed the tas/smt_data_structures branch 3 times, most recently from 9409d99 to e0c2984 Compare June 10, 2021 18:42

thomasspriggs force-pushed the tas/smt_data_structures branch from e0c2984 to f9a2626 Compare June 11, 2021 09:37

martin-cs reviewed Jun 11, 2021

View reviewed changes

thomasspriggs force-pushed the tas/smt_data_structures branch from f9a2626 to ec38e7f Compare June 11, 2021 12:31

thomasspriggs self-assigned this Jun 11, 2021

SaswatPadhi reviewed Jun 11, 2021

View reviewed changes

src/solvers/smt2_incremental/smt_commands.def Show resolved Hide resolved

thomasspriggs force-pushed the tas/smt_data_structures branch 2 times, most recently from 213fecb to f73547c Compare June 16, 2021 09:17

NlightNFotis reviewed Jun 16, 2021

View reviewed changes

thomasspriggs force-pushed the tas/smt_data_structures branch from a5b5674 to f313f6e Compare June 16, 2021 18:08

thomasspriggs force-pushed the tas/smt_data_structures branch 3 times, most recently from 6f19964 to a8a94b6 Compare July 13, 2021 15:19

TGWDB reviewed Jul 14, 2021

View reviewed changes

thomasspriggs added 7 commits July 19, 2021 14:58

Add foundational smt sorts

596a0bd

Add smt sort downcast visitor

6990a69

Add operator== to smt_sortt

5a81e6b

Add smt_sortt::storer as a way add storage of sorts

03534cb

So that instances of `smt_sortt` can be stored in other `irept` derived classes without enabling upcasting of `smt_sortt` to `irept` from outside of the heirarchy of `irept` based classes.

Add smt_terms data structures

2e1744f

thomasspriggs added 5 commits July 19, 2021 14:58

Add smt_termt visitor

fbffc09

Because this will be used to implement term to string/output stream conversion. Use of a visitor will allow this printing code to be separated from the data structure.

Add controlled upcasting of smt_termt for storage implementions

abff630

So that instaces of smt_termt can be stored inside other non-term ireps.

Add smt commands

fabc43a

Add smt_command downcast visitor

c9a3de1

Add class diagram for internal SMT AST classes

8bc61d6

This class diagram served as the notes for implementing the initial set of classes. It is committed to the repository for future reference.

thomasspriggs force-pushed the tas/smt_data_structures branch from a8a94b6 to 8bc61d6 Compare July 19, 2021 13:59

thomasspriggs changed the title ~~[draft] Initial data structures for incremental smt2 solving~~ Initial data structures for incremental smt2 solving Jul 19, 2021

thomasspriggs marked this pull request as ready for review July 19, 2021 14:02

thomasspriggs requested review from chrisr-diffblue, kroening and peterschrammel as code owners July 19, 2021 14:02

TGWDB approved these changes Jul 20, 2021

View reviewed changes

chrisr-diffblue approved these changes Jul 20, 2021

View reviewed changes

thomasspriggs merged commit 03213c1 into diffblue:develop Jul 20, 2021

thomasspriggs deleted the tas/smt_data_structures branch July 20, 2021 14:28


		class smt_sort_const_downcast_visitort;

		class smt_sortt : protected irept

		smt_logic_quantifier_free_bit_vectorst::smt_logic_quantifier_free_bit_vectorst()
		: smt_logict{ID_smt_logic_quantifier_free_bit_vectors}

Initial data structures for incremental smt2 solving #6165

Initial data structures for incremental smt2 solving #6165

Uh oh!

Conversation

thomasspriggs commented Jun 10, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Jun 10, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

TGWDB left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

thomasspriggs commented Jun 10, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

TGWDB commented Jun 11, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

martin-cs left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

thomasspriggs Jun 11, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

NlightNFotis left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

NlightNFotis Jun 16, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

thomasspriggs commented Jun 21, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

thomasspriggs commented Jun 10, 2021 •

edited

Loading

codecov bot commented Jun 10, 2021 •

edited

Loading

thomasspriggs commented Jun 10, 2021 •

edited

Loading

TGWDB commented Jun 11, 2021 •

edited

Loading

thomasspriggs Jun 11, 2021 •

edited

Loading

NlightNFotis Jun 16, 2021 •

edited

Loading