-
Notifications
You must be signed in to change notification settings - Fork 274
Initial data structures for incremental smt2 solving #6165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial data structures for incremental smt2 solving #6165
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #6165 +/- ##
===========================================
+ Coverage 75.47% 75.49% +0.02%
===========================================
Files 1459 1474 +15
Lines 161447 161839 +392
===========================================
+ Hits 121852 122182 +330
- Misses 39595 39657 +62
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also no unit tests for the smt_commands?
Its on the TODO list in the PR description. |
9409d99
to
e0c2984
Compare
For bookkeeping, note that this is part of the SMT backend work with broad explanation and RFC in #6134 |
e0c2984
to
f9a2626
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like I must be missing / not understanding a requirement here. The commands; yes, they are good and would be useful. But the rest -- why not just use exprt
and typet
? The underlying representation is the same and there are already a load of utility functions. Also, in the vast majority of cases, the semantics are the same as well. I can understand that it might be nice to use the C++ type system to mark which are values used in this subsystem or having had some conversions / checks applied but that can be done in a much more lightweight fashion.
|
||
class smt_sort_const_downcast_visitort; | ||
|
||
class smt_sortt : protected irept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why define these separately when you could use typet
? The type system is essentially identical. I can see a sort of value in having a class smt_sortt : public typet
and a smt_sortt to_smt_sort(typet &t)
function so you can use the C++ type safety to enforce conversion but this really seems like overkill.
#include <solvers/smt2_incremental/smt_terms.h> | ||
#include <util/irep.h> | ||
|
||
class smt_commandt : protected irept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a sanity test for this, would it be worth integrating these into smt2_solver
/ smt2_parser
? They should make the separation between parsing and solving much clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Using the new structures for the solving will help sanity test them. I have already found several minor issues by working on solving/converting to string locally. However I feel it is better to get a first version merged and then refine the structures as we work on the solving, rather than perfect the first version before merging the first PR. Integrating the new structures into the parsing would be a nice-to-have. But we are currently thinking of working on exprt -> new structures and new structures -> string before parsing.
|
||
/// \brief Class for adding the ability to up and down cast smt_sortt to and | ||
/// from irept. These casts are required by other irept derived classes in | ||
/// order to store instances of smt_sortt inside them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just publicly inherit from irept
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I want to restrict the ability to cast to and from irept
, to the inside of the storage classes. I see the fact that each of these inherit from irept
to be an implementation detail. Ideally, there should be no need to upcast to irept
as part of the algorithms which live outside the storage classes. Keeping variables declared as smt_termt
or smt_sortt
rather than irept
where possible will give us more specific compile time type information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't your rationale then suggest using an irept
member instead of inheriting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't your rationale then suggest using an
irept
member instead of inheriting?
Yes, I could have done that. I could also have implemented this without using irept
at all and handled lifetime using std::unique_ptr
or some such. By using irept
with inheritance this code retains a greater semblance to other parts of the code base. This should aid cbmc developers familiar with other parts of the code base to transition to maintaining or improving this part and it should aid with reusing the existing pretty
, operator==
and hashing algorithms defined for irept
. Otherwise we would need to write more new code to define these pieces of functionality for the new data structures.
@@ -0,0 +1,2 @@ | |||
SORT_ID(bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly don't see the value of using this over ID_bool
used in bool_typet
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please bear in mind that this file is not only being used to define the IDs of the smt_sortt
classes. It is also being used to automate writing some of the more repetitive parts of the visitor pattern. 🤔 I should probably add some doxygen to this file explaining how it is used. What would be the value in rationing the number of different IDs used rather than keeping them separate?
class smt_sortt; | ||
class smt_term_const_downcast_visitort; | ||
|
||
class smt_termt : protected irept, private smt_sortt::storert<smt_termt> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just user exprt
or if you want to do some kind of conversion, then have a simple wrapper around exprt
that marks the converted values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this gives us more compile-time checking of the soundness of the conversion and better guarantees about the separation of concerns. This applies both in terms of the conversion from exprt
to smt_termt
and the conversion from smt_termt
to string or other representation understood by the solver. We are only currently planning a conversion to SMT2 strings, but by keeping the conversion to strings separate it would theoretically allow for more easily adding an additional conversion algorithm such as to smt-switch
data structures to be added on top as an additional run time option. Without the new data structures, the conversion from exprt
to smt_termt
could "leak" unconverted exprt
classes which can only be caught at run time. The way the visitor is defined will also mean that missing conversions from smt_termt
to string would also cause a compile error. There has to be at least a placeholder method in place with the UNIMPLEMENTED
macro.
Defining the data structure will give us a clean separation between the conversion from exprt
to smt_termt
and smt_termt
to string. It will allow for separately unit testing the two conversions. A new data structure only for use with smt2
has the same advantages that having separate data structures from the language front ends such as for Java
and C
vs the internal goto_program
representation would have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can achieve much of the compile time checking at a much lower cost of having something like...
class smt_sortt : public typet {};
class smt_termt : public exprtt {};
class smt_translator {
public :
virtual smt_sortt type_to_sort(const typet &t) = 0;
virtual smt_termt expr_to_term(const exprt &e) = 0;
}
If this is the only way of producing these two types and the rest of the code is const
-clean and the critical functions take smt_sortt
and smt_termt
then it will be very rare to have failed-to-convert bug.
I would suggest an object because these transformations tend to wind up being stateful no matter how good everyone's intent is. They will likely also need to be stackable
in some sense. As soon as you have a couple of options on how to translate each theory / type (IIRC this was on your list of goals?) then you will have a combinatorial explosion of normal forms, so it is not so easy to use types* to enforce normal forms.
Using the visitor to do the actual conversion or caching converted things is smart idea and it is good to see it early integrated into the design here.
*. I could be wrong, if I am I would love to know because the handling of goto-programs has exactly this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not in favour of publicly inheriting from typet
, for the following reasons -
- It would support down casting from
typet
tosmt_sortt
and require checks to be written for these casts. - It would add public methods to
smt_sortt
such assubtype()
andsource_location()
which don't necessarily make sense in the context of a sort. - It would add public access to member functions from
irept
, which would allow direct manipulation of theirept
data structures through weakly typed member functions such asmove_to_sub
. - It would add public access to
irept
'sis_nil
member function, whereas I would prefer to make the classes in this hierarchy non-nullable/nillable. That wayoptionalt
can be added on top of these classes only in the places where nullability is a desirable property.
On the point of the translator - Yes, we will need functionality to convert from exprt
to termt
. The choice of whether to use member functions or free functions with configuration/state struct parameters to do the conversions is mostly a stylistic choice in my opinion, rather than a crucial difference. This is unless we really need virtual inheritance and multiple different translators. I have found free functions to be a little easier to unit test in some circumstances, which is a point in their favour. I do think it would improve readability to store configuration and state separately assuming we end up needing both though. That way the configuration can be kept const and the state mutable.
f9a2626
to
ec38e7f
Compare
213fecb
to
f73547c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions from me.
{ | ||
const smt_bit_vector_sortt bit_vec_sort{32}; | ||
smt_sortt sort{bit_vec_sort}; | ||
sort = bit_vec_sort; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are these two test cases testing for exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It tests that the copy constructor of the smt_sortt
class can be called with a smt_bit_vector_sortt typed parameter and the same for the assignment operator. Effectively testing the upcast which is carried out in these two operations is accessible. I added these two tests because the same test with the irept
constructor should fail due to the access restrictions. Writing a test that it does not work with irept
, could be done using some template meta programming to write a "does not compile" function, but probably would not be straight forward code to for us to read or write.
|
||
TEST_CASE("Visiting smt_bool_sortt.", "[core][smt2_incremental]") | ||
{ | ||
class : public smt_sort_const_downcast_visitort |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏️ Given that the visitor is nearly the same in both of the following test cases, does it make sense for it to be pulled out somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I usually err on the side of more duplication in unit tests, in order to make it more obvious why the properties the test is testing should hold. The duplication isn't too bad for the sorts at the moment as there are only a couple of cases. But it will be much worse for the terms and commands and it will be worse if we expand this with more sorts. Therefore I'll come up with a way to factor it out for the terms and commands and then back port the solution to the sorts.
|
||
TEST_CASE("Test smt_sortt to string conversion", "[core][smt2_incremental]") | ||
{ | ||
CHECK(smt_to_string(smt_bool_sortt{}) == "Bool"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻 I really like how clean these are because of the visitor - I feel like it was a good design choice.
@@ -10,6 +10,16 @@ | |||
#include <solvers/smt2_incremental/smt_sorts.def> | |||
#undef SORT_ID | |||
|
|||
bool smt_sortt::operator==(const smt_sortt &other) const | |||
{ | |||
return irept::operator==(other); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: since this just calls the base class' method, with no other change to it, does it make sense to kill this method here and just using irept::operator==;
in the header file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The irept::operator==
member function takes an irept
as its parameter / right hand side. So yes we could add using irept::operator==;
, but the existing tests which use the smt_sortt::operator==
would not compile because the implicit upcast to irept
is banned from outside of the data structure's member functions due to the use of protected inheritance. Also I don't think it would make sense to allow comparisons such as smt_bool_sortt{} == irept{}
. 1) because sorts should not be stored as irept
outside of the irep based data structures and 2) because if we know that one side is not a sort at compile time and that the other side is a sort, then we know that the result would always be false.
a5b5674
to
f313f6e
Compare
Note: Some of the make build jobs are currently failing CI because the makefile miscounts the tests which are using TEMPLATE_TEST_CASE |
6f19964
to
a8a94b6
Compare
@startuml | ||
' This diagram is written using PlantUML. It can be visualised locally or using | ||
' the PlantUML website - http://www.plantuml.com/plantuml/uml/. | ||
' On Linux the visualisation tool can be installed using - | ||
' `sudo apt-get install plantuml` | ||
' Running the `plantuml` binary from the directory where this file exists will | ||
' give the option to view the file. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is great to have (and also to be able to generate). I just have the general concern that this is not generated or linked from the source code? How easy will it be for the two to lose synchronisation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a doxygen plugin which can generate a graph. However I remember these being less than ideal. I suggest we either keep the plantUML and live with the reality that it will likely get out of date, or I remove it entirely and put any useful information from the graph nodes into the doxygen above the corresponding classes.
smt_logic_quantifier_free_bit_vectorst::smt_logic_quantifier_free_bit_vectorst() | ||
: smt_logict{ID_smt_logic_quantifier_free_bit_vectors} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some tension here since the code here (and later also marked) uses quantifier_free_bit_vectors
but I think the actual logic is likely to be QF_AUFBV
(based on what is used now). Note that this is not QF_BV
which corresponds to quantifier free bit vector theory. (QF_AUFBV
also allows arrays and free sort and function symbols, more details on logics here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. quantifier_free_bit_vectors
would be just QF_BV
. QF_AUFBV
is what we were using in the old back end up until recently. I should fix the full names to match the abbreviations which get written out to the solver.
/// * The constant `irep_idt`s used to identify each of the logic classes. | ||
/// * The member functions of the `smt_logic_const_downcast_visitort` class. | ||
/// * The type of option checks required to implement `smt_logict::accept`. | ||
LOGIC_ID(quantifier_free_bit_vectors) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic tension
class smt_logic_quantifier_free_bit_vectorst : public smt_logict | ||
{ | ||
public: | ||
smt_logic_quantifier_free_bit_vectorst(); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic tension
term = smt_bool_literal_termt{false}; | ||
} | ||
|
||
TEST_CASE("Test smt_termt.pretty is accessible.", "[core][smt2_incremental]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this is testing both that the pretty
function is accessible and the output. I guess a change to the base pretty
function could invalidate this test without any changes to the smt_termt
code.
CHECK_THROWS(smt_identifier_termt{"foo\\ bar", smt_bool_sortt{}}); | ||
} | ||
|
||
TEST_CASE("smt_identifier_termt upcasts.", "[core][smt2_incremental]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No actual CHECK
inside this test case, I guess it's a style choice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I wrote these upcast tests when I was initially working on the accessibility, at that time I just wanted to be sure that the upcasts compiled. However now that the rest of the tests are in place and full of upcasts, I am not sure that this has value. Therefore I might just remove all the upcast tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now removed these tests, as I was unconvinced of the value they added.
So that instances of `smt_sortt` can be stored in other `irept` derived classes without enabling upcasting of `smt_sortt` to `irept` from outside of the heirarchy of `irept` based classes.
So that it can be debugged and further features added separately from the `makefile`. Note that the check for `[.]` was already broken in the makefile because `[.]` is inserted using the XFAIL macro, but the files are grepped before macro preprocessing.
So that we can use catch's `TEMPLATE_TEST_CASE` without the test count failing when using make to run the tests. When using `TEMPLATE_TEST_CASE` the number of tests added to catch's count of test depends on the number of type arguments supplied. Therefore this update is needed to count the arguments to get the correct count.
Because this will be used to implement term to string/output stream conversion. Use of a visitor will allow this printing code to be separated from the data structure.
So that instaces of smt_termt can be stored inside other non-term ireps.
This class diagram served as the notes for implementing the initial set of classes. It is committed to the repository for future reference.
a8a94b6
to
8bc61d6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks sensible to me.
Initial data structures for implementation of incremental smt2 solving.
Standard PR checklist: