Skip to content

Initial data structures for incremental smt2 solving #6165

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jul 20, 2021

Conversation

thomasspriggs
Copy link
Contributor

@thomasspriggs thomasspriggs commented Jun 10, 2021

Initial data structures for implementation of incremental smt2 solving.

Standard PR checklist:

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@codecov
Copy link

codecov bot commented Jun 10, 2021

Codecov Report

Merging #6165 (8bc61d6) into develop (db1a8b0) will increase coverage by 0.02%.
The diff coverage is 84.10%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #6165      +/-   ##
===========================================
+ Coverage    75.47%   75.49%   +0.02%     
===========================================
  Files         1459     1474      +15     
  Lines       161447   161839     +392     
===========================================
+ Hits        121852   122182     +330     
- Misses       39595    39657      +62     
Impacted Files Coverage Δ
src/solvers/smt2_incremental/smt_commands.def 0.00% <0.00%> (ø)
src/solvers/smt2_incremental/smt_logics.h 0.00% <0.00%> (ø)
src/solvers/smt2_incremental/smt_options.def 0.00% <0.00%> (ø)
src/solvers/smt2_incremental/smt_options.cpp 43.75% <43.75%> (ø)
src/solvers/smt2_incremental/smt_commands.cpp 80.51% <80.51%> (ø)
src/solvers/smt2_incremental/smt_sorts.cpp 81.25% <81.25%> (ø)
src/solvers/smt2_incremental/smt_terms.cpp 85.00% <85.00%> (ø)
unit/solvers/smt2_incremental/smt_sorts.cpp 88.88% <88.88%> (ø)
unit/solvers/smt2_incremental/smt_terms.cpp 91.78% <91.78%> (ø)
src/solvers/smt2_incremental/smt_commands.h 100.00% <100.00%> (ø)
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 06c563a...8bc61d6. Read the comment docs.

Copy link
Contributor

@TGWDB TGWDB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also no unit tests for the smt_commands?

@thomasspriggs
Copy link
Contributor Author

thomasspriggs commented Jun 10, 2021

Also no unit tests for the smt_commands?

Its on the TODO list in the PR description.

@thomasspriggs thomasspriggs force-pushed the tas/smt_data_structures branch 3 times, most recently from 9409d99 to e0c2984 Compare June 10, 2021 18:42
@TGWDB
Copy link
Contributor

TGWDB commented Jun 11, 2021

For bookkeeping, note that this is part of the SMT backend work with broad explanation and RFC in #6134

@thomasspriggs thomasspriggs force-pushed the tas/smt_data_structures branch from e0c2984 to f9a2626 Compare June 11, 2021 09:37
Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I must be missing / not understanding a requirement here. The commands; yes, they are good and would be useful. But the rest -- why not just use exprt and typet? The underlying representation is the same and there are already a load of utility functions. Also, in the vast majority of cases, the semantics are the same as well. I can understand that it might be nice to use the C++ type system to mark which are values used in this subsystem or having had some conversions / checks applied but that can be done in a much more lightweight fashion.


class smt_sort_const_downcast_visitort;

class smt_sortt : protected irept
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why define these separately when you could use typet? The type system is essentially identical. I can see a sort of value in having a class smt_sortt : public typet and a smt_sortt to_smt_sort(typet &t) function so you can use the C++ type safety to enforce conversion but this really seems like overkill.

#include <solvers/smt2_incremental/smt_terms.h>
#include <util/irep.h>

class smt_commandt : protected irept
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a sanity test for this, would it be worth integrating these into smt2_solver / smt2_parser? They should make the separation between parsing and solving much clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Using the new structures for the solving will help sanity test them. I have already found several minor issues by working on solving/converting to string locally. However I feel it is better to get a first version merged and then refine the structures as we work on the solving, rather than perfect the first version before merging the first PR. Integrating the new structures into the parsing would be a nice-to-have. But we are currently thinking of working on exprt -> new structures and new structures -> string before parsing.


/// \brief Class for adding the ability to up and down cast smt_sortt to and
/// from irept. These casts are required by other irept derived classes in
/// order to store instances of smt_sortt inside them.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just publicly inherit from irept?

Copy link
Contributor Author

@thomasspriggs thomasspriggs Jun 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I want to restrict the ability to cast to and from irept, to the inside of the storage classes. I see the fact that each of these inherit from irept to be an implementation detail. Ideally, there should be no need to upcast to irept as part of the algorithms which live outside the storage classes. Keeping variables declared as smt_termt or smt_sortt rather than irept where possible will give us more specific compile time type information.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't your rationale then suggest using an irept member instead of inheriting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't your rationale then suggest using an irept member instead of inheriting?

Yes, I could have done that. I could also have implemented this without using irept at all and handled lifetime using std::unique_ptr or some such. By using irept with inheritance this code retains a greater semblance to other parts of the code base. This should aid cbmc developers familiar with other parts of the code base to transition to maintaining or improving this part and it should aid with reusing the existing pretty, operator== and hashing algorithms defined for irept. Otherwise we would need to write more new code to define these pieces of functionality for the new data structures.

@@ -0,0 +1,2 @@
SORT_ID(bool)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't see the value of using this over ID_bool used in bool_typet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please bear in mind that this file is not only being used to define the IDs of the smt_sortt classes. It is also being used to automate writing some of the more repetitive parts of the visitor pattern. 🤔 I should probably add some doxygen to this file explaining how it is used. What would be the value in rationing the number of different IDs used rather than keeping them separate?

class smt_sortt;
class smt_term_const_downcast_visitort;

class smt_termt : protected irept, private smt_sortt::storert<smt_termt>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just user exprt or if you want to do some kind of conversion, then have a simple wrapper around exprt that marks the converted values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this gives us more compile-time checking of the soundness of the conversion and better guarantees about the separation of concerns. This applies both in terms of the conversion from exprt to smt_termt and the conversion from smt_termt to string or other representation understood by the solver. We are only currently planning a conversion to SMT2 strings, but by keeping the conversion to strings separate it would theoretically allow for more easily adding an additional conversion algorithm such as to smt-switch data structures to be added on top as an additional run time option. Without the new data structures, the conversion from exprt to smt_termt could "leak" unconverted exprt classes which can only be caught at run time. The way the visitor is defined will also mean that missing conversions from smt_termt to string would also cause a compile error. There has to be at least a placeholder method in place with the UNIMPLEMENTED macro.

Defining the data structure will give us a clean separation between the conversion from exprt to smt_termt and smt_termt to string. It will allow for separately unit testing the two conversions. A new data structure only for use with smt2 has the same advantages that having separate data structures from the language front ends such as for Java and C vs the internal goto_program representation would have.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can achieve much of the compile time checking at a much lower cost of having something like...

class smt_sortt : public typet {};
class smt_termt : public exprtt {};

class smt_translator {
public :
  virtual smt_sortt type_to_sort(const typet &t) = 0;
  virtual smt_termt expr_to_term(const exprt &e) = 0;
}

If this is the only way of producing these two types and the rest of the code is const-clean and the critical functions take smt_sortt and smt_termt then it will be very rare to have failed-to-convert bug.

I would suggest an object because these transformations tend to wind up being stateful no matter how good everyone's intent is. They will likely also need to be stackable in some sense. As soon as you have a couple of options on how to translate each theory / type (IIRC this was on your list of goals?) then you will have a combinatorial explosion of normal forms, so it is not so easy to use types* to enforce normal forms.

Using the visitor to do the actual conversion or caching converted things is smart idea and it is good to see it early integrated into the design here.

*. I could be wrong, if I am I would love to know because the handling of goto-programs has exactly this issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not in favour of publicly inheriting from typet, for the following reasons -

  • It would support down casting from typet to smt_sortt and require checks to be written for these casts.
  • It would add public methods to smt_sortt such as subtype() and source_location() which don't necessarily make sense in the context of a sort.
  • It would add public access to member functions from irept, which would allow direct manipulation of the irept data structures through weakly typed member functions such as move_to_sub.
  • It would add public access to irept's is_nil member function, whereas I would prefer to make the classes in this hierarchy non-nullable/nillable. That way optionalt can be added on top of these classes only in the places where nullability is a desirable property.

On the point of the translator - Yes, we will need functionality to convert from exprt to termt. The choice of whether to use member functions or free functions with configuration/state struct parameters to do the conversions is mostly a stylistic choice in my opinion, rather than a crucial difference. This is unless we really need virtual inheritance and multiple different translators. I have found free functions to be a little easier to unit test in some circumstances, which is a point in their favour. I do think it would improve readability to store configuration and state separately assuming we end up needing both though. That way the configuration can be kept const and the state mutable.

@thomasspriggs thomasspriggs force-pushed the tas/smt_data_structures branch from f9a2626 to ec38e7f Compare June 11, 2021 12:31
@thomasspriggs thomasspriggs self-assigned this Jun 11, 2021
@thomasspriggs thomasspriggs force-pushed the tas/smt_data_structures branch 2 times, most recently from 213fecb to f73547c Compare June 16, 2021 09:17
Copy link
Contributor

@NlightNFotis NlightNFotis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions from me.

{
const smt_bit_vector_sortt bit_vec_sort{32};
smt_sortt sort{bit_vec_sort};
sort = bit_vec_sort;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these two test cases testing for exactly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It tests that the copy constructor of the smt_sortt class can be called with a smt_bit_vector_sortt typed parameter and the same for the assignment operator. Effectively testing the upcast which is carried out in these two operations is accessible. I added these two tests because the same test with the irept constructor should fail due to the access restrictions. Writing a test that it does not work with irept, could be done using some template meta programming to write a "does not compile" function, but probably would not be straight forward code to for us to read or write.


TEST_CASE("Visiting smt_bool_sortt.", "[core][smt2_incremental]")
{
class : public smt_sort_const_downcast_visitort
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ Given that the visitor is nearly the same in both of the following test cases, does it make sense for it to be pulled out somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I usually err on the side of more duplication in unit tests, in order to make it more obvious why the properties the test is testing should hold. The duplication isn't too bad for the sorts at the moment as there are only a couple of cases. But it will be much worse for the terms and commands and it will be worse if we expand this with more sorts. Therefore I'll come up with a way to factor it out for the terms and commands and then back port the solution to the sorts.


TEST_CASE("Test smt_sortt to string conversion", "[core][smt2_incremental]")
{
CHECK(smt_to_string(smt_bool_sortt{}) == "Bool");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻 I really like how clean these are because of the visitor - I feel like it was a good design choice.

@@ -10,6 +10,16 @@
#include <solvers/smt2_incremental/smt_sorts.def>
#undef SORT_ID

bool smt_sortt::operator==(const smt_sortt &other) const
{
return irept::operator==(other);
Copy link
Contributor

@NlightNFotis NlightNFotis Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: since this just calls the base class' method, with no other change to it, does it make sense to kill this method here and just using irept::operator==; in the header file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The irept::operator== member function takes an irept as its parameter / right hand side. So yes we could add using irept::operator==;, but the existing tests which use the smt_sortt::operator== would not compile because the implicit upcast to irept is banned from outside of the data structure's member functions due to the use of protected inheritance. Also I don't think it would make sense to allow comparisons such as smt_bool_sortt{} == irept{}. 1) because sorts should not be stored as irept outside of the irep based data structures and 2) because if we know that one side is not a sort at compile time and that the other side is a sort, then we know that the result would always be false.

@thomasspriggs thomasspriggs force-pushed the tas/smt_data_structures branch from a5b5674 to f313f6e Compare June 16, 2021 18:08
@thomasspriggs
Copy link
Contributor Author

Note: Some of the make build jobs are currently failing CI because the makefile miscounts the tests which are using TEMPLATE_TEST_CASE

@thomasspriggs thomasspriggs force-pushed the tas/smt_data_structures branch 3 times, most recently from 6f19964 to a8a94b6 Compare July 13, 2021 15:19
Comment on lines +1 to +8
@startuml
' This diagram is written using PlantUML. It can be visualised locally or using
' the PlantUML website - http://www.plantuml.com/plantuml/uml/.
' On Linux the visualisation tool can be installed using -
' `sudo apt-get install plantuml`
' Running the `plantuml` binary from the directory where this file exists will
' give the option to view the file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is great to have (and also to be able to generate). I just have the general concern that this is not generated or linked from the source code? How easy will it be for the two to lose synchronisation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a doxygen plugin which can generate a graph. However I remember these being less than ideal. I suggest we either keep the plantUML and live with the reality that it will likely get out of date, or I remove it entirely and put any useful information from the graph nodes into the doxygen above the corresponding classes.

Comment on lines +44 to +45
smt_logic_quantifier_free_bit_vectorst::smt_logic_quantifier_free_bit_vectorst()
: smt_logict{ID_smt_logic_quantifier_free_bit_vectors}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some tension here since the code here (and later also marked) uses quantifier_free_bit_vectors but I think the actual logic is likely to be QF_AUFBV (based on what is used now). Note that this is not QF_BV which corresponds to quantifier free bit vector theory. (QF_AUFBV also allows arrays and free sort and function symbols, more details on logics here).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. quantifier_free_bit_vectors would be just QF_BV. QF_AUFBV is what we were using in the old back end up until recently. I should fix the full names to match the abbreviations which get written out to the solver.

/// * The constant `irep_idt`s used to identify each of the logic classes.
/// * The member functions of the `smt_logic_const_downcast_visitort` class.
/// * The type of option checks required to implement `smt_logict::accept`.
LOGIC_ID(quantifier_free_bit_vectors)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic tension

Comment on lines +64 to +68
class smt_logic_quantifier_free_bit_vectorst : public smt_logict
{
public:
smt_logic_quantifier_free_bit_vectorst();
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic tension

term = smt_bool_literal_termt{false};
}

TEST_CASE("Test smt_termt.pretty is accessible.", "[core][smt2_incremental]")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this is testing both that the pretty function is accessible and the output. I guess a change to the base pretty function could invalidate this test without any changes to the smt_termt code.

CHECK_THROWS(smt_identifier_termt{"foo\\ bar", smt_bool_sortt{}});
}

TEST_CASE("smt_identifier_termt upcasts.", "[core][smt2_incremental]")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No actual CHECK inside this test case, I guess it's a style choice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wrote these upcast tests when I was initially working on the accessibility, at that time I just wanted to be sure that the upcasts compiled. However now that the rest of the tests are in place and full of upcasts, I am not sure that this has value. Therefore I might just remove all the upcast tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now removed these tests, as I was unconvinced of the value they added.

So that instances of `smt_sortt` can be stored in other `irept` derived
classes without enabling upcasting of `smt_sortt` to `irept` from
outside of the heirarchy of `irept` based classes.
So that it can be debugged and further features added separately from
the `makefile`. Note that the check for `[.]` was already broken in the
makefile because `[.]` is inserted using the XFAIL macro, but the files
are grepped before macro preprocessing.
So that we can use catch's `TEMPLATE_TEST_CASE` without the test count
failing when using make to run the tests. When using
`TEMPLATE_TEST_CASE` the number of tests added to catch's count of test
depends on the number of type arguments supplied. Therefore this update
is needed to count the arguments  to get the correct count.
Because this will be used to implement term to string/output stream
conversion. Use of a visitor will allow this printing code to be
separated from the data structure.
So that instaces of smt_termt can be stored inside other non-term ireps.
This class diagram served as the notes for implementing the initial set
of classes. It is committed to the repository for future reference.
@thomasspriggs thomasspriggs force-pushed the tas/smt_data_structures branch from a8a94b6 to 8bc61d6 Compare July 19, 2021 13:59
@thomasspriggs thomasspriggs changed the title [draft] Initial data structures for incremental smt2 solving Initial data structures for incremental smt2 solving Jul 19, 2021
@thomasspriggs thomasspriggs marked this pull request as ready for review July 19, 2021 14:02
Copy link
Contributor

@chrisr-diffblue chrisr-diffblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks sensible to me.

@thomasspriggs thomasspriggs merged commit 03213c1 into diffblue:develop Jul 20, 2021
@thomasspriggs thomasspriggs deleted the tas/smt_data_structures branch July 20, 2021 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants