CBMC additional profiling info: Extend solver hardness to track clauses mapped to an instruction #5480

natasha-jeppu · 2020-08-26T21:53:17Z

Currently, solver hardness tracks only the number of clauses per instruction. This PR extends solver hardness to track the clauses themselves, providing a mapping between clauses and instructions.

Each commit message has a non-empty body, explaining why the change was made.
Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
My commit message includes data points confirming performance improvements (if claimed).
My PR is restricted to a single feature or bugfix.
White-space or formatting changes outside the feature-related changed lines are in commits of their own.

codecov · 2020-08-27T13:52:10Z

Codecov Report

Merging #5480 (2cb743f) into develop (7c016a7) will increase coverage by 0.01%.
The diff coverage is 82.35%.

@@             Coverage Diff             @@
##           develop    #5480      +/-   ##
===========================================
+ Coverage    69.33%   69.35%   +0.01%     
===========================================
  Files         1242     1242              
  Lines       100417   100433      +16     
===========================================
+ Hits         69622    69653      +31     
+ Misses       30795    30780      -15

Flag	Coverage Δ
cproversmt2	`43.12% <17.64%> (+0.02%)`	⬆️
regression	`66.25% <82.35%> (+0.02%)`	⬆️
unit	`32.30% <17.64%> (+0.02%)`	⬆️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
src/solvers/solver_hardness.h	`100.00% <ø> (ø)`
src/solvers/solver_hardness.cpp	`52.25% <72.72%> (+0.86%)`	⬆️
src/solvers/sat/satcheck_minisat2.cpp	`70.83% <100.00%> (+1.26%)`	⬆️
src/solvers/sat/cnf.cpp	`83.25% <0.00%> (+9.35%)`	⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7c016a7...2cb743f.

danielsn · 2020-08-27T14:56:49Z

src/solvers/solver_hardness.cpp

@@ -91,10 +93,20 @@ void solver_hardnesst::register_clause(const bvt &bv)
 {
  current_hardness.clauses++;
  current_hardness.literals += bv.size();
+  std::vector<int> clause;


int? Or do we want a type with a defined width, like int64_t?

❓ What are the memory and performance implications of storing a copy of the clauses?

I haven't explored this yet. A much more efficient approach would be to store some form of index (specifically, an index interval) into existing clause storage, but I am not sure whether this can be done.
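
One way to picture the index-interval idea: if the clauses generated for a single instruction receive consecutive indices (an assumption this thread has not confirmed), a sorted list of indices collapses into a few [first, last] runs. A minimal C++ sketch; to_intervals and clause_intervalt are hypothetical names, not CBMC API:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical: a closed interval [first, last] of clause indices.
using clause_intervalt = std::pair<std::size_t, std::size_t>;

// Compress a strictly increasing list of clause indices into maximal runs
// of consecutive indices. If one instruction's clauses are emitted
// contiguously, that instruction collapses to a single interval.
std::vector<clause_intervalt>
to_intervals(const std::vector<std::size_t> &sorted_indices)
{
  std::vector<clause_intervalt> intervals;
  for(std::size_t index : sorted_indices)
  {
    assert(intervals.empty() || index > intervals.back().second);
    if(!intervals.empty() && index == intervals.back().second + 1)
      intervals.back().second = index; // extend the current run
    else
      intervals.emplace_back(index, index); // start a new run
  }
  return intervals;
}
```

For example, the indices 4, 5, 6, 7, 12, 13 become the two runs [4, 7] and [12, 13]. On a typical 64-bit platform each run costs two size_t values (16 bytes) regardless of its length, whereas storing every index costs 8 bytes per clause, and copying the clause itself costs at least one int per literal plus the vector's own overhead.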

What are the metrics that you want to compute based on the clauses?

To identify which program lines contribute to the UNSAT core. With this infrastructure, clauses in the UNSAT core can be mapped directly to lines of code (LOC).

There are tools that compute the UNSAT core given a CNF formula and a DRAT proof of unsatisfiability (such proofs have become a standard in the SAT competitions). CBMC is used to produce the DIMACS CNF formula (--dimacs) and the clause-to-LOC mapping (--write-solver-stats-to).
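
Once a core extractor has reported which clause positions belong to the UNSAT core, the remaining step is a lookup against the clause-to-LOC mapping. A minimal C++ sketch, assuming that mapping has already been loaded into memory as clause position to a "file:line" string and that positions are 1-based; clause_to_loct and core_clauses_per_loc are hypothetical names, and the sketch does not parse the actual --write-solver-stats-to output format:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical input: for each clause position in the DIMACS output,
// the source location that generated it, e.g. "main.c:12".
using clause_to_loct = std::unordered_map<std::size_t, std::string>;

// Count how many UNSAT-core clauses each source location produced.
// Core positions without a known location are skipped.
std::map<std::string, std::size_t> core_clauses_per_loc(
  const std::vector<std::size_t> &core_positions,
  const clause_to_loct &clause_to_loc)
{
  std::map<std::string, std::size_t> counts;
  for(std::size_t position : core_positions)
  {
    assert(position != 0 && "clause positions are assumed to be 1-based");
    auto it = clause_to_loc.find(position);
    if(it != clause_to_loc.end())
      ++counts[it->second];
  }
  return counts;
}
```

Using std::map for the result keeps the report sorted by location, which makes reports from different runs easier to compare.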

If this can only be used with DIMACS, couldn't you just store the index (e.g. the line of the clause) in the DIMACS output?

How do I access the DIMACS clause index here? Also, is the DIMACS output ordered?
I will need to map lines of program code to indices in the DIMACS output.

You can make dimacs_cnft (or probably cnf_clause_listt) implement hardness_collectort (in the same way other SAT solvers implement that interface). The clauses seem to be stored in order in cnf_clause_listt, so dimacs_cnft will print them in the same order as the register_clause calls. Of course, you'll have to test and confirm that.

I now keep track of the clause counter in satcheck_minisat2_baset<T>::lcnf, which calls register_clause, and store the counter value rather than the clause itself in solver hardness. I have also added optional DEBUG statements to verify that these match the corresponding lines in the DIMACS output.
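
A simplified model of that scheme, assuming the solver's clause counter is 0-based at the time the clause is registered, so that adding 1 yields the clause's 1-based position among the clause lines of the DIMACS output. instruction_hardnesst, register_clause and dimacs_line here are hypothetical stand-ins, not the CBMC signatures:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Simplified per-instruction record: clause positions, not clause copies.
struct instruction_hardnesst
{
  std::size_t clauses = 0;
  std::vector<std::size_t> clause_set; // 1-based clause positions
};

// Hypothetical analogue of register_clause(bv, solver_clause_num).
// Assumption: solver_clause_num is the 0-based counter value before the
// solver increments it, so +1 gives the 1-based DIMACS clause position.
void register_clause(
  instruction_hardnesst &hardness,
  std::size_t solver_clause_num)
{
  hardness.clauses++;
  hardness.clause_set.push_back(solver_clause_num + 1);
}

// Render a clause as a DIMACS line (literals followed by 0), mirroring
// the optional DEBUG output used for cross-checking.
std::string dimacs_line(const std::vector<int> &clause)
{
  std::ostringstream out;
  for(int lit : clause)
  {
    assert(lit != 0 && "0 terminates a DIMACS clause and is not a literal");
    out << lit << ' ';
  }
  out << '0';
  return out.str();
}
```

If the DIMACS writer drops or reorders clauses, the stored positions drift away from the printed lines, which is exactly what the cross-check is meant to catch.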

danielsn · 2020-08-27T14:57:37Z

src/solvers/solver_hardness.cpp

  }
+  clause.push_back(0);
+  std::sort(clause.begin(), clause.end());


Do we want to sort before or after we add the 0?

Post-processing of the UNSAT core CNF sorts each entire clause, including the 0, so we might as well sort it here too.
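
Sorting the whole clause, terminator included, produces a canonical comparison key rather than a valid DIMACS line: the 0 lands between the negative and the positive literals. A minimal C++ sketch of that convention (canonical_clause is a hypothetical helper):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Canonical key for matching a clause against a post-processed core:
// append the DIMACS terminator 0, then sort the whole vector.
std::vector<int> canonical_clause(std::vector<int> literals)
{
  for(int lit : literals)
    assert(lit != 0); // literals themselves are never 0
  literals.push_back(0);
  std::sort(literals.begin(), literals.end());
  return literals;
}
```

The key only matches if both sides follow the same convention: sorting before appending the 0 would give -1 2 3 0 for the clause 3 -1 2, which compares unequal to -1 0 2 3.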

danielsn · 2020-08-27T14:57:56Z

src/solvers/solver_hardness.cpp

+
+    int signed_literal = literal.var_no();
+    if(literal.sign())
+      signed_literal = -signed_literal;


Any chance this could overflow?

This is the same conversion CBMC already uses in the literal.dimacs() function, so I'm assuming potential overflow issues have already been taken into account there. Its return type is also defined as int.
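
Whether overflow is possible depends on how the literal is stored. Assuming an encoding that packs the variable number and the sign into one 32-bit unsigned word as (var << 1) | sign (packed_literalt is a hypothetical stand-in, not CBMC's literalt), the largest variable number is 2^31 - 1, which equals INT32_MAX, so both it and its negation fit in a 32-bit int:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical literal: variable number and sign packed into one 32-bit
// unsigned word as (var << 1) | sign.
struct packed_literalt
{
  std::uint32_t bits;

  std::uint32_t var_no() const { return bits >> 1; }
  bool sign() const { return (bits & 1u) != 0; }

  // Signed DIMACS form: negative when the literal is negated.
  std::int32_t dimacs() const
  {
    // var_no() is at most (2^32 - 1) >> 1 == INT32_MAX, so the cast is
    // value-preserving and the negation cannot overflow.
    const std::int32_t result = static_cast<std::int32_t>(var_no());
    return sign() ? -result : result;
  }
};

static_assert(
  (std::numeric_limits<std::uint32_t>::max() >> 1) ==
    static_cast<std::uint32_t>(std::numeric_limits<std::int32_t>::max()),
  "the largest packed variable number equals INT32_MAX");
```

Under a wider encoding (for example, a 64-bit word) this argument no longer holds, and a conversion to a 32-bit int could overflow.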

danielsn · 2020-08-27T14:58:36Z

src/solvers/solver_hardness.cpp

+      {
+        json_arrayt clause_json;
+        for(auto const &lit : clause)
+          clause_json.push_back(json_numbert{std::to_string(lit)});


json_numbert takes a string?

Yes, it takes a string, but the output is formatted as a JSON number.

danielsn · 2020-08-27T14:59:33Z

src/solvers/solver_hardness.cpp

@@ -26,6 +26,8 @@ operator+=(const solver_hardnesst::sat_hardnesst &other)
  clauses += other.clauses;
  literals += other.literals;
  variables.insert(other.variables.begin(), other.variables.end());
+  clause_set.insert(


Should we sort this?

I was worried about too many calls to sort().
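
One way to avoid repeated sorting is to keep operator+= as a cheap append and sort only once, right before the statistics are written. A minimal C++ sketch, assuming duplicate indices carry no information (clause_positionst is a hypothetical simplification of sat_hardnesst):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplification of sat_hardnesst: only clause positions.
struct clause_positionst
{
  std::vector<std::size_t> clause_set;

  // Merging is a plain append; no sorting on this hot path.
  clause_positionst &operator+=(const clause_positionst &other)
  {
    if(&other == this)
    {
      // insert() must not be given iterators into the vector it modifies.
      const std::vector<std::size_t> copy = other.clause_set;
      clause_set.insert(clause_set.end(), copy.begin(), copy.end());
    }
    else
    {
      clause_set.insert(
        clause_set.end(), other.clause_set.begin(), other.clause_set.end());
    }
    return *this;
  }

  // Sort and drop duplicates once, e.g. just before writing the stats.
  void normalize()
  {
    std::sort(clause_set.begin(), clause_set.end());
    clause_set.erase(
      std::unique(clause_set.begin(), clause_set.end()), clause_set.end());
  }
};
```

Each merge then costs time linear in the number of appended indices, and the O(n log n) sort is paid once at output time instead of after every merge.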

danielsn · 2020-08-27T14:59:55Z

src/solvers/solver_hardness.cpp

@@ -91,10 +93,20 @@ void solver_hardnesst::register_clause(const bvt &bv)
 {
  current_hardness.clauses++;
  current_hardness.literals += bv.size();
+  std::vector<int> clause;


Should we typedef this?

natasha-jeppu · 2020-09-03T12:31:05Z

src/solvers/sat/satcheck_minisat2.cpp

+    with_solver_hardness([&bv, &solver_clause_num](solver_hardnesst &hardness) {
+      hardness.register_clause(bv, solver_clause_num);


Pass the clause counter to hardness.register_clause(), and store the clause counter value rather than the clauses themselves.

natasha-jeppu · 2020-09-03T12:33:09Z

src/solvers/solver_hardness.cpp

+#ifdef DEBUG
+  std::cout << solver_clause_num << ": ";
+  for(const auto &literal : bv)
+    std::cout << literal.dimacs() << " ";
+  std::cout << "0\n";
+#endif


Debug statements that print 'clause_counter_value: clause', to cross-check that they match the corresponding lines in the DIMACS output.

peterschrammel · 2020-09-03T15:25:52Z

src/solvers/sat/satcheck_minisat2.cpp

@@ -140,8 +140,10 @@ void satcheck_minisat2_baset<T>::lcnf(const bvt &bv)

    solver->addClause_(c);

-    with_solver_hardness(
-      [&bv](solver_hardnesst &hardness) { hardness.register_clause(bv); });
+    size_t solver_clause_num = clause_counter;


Assignment seems redundant, just use clause_counter in line 145.

clause_counter is a data member that satcheck_minisat2_baset<T> inherits from cnf_solvert, so it cannot be named directly in the lambda capture list (a capture list can only name local variables). I would need to capture this instead, so I thought it better to just pass a copy.
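
The restriction can be seen in a small model. cnf_solver_baset and solvert<T> are hypothetical stand-ins for cnf_solvert and satcheck_minisat2_baset<T>; the sketch shows the local copy used in the PR and, if the codebase's language standard allows C++14, an init-capture that copies the member directly:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cnf_solvert: owns the clause counter.
struct cnf_solver_baset
{
protected:
  std::size_t clause_counter = 0;
};

// Hypothetical stand-in for satcheck_minisat2_baset<T>.
template <typename T>
struct solvert : public cnf_solver_baset
{
  std::vector<std::size_t> registered;

  // Stand-in for with_solver_hardness: hand the collector to a callback.
  template <typename F>
  void with_collector(F &&callback)
  {
    callback(registered);
  }

  // Writing [clause_counter] would not compile: a capture list can only
  // name local variables, and clause_counter is an inherited data member.

  // Option 1 (the approach taken in the PR): copy into a local first.
  void add_clause_local_copy()
  {
    const std::size_t solver_clause_num = clause_counter;
    with_collector([solver_clause_num](std::vector<std::size_t> &out) {
      out.push_back(solver_clause_num);
    });
    clause_counter++;
  }

  // Option 2 (C++14 init-capture): copy the member in the capture list.
  void add_clause_init_capture()
  {
    with_collector([num = clause_counter](std::vector<std::size_t> &out) {
      out.push_back(num);
    });
    clause_counter++;
  }
};
```

A third option is to capture this and read clause_counter inside the lambda; because the callback runs synchronously here it would see the same value, but the copy makes it explicit that the index is fixed before clause_counter is incremented.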

peterschrammel · 2020-09-03T15:26:47Z

src/solvers/solver_hardness.h

@@ -50,6 +50,7 @@ struct solver_hardnesst
    size_t clauses = 0;
    size_t literals = 0;
    std::unordered_set<size_t> variables = {};
+    std::vector<int> clause_set = {};


Suggested change

-    std::vector<int> clause_set = {};
+    std::vector<size_t> clause_set = {};

peterschrammel · 2020-09-03T15:28:05Z

src/solvers/solver_hardness.cpp

+  std::cout << "0\n";
+#endif
+
+  current_hardness.clause_set.push_back(solver_clause_num + 1);


Please add a comment to explain why +1.

martin-cs

I have done quite a bit of performance optimisation work on CBMC. If you want a chat about it then drop me a line.

martin-cs · 2020-09-10T20:34:34Z

src/solvers/sat/satcheck_minisat2.cpp

@@ -140,8 +140,22 @@ void satcheck_minisat2_baset<T>::lcnf(const bvt &bv)

    solver->addClause_(c);

+    // To map clauses to lines of program code, track clause indices in the


I don't want to be a downer on this but ...

There is not necessarily a 1-to-1 correlation between clauses and lines of code. For example, rewriting and normalisation may mean that multiple lines reduce to the same shared term. Which line should this be credited to? This is worse for operations like multiplication, where additions from other operations can appear as sub-parts of the operator.
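
One way to make that many-to-one relationship explicit is to map each clause to the set of lines that share it and then choose an attribution policy. The sketch below splits one unit of credit evenly among those lines; this is only one possible policy, it does not resolve the concern raised here, and clause_to_linest and split_credit are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical mapping: one clause may stem from several source lines
// when rewriting makes them share a term.
using clause_to_linest = std::map<std::size_t, std::set<std::string>>;

// One possible policy: each core clause carries a total weight of 1,
// split evenly among all lines that share it.
std::map<std::string, double> split_credit(
  const std::vector<std::size_t> &core_clauses,
  const clause_to_linest &clause_to_lines)
{
  std::map<std::string, double> credit;
  for(std::size_t clause : core_clauses)
  {
    auto it = clause_to_lines.find(clause);
    if(it == clause_to_lines.end() || it->second.empty())
      continue; // no known source line for this clause
    const double share = 1.0 / static_cast<double>(it->second.size());
    for(const std::string &line : it->second)
      credit[line] += share;
  }
  return credit;
}
```

A clause shared by two lines gives each of them 0.5, so a line that also owns a clause of its own ends up with 1.5.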

The number of variables and clauses is a very crude measure of SAT difficulty. Reducing the number of clauses can make the problem harder; I have seen multiple instances of this.

martin-cs · 2020-09-10T20:36:51Z

src/solvers/solver_hardness.h

-  void register_clause(const bvt &bv);
+  /// \param cnf: processed clause
+  /// \param cnf_clause_index: index of clause in dimacs output
+  /// \param register_cnf: negation of boolean variable tracking if the clause


Please clarify this comment.

martin-cs · 2020-09-10T20:38:22Z

src/solvers/solver_hardness.h

@@ -50,6 +50,7 @@ struct solver_hardnesst
    size_t clauses = 0;
    size_t literals = 0;
    std::unordered_set<size_t> variables = {};
+    std::vector<size_t> clause_set = {};


What is the performance cost of this?

…just number of clauses. Currently, solver hardness tracks only the number of clauses, not the actual clauses, for an instruction. This mapping will help identify code hotspots based on which clauses belong to the UNSAT core. This commit extends the sat_hardnesst structure to include a mapping between clauses and SSA expressions.

…put for write-solver-stats-to Clauses are stored as a sorted vector of integers.

…put for write-solver-stats-to Clauses are stored as a sorted vector of integers. clang format solver_hardness.cpp

…he clause itself

…he clause itself Add comment to explain solver_clause_num + 1 in solver hardness, change <int> to <size_t>

…he clause itself Add comment to explain solver_clause_num + 1 in solver hardness, change <int> to <size_t> Remove sorting of clause set keeping performance in mind

…he clause itself Add comment to explain solver_clause_num + 1 in solver hardness, change <int> to <size_t> Remove sorting of clause set keeping performance in mind Add additional utility to map program lines to clauses in --dimacs output clang format

In diffblue#5480 the solver hardness interface was updated, but changes were only implemented in the Minisat2 interface. Update the Glucose interface to match these changes. To avoid future regressions, make the check-ubuntu-20_04-cmake-gcc GitHub action use Glucose.

In diffblue#5480 the solver hardness interface was updated, but changes were only implemented in the Minisat2 interface. Update the IPASIR interface to match these changes. To avoid future regressions, make the check-macos-10_15-make-clang GitHub action use IPASIR with Riss as the back-end solver.
