Skip to content

CBMC additional profiling info: Extend solver hardness to track clauses mapped to an instruction #5480

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Nov 19, 2020

Conversation

natasha-jeppu
Copy link

Currently solver hardness tracks number of clauses. This is an extension of solver hardness to track the clauses themselves. Provides a mapping between clauses and instructions.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@codecov
Copy link

codecov bot commented Aug 27, 2020

Codecov Report

Merging #5480 (2cb743f) into develop (7c016a7) will increase coverage by 0.01%.
The diff coverage is 82.35%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #5480      +/-   ##
===========================================
+ Coverage    69.33%   69.35%   +0.01%     
===========================================
  Files         1242     1242              
  Lines       100417   100433      +16     
===========================================
+ Hits         69622    69653      +31     
+ Misses       30795    30780      -15     
Flag Coverage Δ
cproversmt2 43.12% <17.64%> (+0.02%) ⬆️
regression 66.25% <82.35%> (+0.02%) ⬆️
unit 32.30% <17.64%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/solvers/solver_hardness.h 100.00% <ø> (ø)
src/solvers/solver_hardness.cpp 52.25% <72.72%> (+0.86%) ⬆️
src/solvers/sat/satcheck_minisat2.cpp 70.83% <100.00%> (+1.26%) ⬆️
src/solvers/sat/cnf.cpp 83.25% <0.00%> (+9.35%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7c016a7...2cb743f. Read the comment docs.

@danielsn danielsn added the aws Bugs or features of importance to AWS CBMC users label Aug 27, 2020
@@ -91,10 +93,20 @@ void solver_hardnesst::register_clause(const bvt &bv)
{
current_hardness.clauses++;
current_hardness.literals += bv.size();
std::vector<int> clause;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int? or do we want a type with a defined length like int_64?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ What are the memory and performance implications of storing a copy of the clauses?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't explored this yet. Although a much more efficient approach would be to store some form of index (specifically an index interval) to some existing clause storage, but I am not sure if this can be done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the metrics that you want to compute based on the clauses?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To identify which program lines contribute to the SAT core. So given this infrastructure we can map clauses from the SAT core directly to LOC.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are tools which compute the UNSAT core given a CNF formula and DRAT proof of unsatisfiability (these proofs have become a standard in the SAT competitions). CBMC is used to get the dimacs CNF formula (--dimacs) and the clause to LOC mapping with --write-solver-stats-to option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this can only be used wiith dimacs couldn't you just store the index (eg. line of the clause) in the dimacs output?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I access the dimacs clause index here? Also is the dimacs output ordered?
I will need to map lines of program code to indices in the dimacs output.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make dimacs_cnft (or probably cnf_clause_listt) implement hardness_collectort(in the same way other sat solvers implement that interface). It seems that the clauses are stored in order in cnf_clause_listt, so dimacs_cnft will print them in the same order as you get the register_clause calls. Of course, you'll have to test and confirm that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now keep track of the clause counter in satcheck_minisat2_baset<T>::lcnf that calls register_clause and store the counter value rather than the clause itself in solver hardness. Have added optional DEBUG statements to verify if these match corresponding lines in the dimacs output.

}
clause.push_back(0);
std::sort(clause.begin(), clause.end());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to sort before or after we add the 0?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post processing of the SAT core cnf runs sort for the entire clause, including the zero. So might as well sort it here too.


int signed_literal = literal.var_no();
if(literal.sign())
signed_literal = -signed_literal;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance this could overflow?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what is used inside CBMC already for the literal.dimacs() function. I'm assuming here that they have already taken into consideration any potential overflow issues. And also, they define the return type as int.

{
json_arrayt clause_json;
for(auto const &lit : clause)
clause_json.push_back(json_numbert{std::to_string(lit)});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json_numbert takes a string?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, takes a string but the output is formatted json number.

@@ -26,6 +26,8 @@ operator+=(const solver_hardnesst::sat_hardnesst &other)
clauses += other.clauses;
literals += other.literals;
variables.insert(other.variables.begin(), other.variables.end());
clause_set.insert(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we sort this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was worried about too many calls to sort().

@@ -91,10 +93,20 @@ void solver_hardnesst::register_clause(const bvt &bv)
{
current_hardness.clauses++;
current_hardness.literals += bv.size();
std::vector<int> clause;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we typedef this?

Comment on lines 144 to 145
with_solver_hardness([&bv, &solver_clause_num](solver_hardnesst &hardness) {
hardness.register_clause(bv, solver_clause_num);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass the clause counter to hardness.register_clause(). Store the clause counter value rather than the clause themselves.

Comment on lines 104 to 113
#ifdef DEBUG
std::cout << solver_clause_num << ": ";
for(const auto &literal : bv)
std::cout << literal.dimacs() << " ";
std::cout << "0\n";
#endif
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug statements to output 'clause_counter_value: clause' to cross check if they match the corresponding lines in dimacs output.

@@ -140,8 +140,10 @@ void satcheck_minisat2_baset<T>::lcnf(const bvt &bv)

solver->addClause_(c);

with_solver_hardness(
[&bv](solver_hardnesst &hardness) { hardness.register_clause(bv); });
size_t solver_clause_num = clause_counter;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assignment seems redundant, just use clause_counter in line 145.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clause_counter is an inherited data member of satcheck_minisat2_baset<T> from cnf_solvert so I cannot pass it directly to the lambda capture list (not in scope). Will need to pass this instead. Thought it would be better to just pass a copy.

@@ -50,6 +50,7 @@ struct solver_hardnesst
size_t clauses = 0;
size_t literals = 0;
std::unordered_set<size_t> variables = {};
std::vector<int> clause_set = {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::vector<int> clause_set = {};
std::vector<size_t> clause_set = {};

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

std::cout << "0\n";
#endif

current_hardness.clause_set.push_back(solver_clause_num + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment to explain why +1.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done quite a bit of performance optimisation work on CBMC. If you want a chat about it then drop me a line.

@@ -140,8 +140,22 @@ void satcheck_minisat2_baset<T>::lcnf(const bvt &bv)

solver->addClause_(c);

// To map clauses to lines of program code, track clause indices in the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to be a downer on this but ...

  1. There is not necessarily a 1-to-1 correlations between clauses and lines of code. For example, rewriting and normalisation may mean that multiple lines reduce to the same shared term. Which line should this be credited to? This is worse when it is things like multiplication where additions from other operations can appear as sub-parts of the operator.

  2. Number of variables and clauses is a very crude measure of SAT difficulty. Reducing the number of clauses can make the problem harder; I have seen multiple instances of this.

void register_clause(const bvt &bv);
/// \param cnf: processed clause
/// \param cnf_clause_index: index of clause in dimacs output
/// \param register_cnf: negation of boolean variable tracking if the clause
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify this comment.

@@ -50,6 +50,7 @@ struct solver_hardnesst
size_t clauses = 0;
size_t literals = 0;
std::unordered_set<size_t> variables = {};
std::vector<size_t> clause_set = {};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the performance cost of this?

Natasha Yogananda Jeppu added 7 commits November 18, 2020 14:00
…just number of clauses

Currently, solver hardness tracks number of clauses and not the actual clauses for an
instruction. This mapping will help identfy code hotspots based on which of the clauses
belong to the SAT core.
This commit extends the sat_hardnesst structure to include mapping between clauses and
ssa expression.
…put for write-solver-stats-to

Clauses are stored as a sorted vector of integers.
…put for write-solver-stats-to

Clauses are stored as a sorted vector of integers.

clang format solver_hardness.cpp
…he clause itself

Add comment to explain solver_clause_num + 1 in solver hardness, change <int> to <size_t>
…he clause itself

Add comment to explain solver_clause_num + 1 in solver hardness, change <int> to <size_t>

Remove sorting of clause set keeping performance in mind
…he clause itself

Add comment to explain solver_clause_num + 1 in solver hardness, change <int> to <size_t>

Remove sorting of clause set keeping performance in mind

Add additional utility to map program lines to clauses in --dimacs output

clang format
@danielsn danielsn force-pushed the clause-map branch 2 times, most recently from 11000b3 to d9e87db Compare November 19, 2020 15:36
…he clause itself

Add comment to explain solver_clause_num + 1 in solver hardness, change <int> to <size_t>

Remove sorting of clause set keeping performance in mind

Add additional utility to map program lines to clauses in --dimacs output

clang format
@danielsn danielsn merged commit 69e0698 into diffblue:develop Nov 19, 2020
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 1, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the Glucose interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-cmake-gcc
GitHub action use Glucose.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 1, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

To avoid future regressions, make the check-macos-10_15-make-clang
GitHub action use IPASIR with Riss as the back-end solver.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 1, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the Glucose interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-cmake-gcc
GitHub action use Glucose.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 1, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the Glucose interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-cmake-gcc
GitHub action use Glucose.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 1, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the Glucose interface
to match these changes.

To avoid future regressions, make the check-macos-10_15-cmake-clang
GitHub action use Glucose.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 1, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 1, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 1, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the Glucose interface
to match these changes.

To avoid future regressions, make the check-macos-10_15-cmake-clang
GitHub action use Glucose.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 1, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 2, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 2, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 4, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the Glucose interface
to match these changes.

To avoid future regressions, make the check-macos-10_15-cmake-clang
GitHub action use Glucose.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 4, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 5, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the Glucose interface
to match these changes.

To avoid future regressions, make the check-macos-10_15-cmake-clang
GitHub action use Glucose.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 5, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 6, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the Glucose interface
to match these changes.

To avoid future regressions, make the check-macos-10_15-cmake-clang
GitHub action use Glucose.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 6, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 8, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the Glucose interface
to match these changes.

To avoid future regressions, make the check-macos-10_15-cmake-clang
GitHub action use Glucose.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 8, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 8, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the Glucose interface
to match these changes.

To avoid future regressions, make the check-macos-10_15-cmake-clang
GitHub action use Glucose.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 8, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 11, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the Glucose interface
to match these changes.

To avoid future regressions, make the check-macos-10_15-cmake-clang
GitHub action use Glucose.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 11, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 12, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 16, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 18, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 19, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 19, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jan 21, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

Also fix the status tracking in do_prop_solve as the previous
implementation would keep the solver in an UNSAT state, even when
assumptions had been modified. This issue was surfaced by the
cbmc-incr-oneloop suite of regression tests.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
jezhiggins pushed a commit to jezhiggins/cbmc that referenced this pull request Feb 3, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the Glucose interface
to match these changes.

To avoid future regressions, make the check-macos-10_15-cmake-clang
GitHub action use Glucose.
jezhiggins pushed a commit to jezhiggins/cbmc that referenced this pull request Feb 3, 2021
In diffblue#5480 the solver hardness interface was updated, but changes were
only implemented in the Minisat2 interface. Update the IPASIR interface
to match these changes.

Also fix the status tracking in do_prop_solve as the previous
implementation would keep the solver in an UNSAT state, even when
assumptions had been modified. This issue was surfaced by the
cbmc-incr-oneloop suite of regression tests.

To avoid future regressions, make the check-ubuntu-20_04-make-gcc GitHub
action use IPASIR with Riss as the back-end solver.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Bugs or features of importance to AWS CBMC users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants