Skip to content

Refactor and optimise size to have approximate bound #5914

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 25, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions src/goto-symex/complexity_limiter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -136,8 +136,8 @@ complexity_limitert::check_complexity(goto_symex_statet &state)
if(!complexity_limits_active() || !state.reachable)
return complexity_violationt::NONE;

std::size_t complexity = state.complexity();
if(complexity == 0)
std::size_t complexity = state.guard.as_expr().bounded_size(max_complexity);
if(complexity == 1)
return complexity_violationt::NONE;

auto &current_call_stack = state.call_stack();
Expand Down
9 changes: 0 additions & 9 deletions src/goto-symex/goto_state.h
Original file line number Diff line number Diff line change
Expand Up @@ -71,15 +71,6 @@ class goto_statet
/// Threads
unsigned atomic_section_id = 0;

/// Get the complexity for this state. Used to short-circuit states if they
/// have become too computationally complex.
inline std::size_t complexity()
{
// TODO: This isn't too efficent for BDDs, try using a different size
// like DAG size.
return guard.as_expr().size();
}

/// Constructors
goto_statet() = delete;
goto_statet &operator=(const goto_statet &other) = delete;
Expand Down
24 changes: 18 additions & 6 deletions src/util/expr.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -24,16 +24,28 @@ Author: Daniel Kroening, [email protected]

#include <stack>

std::size_t exprt::size() const
/// Returns the size of the exprt added to count without searching significantly
/// beyond the supplied limit.
std::size_t exprt::bounded_size(std::size_t count, std::size_t limit) const
{
// Initial size of 1 to count self.
std::size_t size = 1;
for(const auto &op : operands())
const auto &ops = operands();
count += ops.size();
for(const auto &op : ops)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of structural sharing this is (unnecessarily) exponential. Maybe use visit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't visit always do a full traversal, thus removing this optimisation of only doing a partial traversal?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TGWDB Side note, please split this commit into the correctly titled optimisation and bug fix commits. I think the current title may already have mislead Martin. So lets get it sorted.

Copy link
Contributor Author

@TGWDB TGWDB Mar 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-cs I'm afraid I don't fully understand your comment. In particular I don't understand the terminology "structural sharing" here. There are a few things that have come up as possible, and some comments addressing them below.

  1. There is a concern that ops is potentially copied. I've pushed a commit to make it clear that ops is never changed and so there should be no issues related to copying of ops.
  2. There is some question about the exponential nature of the function? The function is depth first and linear in the size of the exprt up to the bound. Note that the size() call is a constant for this datatype, and then the (depth first) walk of the exprt is linear.
  3. There may be a concern that the depth first walk could potentially overflow the stack with many function calls. This was also a (potential) problem in the old version. The version here should reduce this risk in two ways. Firstly, at each call the entire size of the operands is added, thus capturing more of the true size faster and so reducing function calls. Secondly, the bounded nature means this should (particularly when called on large structures) return without traversing the entire exprt.
  4. If the concern is writing a custom traversal rather than using a visitor then I guess this could be considered, assuming there is suitable motivation?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TGWDB CPROVER shares irepts where possible. This can result in exponential compression. Consider the following expression:

let b = a + a
let c = b * b
let d = c + c
let e = d * d

CPROVER will store a total of 5 irepts. Four of them will have the same irept appearing twice in the sub.

A naive traversal of the form that you had written will visit a 16 times rather than just once. As the sharing gets larger so the time taken becomes exponential in the size of the data structure rather than linear. This happens. It is A Problem. In the best cases the sharing is linked enough that the pass grinds to a halt and it is obvious that the code needs to be fixed. Worse is when it is an issue but no-one notices the 8* slow down on the average case of one pass. Then at some point we feed in code and ... it just times out somewhere random and without careful and detailed profiling (which you almost certainly can't do with customer code) you will not find the problem.

I say this from bitter personal experience. @tautschnig and @peterschrammel have both done great work on making things sharing aware and sharing conscious.

So, if you want to traverse an expression for an unlimited amount PLEASE do not write for loops and recursion. Use a work queue and a set for what you have seen. Or use the visitor; that is is what it is there for.

Also, in the specific use-case you want, I think you probably want the number of unique exprts, you don't want to count duplicates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this explanation. The details of this particular sharing issue are useful. However I have just been looking at the implementation details of exprt::visit and I do not see how it could solve the particular issue you describe. The implementation for this function can be found in visit_pre_template. It uses an std::stack as its work queue (it also uses a for loop which you object to). std::stack is a simple LIFO data structure. If the same irept is pushed onto the stack twice, then it will be popped twice and visited twice. It does not offer the duplication removal of std::set or std::unordered_set. The use of the std::stack is still worthwhile here, because storing the working data set on the heap avoids storing it in the call stack as for a recursive implementation. Recursion could overflow the call stack when processing a sufficiently deeply nested data structure. Using the visitor is good, I just don't see how it could offer the particular benefit which you purport that it does. Was there a different visitor implementation which you were thinking of instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomasspriggs apologies. There used to be a visitor that used a set to handle sharing. I would consider this a bug in exprt::visit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. I just wanted to get to the bottom of what best practice would actually be.

Changing exprt::visit to not visit shared nodes like that could break existing usages of it. Say for example if it has been used for printing code, where we actually want to print the full tree. The exprt::unique_depth_(begin/end/cbegin/cend) iterators look to offer the functionality you were expecting. But they also seem to be unused.

{
size += op.size();
if(count >= limit)
{
return count;
}
count = op.bounded_size(count, limit);
}
return count;
}

return size;
/// Returns the size of the exprt without significantly searching beyond the
/// supplied limit.
std::size_t exprt::bounded_size(std::size_t limit) const
{
return bounded_size(1, limit);
}

/// Return whether the expression is a constant.
Expand Down
10 changes: 7 additions & 3 deletions src/util/expr.h
Original file line number Diff line number Diff line change
Expand Up @@ -85,9 +85,9 @@ class exprt:public irept
return static_cast<const typet &>(find(ID_type));
}

/// Amount of nodes this expression tree contains. This is the size of the
/// actual tree, ignoring memory/sub-tree sharing.
std::size_t size() const;
/// Amount of nodes in this expression tree approximately bounded by limit.
/// This is the size of the actual tree, ignoring memory/sub-tree sharing.
std::size_t bounded_size(std::size_t limit) const;

/// Return true if there is at least one operand.
bool has_operands() const
Expand Down Expand Up @@ -124,6 +124,10 @@ class exprt:public irept
const exprt &op3() const
{ return operands()[3]; }

/// Amount of nodes this expression tree contains, with a bound on how far
/// to search. Starts with an existing count.
std::size_t bounded_size(std::size_t count, std::size_t limit) const;

public:
void reserve_operands(operandst::size_type n)
{ operands().reserve(n) ; }
Expand Down
33 changes: 33 additions & 0 deletions unit/util/expr.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -36,3 +36,36 @@ SCENARIO("bitfield-expr-is-zero", "[core][util][expr]")
}
}
}

TEST_CASE("Bounded size is computer correctly", "[core][util][expr]")
{
// Helper types and functions for constructing test expressions.
const typet type = signedbv_typet(32);
std::function<exprt(size_t)> make_expression;
make_expression = [&](size_t size) -> exprt {
PRECONDITION(size != 0);
if(size <= 1)
return from_integer(1, type);
if(size == 2)
return unary_minus_exprt(from_integer(1, type));
return plus_exprt(from_integer(1, type), make_expression(size - 2));
};
// Actual test cases.
REQUIRE(make_expression(1).bounded_size(10) == 1);
REQUIRE(make_expression(2).bounded_size(10) == 2);
REQUIRE(make_expression(3).bounded_size(10) == 3);

REQUIRE(make_expression(30).bounded_size(10) < 13);
REQUIRE(make_expression(30).bounded_size(10) >= 10);
}

TEST_CASE("Bounded size versus pathological example", "[core][util][expr]")
{
const typet type = signedbv_typet(32);
exprt pathology = plus_exprt(from_integer(1, type), from_integer(1, type));
// Create an infinite exprt by self reference!
pathology.add_to_operands(pathology);
// If bounded size is not checking the bound this will never terminate.
REQUIRE(pathology.bounded_size(10) < 13);
REQUIRE(pathology.bounded_size(10) >= 10);
}