Skip to content

String solver: stop relying on array size #4626

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
May 30, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
78d92f8
Add array_pool as a field of string_builtin_functionst
May 14, 2019
eb44428
get_length_if_exists() function
May 14, 2019
504d6ea
Don't set the array type's length
May 14, 2019
82e6d68
array_poolt.get_length() returns a new symbol by default
May 14, 2019
fddf30a
fresh_string: do not create a length symbol
May 14, 2019
eab5e53
Open length_of_array to non-symbol lengths
May 14, 2019
91598f6
Make array_poolt.get_length handle if_exprts
May 14, 2019
07f622c
Add missing entry to arrays_of_pointers map
May 14, 2019
fb8c1d1
Add attempt_assign_length_from_type
May 14, 2019
e17c2e7
fixup! Add attempt_assign_length_from_type
May 14, 2019
1645ffa
Use attempt_assign_length_from_type() in array_poolt.insert()
May 14, 2019
b630e78
Make arg of array_pool.insert() const
May 14, 2019
1f869ef
Document and enforce invariant in array_pool
May 14, 2019
2e6c4f1
array_pool.created_strings() now uses length_of_array
May 14, 2019
d87ea4e
Get rid of created_strings_value
May 14, 2019
cb0cee4
string_refinement::get(): get length if exists
May 14, 2019
7ff979e
Pass array_pool arg to get_array
May 14, 2019
3a6a8e9
Get length from array_pool in get_array()
May 14, 2019
981f2d2
Propagate array_pool to constraint generation
May 14, 2019
ab66476
Read string length from array pool
May 14, 2019
0b72c16
Unit test: get constant length directly from array
May 14, 2019
4dbadb0
Remove unused array_string_exprt.length()
May 14, 2019
f6b324f
Refactor format support to not use get_arg function argument
May 14, 2019
5035425
Format: Remove unnecessary conversion
May 14, 2019
1db3059
Rename get_length to get_or_create_length
May 14, 2019
58bf455
get_array: remove useless variables
May 14, 2019
b1c9eb1
Renmove ref to size in warning message
May 14, 2019
536253b
Refactor get_array: extract get_valid_array_size
May 14, 2019
7678ef1
Get size from array if not available in model
May 14, 2019
cd61e81
Test for concretized string literal in trace
May 14, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file not shown.
12 changes: 12 additions & 0 deletions jbmc/regression/jbmc-strings/literal_length/test.desc
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
CORE
test.class
--function test.main --trace
^EXIT=10$
^SIGNAL=0$
^\[.*assertion.1\].* line 6.* FAILURE$
abc_constarray=\{ 'a', 'b', 'c' \}
--
--
Check that the string literal used in the constructor is properly concretized
in the trace even if it is not part of the equation.
This is necessary for test generation.
8 changes: 8 additions & 0 deletions jbmc/regression/jbmc-strings/literal_length/test.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
public class test
{
public static void main()
{
String x = new String("abc");
assert false;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -88,18 +88,25 @@ exprt get_data_pointer(const array_string_exprt &arr)

/// Creates a `string_exprt` of the proper string type.
/// \param [in] str: string to convert
/// \param array_pool: pool of arrays representing strings
/// \return corresponding `string_exprt`
refined_string_exprt make_refined_string_exprt(const array_string_exprt &arr)
refined_string_exprt make_refined_string_exprt(
const array_string_exprt &arr,
array_poolt &array_pool)
{
return refined_string_exprt(arr.length(), get_data_pointer(arr));
return refined_string_exprt(
array_pool.get_or_create_length(arr), get_data_pointer(arr));
}

/// For a constant `string_exprt`, creates a full index set.
/// \param [in] s: `string_exprt` to create index set for
/// \param array_pool: pool of arrays representing strings
/// \return the corresponding index set
std::set<exprt> full_index_set(const array_string_exprt &s)
std::set<exprt>
full_index_set(const array_string_exprt &s, array_poolt &array_pool)
{
const mp_integer n = numeric_cast_v<mp_integer>(to_constant_expr(s.length()));
const mp_integer n = numeric_cast_v<mp_integer>(
to_constant_expr(to_array_type(s.type()).size()));
std::set<exprt> ret;
for(mp_integer i = 0; i < n; ++i)
ret.insert(from_integer(i));
Expand Down Expand Up @@ -179,17 +186,20 @@ SCENARIO(
// initialize architecture with sensible default values
config.set_arch("none");

symbol_generatort symbol_generator;
array_poolt array_pool(symbol_generator);

// Creating strings
const auto ab_array = make_string_exprt("ab");
const auto b_array = make_string_exprt("b");
const auto a_array = make_string_exprt("a");
const auto empty_array = make_string_exprt("");
const auto cd_array = make_string_exprt("cd");
const auto ab = make_refined_string_exprt(ab_array);
const auto b = make_refined_string_exprt(b_array);
const auto a = make_refined_string_exprt(a_array);
const auto empty = make_refined_string_exprt(empty_array);
const auto cd = make_refined_string_exprt(cd_array);
const auto ab = make_refined_string_exprt(ab_array, array_pool);
const auto b = make_refined_string_exprt(b_array, array_pool);
const auto a = make_refined_string_exprt(a_array, array_pool);
const auto empty = make_refined_string_exprt(empty_array, array_pool);
const auto cd = make_refined_string_exprt(cd_array, array_pool);

GIVEN("The not_contains axioms of String.lastIndexOf(String, Int)")
{
Expand Down Expand Up @@ -247,8 +257,8 @@ SCENARIO(
WHEN("we instantiate and simplify")
{
// Making index sets
const std::set<exprt> index_set_ab = full_index_set(ab_array);
const std::set<exprt> index_set_b = full_index_set(b_array);
const std::set<exprt> index_set_ab = full_index_set(ab_array, array_pool);
const std::set<exprt> index_set_b = full_index_set(b_array, array_pool);

// List of new lemmas to be returned
std::vector<exprt> lemmas;
Expand Down Expand Up @@ -305,7 +315,7 @@ SCENARIO(
WHEN("we instantiate and simplify")
{
// Making index sets
const std::set<exprt> index_set_a = full_index_set(a_array);
const std::set<exprt> index_set_a = full_index_set(a_array, array_pool);

// Instantiate the lemmas
std::vector<exprt> lemmas = instantiate_not_contains(
Expand Down Expand Up @@ -355,8 +365,8 @@ SCENARIO(
WHEN("we instantiate and simplify")
{
// Making index sets
const std::set<exprt> index_set_a = full_index_set(a_array);
const std::set<exprt> index_set_b = full_index_set(b_array);
const std::set<exprt> index_set_a = full_index_set(a_array, array_pool);
const std::set<exprt> index_set_b = full_index_set(b_array, array_pool);

// Instantiate the lemmas
std::vector<exprt> lemmas = instantiate_not_contains(
Expand Down Expand Up @@ -406,7 +416,7 @@ SCENARIO(
WHEN("we instantiate and simplify")
{
// Making index sets
const std::set<exprt> index_set_a = full_index_set(a_array);
const std::set<exprt> index_set_a = full_index_set(a_array, array_pool);
const std::set<exprt> index_set_empty = {
generator.fresh_symbol("z", t.length_type())};

Expand Down Expand Up @@ -459,7 +469,7 @@ SCENARIO(
WHEN("we instantiate and simplify")
{
// Making index sets
const std::set<exprt> index_set_ab = full_index_set(ab_array);
const std::set<exprt> index_set_ab = full_index_set(ab_array, array_pool);

// Instantiate the lemmas
std::vector<exprt> lemmas = instantiate_not_contains(
Expand Down Expand Up @@ -510,8 +520,8 @@ SCENARIO(
WHEN("we instantiate and simplify")
{
// Making index sets
const std::set<exprt> index_set_ab = full_index_set(ab_array);
const std::set<exprt> index_set_cd = full_index_set(cd_array);
const std::set<exprt> index_set_ab = full_index_set(ab_array, array_pool);
const std::set<exprt> index_set_cd = full_index_set(cd_array, array_pool);

// Instantiate the lemmas
std::vector<exprt> lemmas = instantiate_not_contains(
Expand Down
104 changes: 80 additions & 24 deletions src/solvers/strings/array_pool.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -21,25 +21,48 @@ operator()(const irep_idt &prefix, const typet &type)
return result;
}

exprt array_poolt::get_length(const array_string_exprt &s) const
exprt array_poolt::get_or_create_length(const array_string_exprt &s)
{
if(s.length() == infinity_exprt(s.length().type()))
if(const auto &if_expr = expr_try_dynamic_cast<if_exprt>((exprt)s))
{
auto it = length_of_array.find(s);
if(it != length_of_array.end())
return it->second;
return if_exprt{
if_expr->cond(),
get_or_create_length(to_array_string_expr(if_expr->true_case())),
get_or_create_length(to_array_string_expr(if_expr->false_case()))};
}
return s.length();

auto emplace_result =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this allocates a symbol even if s already has a length, I'd suggest inserting a dummy and then replacing it with a fresh_symbol if appropriate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in fixup commit

length_of_array.emplace(s, symbol_exprt(s.length_type()));
if(emplace_result.second)
{
emplace_result.first->second =
fresh_symbol("string_length", s.length_type());
}

return emplace_result.first->second;
}

optionalt<exprt>
array_poolt::get_length_if_exists(const array_string_exprt &s) const
{
auto find_result = length_of_array.find(s);
if(find_result != length_of_array.end())
return find_result->second;
return {};
}

array_string_exprt
array_poolt::fresh_string(const typet &index_type, const typet &char_type)
{
symbol_exprt length = fresh_symbol("string_length", index_type);
array_typet array_type(char_type, length);
array_typet array_type{char_type, infinity_exprt(index_type)};
symbol_exprt content = fresh_symbol("string_content", array_type);
array_string_exprt str = to_array_string_expr(content);
created_strings_value.insert(str);
arrays_of_pointers.emplace(
address_of_exprt(index_exprt(str, from_integer(0, index_type))), str);

// add length to length_of_array map
get_or_create_length(str);

return str;
}

Expand Down Expand Up @@ -87,40 +110,73 @@ array_string_exprt array_poolt::make_char_array_for_char_pointer(
to_array_string_expr(fresh_symbol(symbol_name, char_array_type));
const auto insert_result =
arrays_of_pointers.insert({char_pointer, array_sym});

// add length to length_of_array map
get_or_create_length(array_sym);

return to_array_string_expr(insert_result.first->second);
}

/// Given an array_string_exprt, get the size of the underlying array. If that
/// size is undefined, create a new symbol for the size.
/// Then add an entry from `array_expr` to that size in the `length_of_array`
/// map.
///
/// This function should only be used at the creation of the
/// `array_string_exprt`s, as it is the only place where we can reliably refer
/// to the size in the type of the array.
static void attempt_assign_length_from_type(
const array_string_exprt &array_expr,
std::unordered_map<array_string_exprt, exprt, irep_hash> &length_of_array,
symbol_generatort &symbol_generator)
{
DATA_INVARIANT(
array_expr.id() != ID_if,
"attempt_assign_length_from_type does not handle if exprts");
// This invariant seems always true, but I don't know why.
// If we find a case where this is violated, try calling
// attempt_assign_length_from_type on the true and false cases.
const exprt &size_from_type = to_array_type(array_expr.type()).size();
const exprt &size_to_assign =
size_from_type != infinity_exprt(size_from_type.type())
? size_from_type
: symbol_generator("string_length", array_expr.length_type());

const auto emplace_result =
length_of_array.emplace(array_expr, size_to_assign);
INVARIANT(
emplace_result.second,
"attempt_assign_length_from_type should only be called when no entry"
"for the array_string_exprt exists in the length_of_array map");
}

void array_poolt::insert(
const exprt &pointer_expr,
array_string_exprt &array_expr)
const array_string_exprt &array_expr)
{
const exprt &length = array_expr.length();
if(length == infinity_exprt(length.type()))
{
auto pair = length_of_array.insert(
std::make_pair(array_expr, fresh_symbol("string_length", length.type())));
array_expr.length() = pair.first->second;
}

const auto it_bool =
arrays_of_pointers.insert(std::make_pair(pointer_expr, array_expr));
created_strings_value.insert(array_expr);

INVARIANT(
it_bool.second, "should not associate two arrays to the same pointer");

attempt_assign_length_from_type(array_expr, length_of_array, fresh_symbol);
}

const std::set<array_string_exprt> &array_poolt::created_strings() const
/// Return a map mapping all array_string_exprt of the array_pool to their
/// length.
const std::unordered_map<array_string_exprt, exprt, irep_hash> &
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 exposing the whole map here seems unnecessary since we are only using the first component. An alternative would be to define a for_each_string function with takes a function as argument and apply it to each array_string_exprt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave it like this for now so that the PR doesn't blow up.

array_poolt::created_strings() const
{
return created_strings_value;
return length_of_array;
}

const array_string_exprt &
array_poolt::find(const exprt &pointer, const exprt &length)
array_string_exprt array_poolt::find(const exprt &pointer, const exprt &length)
{
const array_typet array_type(pointer.type().subtype(), length);
const array_string_exprt array =
make_char_array_for_char_pointer(pointer, array_type);
return *created_strings_value.insert(array).first;
return array;
}

array_string_exprt of_argument(array_poolt &array_pool, const exprt &arg)
Expand Down
35 changes: 25 additions & 10 deletions src/solvers/strings/array_pool.h
Original file line number Diff line number Diff line change
Expand Up @@ -53,17 +53,31 @@ class array_poolt final
return arrays_of_pointers;
}

/// Associate an actual finite length to infinite arrays
/// Get the length of an array_string_exprt from the array_pool. If the length
/// does not yet exist, create a new symbol and add it to the pool.
///
/// If the array_string_exprt is an `if` expression, the true and false parts
/// are handled independently (and recursively). That is, no new length symbol
/// is added to the pool for `if` expressions, but symbols may be added for
/// each of the parts.
/// \param s: array expression representing a string
/// \return expression for the length of `s`
exprt get_length(const array_string_exprt &s) const;
exprt get_or_create_length(const array_string_exprt &s);

void insert(const exprt &pointer_expr, array_string_exprt &array);
/// As opposed to get_length(), do not create a new symbol if the length
/// of the array_string_exprt does not have one in the array_pool, but instead
/// return an empty optional.
/// \param s: array expression representing a string
/// \return expression for the length of `s`, or empty optional
optionalt<exprt> get_length_if_exists(const array_string_exprt &s) const;

void insert(const exprt &pointer_expr, const array_string_exprt &array);

/// Creates a new array if the pointer is not pointing to an array
const array_string_exprt &find(const exprt &pointer, const exprt &length);
array_string_exprt find(const exprt &pointer, const exprt &length);

const std::set<array_string_exprt> &created_strings() const;
const std::unordered_map<array_string_exprt, exprt, irep_hash> &
created_strings() const;

/// Construct a string expression whose length and content are new variables
/// \param index_type: type used to index characters of the strings
Expand All @@ -74,18 +88,19 @@ class array_poolt final

private:
/// Associate arrays to char pointers
/// Invariant: Each array_string_exprt in this map has a corresponding entry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this enforced?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved documentation. Now states:

  /// Associate arrays to char pointers
  /// Invariant: Each array_string_exprt in this map has a corresponding entry
  ///            in length_of_array.
  ///            This is enforced whenever we add an element to
  ///            `arrays_of_pointers`, i.e. fresh_string()
  ///            and make_char_array_for_char_pointer().

/// in length_of_array.
/// This is enforced whenever we add an element to
/// `arrays_of_pointers`, i.e. fresh_string()
/// and make_char_array_for_char_pointer().
std::unordered_map<exprt, array_string_exprt, irep_hash> arrays_of_pointers;

/// Associate length to arrays of infinite size
std::unordered_map<array_string_exprt, symbol_exprt, irep_hash>
length_of_array;
std::unordered_map<array_string_exprt, exprt, irep_hash> length_of_array;

/// Generate fresh symbols
symbol_generatort &fresh_symbol;

/// Strings created in the pool
std::set<array_string_exprt> created_strings_value;

array_string_exprt make_char_array_for_char_pointer(
const exprt &char_pointer,
const typet &char_array_type);
Expand Down
Loading