Skip to content

String solver: stop relying on array size #4626

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
May 30, 2019

Conversation

allredj
Copy link
Contributor

@allredj allredj commented May 8, 2019

The string's backing array size (from its type) initially informs about the length of the string. However during refinement, the length can vary and if the length is tied to the size of the array, we need to rewrite the type as a conditional, which should be avoided (both the rewriting and the conditional).

This PR refactors string refinement so that during constraint generation, we refer to the size of the string solely through the array pool.
The size of a string thus always refers to a symbol, which may or may not be constrained.

The only time we get the array size is when inserting an array_string_exprt in the array pool (i.e. at the creation of the array_string_exprt).

There shouldn't be any visible change in JBMC, but that should fix some errors that have been happening with test-generation. I will keep looking to see if I can reproduce on JBMC.

I will also check that performance is not affected.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@allredj allredj added the Needs TG approval 🦉 Only merge with explicit approval from test-gen maintainers label May 8, 2019
@allredj allredj requested a review from romainbrenguier May 8, 2019 10:31
@allredj allredj self-assigned this May 8, 2019
Copy link
Contributor

@romainbrenguier romainbrenguier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed up to "Get rid of created_strings_value". No blocking request so far.

@@ -32,6 +32,18 @@ exprt array_poolt::get_length(const array_string_exprt &s) const
return s.length();
}

/// As opposed to get_length(), do not create a new symbol if the length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ \ref

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is part of the public interface so should be documented in the header file 🍇

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\ref is unnecessary. Doxygen understands where to link.

Copy link
Contributor Author

@allredj allredj May 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc moved to header file

@@ -21,15 +21,14 @@ operator()(const irep_idt &prefix, const typet &type)
return result;
}

exprt array_poolt::get_length(const array_string_exprt &s) const
/// Get the length of an array_string_exprt from the array_pool. If the length
/// does not yet exist, create a new symbol and add it to the pool.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍇

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc moved to header file

@@ -21,15 +21,14 @@ operator()(const irep_idt &prefix, const typet &type)
return result;
}

exprt array_poolt::get_length(const array_string_exprt &s) const
/// Get the length of an array_string_exprt from the array_pool. If the length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ typo in commit message: "accesssing"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -46,8 +46,7 @@ array_poolt::get_length_if_exists(const array_string_exprt &s) const
array_string_exprt
array_poolt::fresh_string(const typet &index_type, const typet &char_type)
{
symbol_exprt length = fresh_symbol("string_length", index_type);
array_typet array_type(char_type, length);
array_typet array_type(char_type, infinity_exprt(index_type));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ brace initialization

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

exprt array_poolt::get_length(const array_string_exprt &s)
{
if(const auto &if_expr = expr_try_dynamic_cast<if_exprt>((exprt)s))
{
return if_exprt(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ brace initialization

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

length_of_array.emplace(array_expr, size_from_type);
INVARIANT(
emplace_result.second,
"attempt_assign_length should only be called when no entry for the"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attempt_assign_length_from_type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -65,6 +65,10 @@ array_poolt::fresh_string(const typet &index_type, const typet &char_type)
created_strings_value.insert(str);
arrays_of_pointers.emplace(
address_of_exprt(index_exprt(str, from_integer(0, index_type))), str);

// add length to length_of_array map
get_length(str);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ maybe worth defining a void function to avoid having to write that comment twice

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe in a later PR.

const std::set<array_string_exprt> &array_poolt::created_strings() const
/// Return a map mapping all array_string_exprt of the array_pool to their
/// length.
const std::unordered_map<array_string_exprt, exprt, irep_hash> &
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 exposing the whole map here seems unnecessary since we are only using the first component. An alternative would be to define a for_each_string function with takes a function as argument and apply it to each array_string_exprt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave it like this for now so that the PR doesn't blow up.

{
const exprt length = super_get(arr.length());
exprt length = super_get(length_from_pool.value());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to remove the const

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-introduced

@@ -105,7 +105,8 @@ static optionalt<exprt> get_array(
const std::function<exprt(const exprt &)> &super_get,
const namespacet &ns,
messaget::mstreamt &stream,
const array_string_exprt &arr);
const array_string_exprt &arr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ commit message doesn't say why

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added description

@@ -965,7 +965,11 @@ static optionalt<exprt> get_array(
const array_string_exprt &arr,
const array_poolt &array_pool)
{
const exprt &size = arr.length();
const auto &size_from_pool = array_pool.get_length_if_exists(arr);
exprt size = size_from_pool.has_value()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to remove the const

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const re-introduced

@@ -493,8 +498,9 @@ std::pair<exprt, string_constraintst> add_axioms_for_format(
is_refined_string_type(arg.type()),
"arguments of format should be strings");
const array_string_exprt string_arg = get_string_expr(array_pool, arg);
get_arg = [string_arg](const irep_idt &id) {
return format_arg_from_string(string_arg, id);
/// REMOVE get_arg below
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgotten comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@allredj allredj force-pushed the string-stop-using-array-size branch from 1973e15 to f7f8ad2 Compare May 8, 2019 13:32
@smowton
Copy link
Contributor

smowton commented May 8, 2019

Will look at this tomorrow, but do you intend it to remove the illegal x ? y : z expressions where x has type char[x ? sizeof(y) : sizeof(z)]?

@allredj allredj force-pushed the string-stop-using-array-size branch from f7f8ad2 to 81edbb1 Compare May 9, 2019 08:39
@@ -32,6 +32,18 @@ exprt array_poolt::get_length(const array_string_exprt &s) const
return s.length();
}

/// As opposed to get_length(), do not create a new symbol if the length
/// of the array_string_exprt does not exist, but instead return an empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps clarify what you mean by a length not existing? Is that any non-constant length, or the length of a string we haven't fully initialised yet, or...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved the documentation. This now states:

  /// As opposed to get_length(), do not create a new symbol if the length
  /// of the array_string_exprt does not have one in the array_pool, but instead
  /// return an empty optional.

return it->second;
}
return s.length();
auto emplace_result =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this allocates a symbol even if s already has a length, I'd suggest inserting a dummy and then replacing it with a fresh_symbol if appropriate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in fixup commit

std::unordered_map<array_string_exprt, exprt, irep_hash> &length_of_array,
symbol_generatort &symbol_generator)
{
if(const auto &if_expr = expr_try_dynamic_cast<if_exprt>((exprt)array_expr))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the explicit cast (and to a non-reference-type?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the (exprt)? Curiously exprt_try_dynamic_cast doesn't work directly on array_string_exprt.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, this is obvious -- an array_string_exprt is certainly not an if_exprt, this can't ever be run.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one other case with the same cast and the same unreachable code. This suggests writing tests to exercise this code!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's unreachable. array_string_exprts can have ID_if AFAIK. However you make a good point with the test coverage, as we don't seem to have anything to cover these lines so for now I might just put an assertion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in this place i replaced the if with an assertion because I can't find any test to cover this part. However in the other place where I use the condition, I left it in because it's often the case that the array_string_exprt is an if. Therefore that code is well covered by regression tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, my mistake, I thought array_string_exprt constrained the expression to have a particular id, and therefore not be an if_exprt, but actually it just constrains the type of a general expression. The implementation of the expr-casting stuff, though, assumes this sort of trait class doesn't exist, and so casts are only possible to children of the current type. That ought to be fixed, but is obviously out of scope for this PR, so approving.

}
else
{
length_of_array.emplace(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should check this one is new, too? Therefore, deduplicate the emplace calls and the invariant below this if block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in fixup commit

@@ -65,6 +65,10 @@ array_poolt::fresh_string(const typet &index_type, const typet &char_type)
created_strings_value.insert(str);
arrays_of_pointers.emplace(
address_of_exprt(index_exprt(str, from_integer(0, index_type))), str);

// add length to length_of_array map
get_length(str);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't normally expect a get_ function to have a side-effect like this, suggest renaming

Copy link
Contributor Author

@allredj allredj May 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be happy with get_or_create_length?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

@@ -76,6 +76,8 @@ class array_poolt final

private:
/// Associate arrays to char pointers
/// Invariant: Each array_string_exprt in this map has a corresponding entry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this enforced?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved documentation. Now states:

  /// Associate arrays to char pointers
  /// Invariant: Each array_string_exprt in this map has a corresponding entry
  ///            in length_of_array.
  ///            This is enforced whenever we add an element to
  ///            `arrays_of_pointers`, i.e. fresh_string()
  ///            and make_char_array_for_char_pointer().

@allredj allredj force-pushed the string-stop-using-array-size branch 7 times, most recently from 3723a52 to 9fb4c37 Compare May 9, 2019 12:37
@allredj
Copy link
Contributor Author

allredj commented May 9, 2019

@smowton About your general comment. Yes, removing the conditional types is the idea of that PR.

@allredj allredj force-pushed the string-stop-using-array-size branch from 9fb4c37 to bde860c Compare May 9, 2019 13:04
Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see those conditionals in types go away! I'd just ask for the commit history to be cleaned up with all those fixup commits. Doing so will then also re-trigger the codebuild task.

@allredj
Copy link
Contributor Author

allredj commented May 13, 2019

@tautschnig Thanks. Of course I'll merge the fixup commits before merge. I have some failing tests on TG, so I need to fix that first anyway.

@allredj allredj force-pushed the string-stop-using-array-size branch from bde860c to 6071d79 Compare May 13, 2019 17:29
Joel Allred added 26 commits May 30, 2019 09:16
The length of the backing array of strings should not be read, so no
need to create a length symbol for the type when creating a new string.
Lengths of array_string_exprt can be any exprt, not just symbol_exprt.
In particular, we want to be able to give the length coming from the
backing array, which is not necessarily a symbol.
All created strings should have an entry in arrays_of_pointers.
Function to add the length of the string backing array to the
length_of_array map in the array_pool. If the size does not exist,
create a symbol.

Function will be used to assign an initial length to the
array_string_exprt, using the type information.

After that initial assignment, there will be no further reference to the
length of the underlying array.
Remove handling of if_case, as I cannot find a test case that exercises
it.
This invariant ensures that we get all the strings of the array_pool
when iterating over length_of_array, allowing to get rid of the
created_strings_value map.
Using the newly adding invariants, created_strings() now returns the
length_of_array map, which contains all array_string_exprts of the
array_pool.

This map is more complete than created_strings_value.
No longer used.

array_poolt.find() can be made to return the array_string_exprt directly
In string_refinement::get(), we now get the length of the string from
the array_pool instead of the array type.

We use get_length_if_exists() instead of get_length(), because we want to
ignore if the length does not exist, instead of creating a new
symbol.
String length will be queried from array_pool in future commit.
Avoid reading from array size.
Argument will be required to query string lengths.
Replace calls to str.length() by calls to array_pool.get_length(str).
We require a constant value in this case and querying the array pool
gives us a symbol.
Length is now always read from array_pool
Simplifies the interface by passing that string argument directly rather
than the function that retrieves it.
size exprt is not meaningful in all cases.
This is necessary for upcoming refactoring.
If the size of the array is not provided by the model, get the size
directly from the array.

Note that the non-constant case now returns an empty optional rather
than an unknown exprt.
Checks that the string literal is present in the trace even if it is not
part of the decision. As a side-effect, this checks that the length is
properly assigned to the string, even if it cannot be retrieved from the
solver.
@allredj allredj force-pushed the string-stop-using-array-size branch from d622019 to cd61e81 Compare May 30, 2019 08:16
Copy link
Contributor Author

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: cd61e81).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/113725722

@allredj allredj merged commit 53b5c0b into diffblue:develop May 30, 2019
@allredj allredj deleted the string-stop-using-array-size branch May 30, 2019 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs TG approval 🦉 Only merge with explicit approval from test-gen maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants