Skip to content

TG-58 Complete rework of string solver to avoid using infinite arrays #1420

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

romainbrenguier
Copy link
Contributor

This makes the string solver algorithm work only on finite size arrays. The input program may have pointers and infinite arrays in which case the solver associate finite arrays to them.
The type for the String class in java-bytecode is changed to contain a char pointer instead of a pointer to an infinite array, however infinite array are still present there for non-deterministic assignment.

@romainbrenguier romainbrenguier force-pushed the bugfix/char-array-in-java-strings#TG-58#rebase branch 2 times, most recently from 7148616 to 59fc1bd Compare September 25, 2017 08:02
@romainbrenguier romainbrenguier force-pushed the bugfix/char-array-in-java-strings#TG-58#rebase branch 4 times, most recently from 6371f41 to aca00ce Compare September 25, 2017 10:29
@romainbrenguier romainbrenguier force-pushed the bugfix/char-array-in-java-strings#TG-58#rebase branch from 330d8da to 4e6fd78 Compare September 26, 2017 09:43
@romainbrenguier romainbrenguier changed the title Complete rework of string solver to avoid using infinite arrays TG-58 Complete rework of string solver to avoid using infinite arrays Sep 26, 2017
Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed about a quarter of this. It's just too big. Don't have time to leaf through the rest; suggest finding another reviewer to pick up where I left off, or find a way to split the huge 3000-line commit in the middle of this into stages so each can be reviewed individually.

{
StringBuilder sb=new StringBuilder("");
for(int i=0; i<arr.length; i++)
sb.append(arr[i]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad indentation in this function

char arr[]=new char[s.length()];
// We limit arbitrarly the loop unfolding to 5
for(int i=0; i<length && i<5; i++)
arr[i]=s.charAt(i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent

@@ -9,8 +9,10 @@ public static String main()
{
s.append((String) t[0]);
}
assert(s.toString().equals("Hello world!"));
assert(!s.toString().equals("Hello world!"));
if(i==0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indents

String s = sb.toString();
System.out.println(s);
assert(s.equals("abcd"));
if(i==0)
assert(s.equals("abcd"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indents

// We limit arbitrarly the loop unfolding to 10
for(int i=0; i<length && i<10; i++)
arr[i]=s.charAt(i);
return arr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent

operands().resize(2);
}

refined_string_exprt(const exprt &_length, const exprt &_content, typet type):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const typet &?

@@ -234,13 +234,13 @@ exprt string_constraint_generatort::associate_array_to_pointer(
const auto &length=array_expr.length();
if(length==infinity_exprt(length.type()))
{
auto ret=length_of_array_.insert(
length_of_array_.insert(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're expecting this to be a fresh insert, check insert's return value

std::make_pair(array_expr, fresh_symbol("string_length", length.type())));
array_expr.length()=length_of_array_[array_expr];
}

// TODO should use a function for that
auto ret=arrays_of_pointers_.insert(std::make_pair(pointer_expr, array_expr));
arrays_of_pointers_.insert(std::make_pair(pointer_expr, array_expr));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, check insert's return value

@@ -545,11 +545,12 @@ codet initialize_nondet_string_struct(
code_blockt code;

// `obj` is `*expr`
// const dereference_exprt obj(expr, expr.type().subtype());
const irep_idt &class_id=
const struct_typet struct_type=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always follow; take return by const ref

@@ -33,6 +33,20 @@ TEST_CASE("Generate string object")
bool failed=symbol_table.add(jlo_sym);
CHECK_RETURN(!failed);


// Add java.lang.Object
symbolt new_symbol;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need to do this -- jlo is always in the class loader's queue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we have a class loader here (did you see we are in a unit test?)

@romainbrenguier romainbrenguier force-pushed the bugfix/char-array-in-java-strings#TG-58#rebase branch 4 times, most recently from e63e2e9 to 2bf15cd Compare September 27, 2017 09:38
@LAJW LAJW force-pushed the bugfix/char-array-in-java-strings#TG-58#rebase branch 2 times, most recently from 1873cf3 to e34064f Compare September 27, 2017 12:21
Copy link
Contributor

@jasigal jasigal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass, I need to better understand more of the string refinement changes to comment on structure.

@@ -26,10 +26,34 @@ Author: Alberto Griggio, [email protected]
#include <solvers/sat/satcheck.h>
#include <solvers/refinement/string_constraint_instantiation.h>
#include <java_bytecode/java_types.h>
#include <util/optional.h>

#include <util/ssa_expr.h> // for char array pointer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these appear to be unused.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, there clearly is a usage of optional in the file for example. Question is whether we should start removing headers if they were already included in parent file. Or make some sort of stdafx.h precompiled header file, which would include most commonly used headers (STL utilities, containers, strings, etc.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went to go look in CODING_STANDARD.md. It's not completely clear on this type of case, but it does say avoid unnecessary #includes, which they are in the literal sense.

exprt union_find_replacet::add_symbol(const exprt &lhs, const exprt &rhs)
{
const exprt lhs_root=find_expr(lhs);
const exprt rhs_root=find_expr(rhs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const & for both

return rhs_root;
}

static std::vector<exprt> instantiate(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be deleted, other declaration below.

@@ -87,13 +111,18 @@ static exprt instantiate(
const exprt &val);

static std::vector<exprt> instantiate(
messaget::mstreamt &stream,
const namespacet &ns,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can delete these added args, they are unneeded in the function.

@@ -249,327 +280,315 @@ static std::vector<exprt> generate_instantiations(
for(const auto &nc_axiom : axioms.not_contains)
{
for(const auto &instance :
instantiate(nc_axiom, index_set, generator))
instantiate(stream, ns, nc_axiom, index_set, generator))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can delete first two args.

const exprt &i=cur.op1();

bool has_quant_var=find_qvar(i, qvar);
const index_exprt index_expr=to_index_expr(cur);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be &.

@@ -1712,25 +1802,34 @@ static void initial_index_set(
to_process.pop_back();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loop could use expression iterator.

/// \return the lemmas produced through instantiation
static std::vector<exprt> instantiate(
messaget::mstreamt &stream,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unneeded.

Copy link
Contributor

@LAJW LAJW Sep 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite the opposite.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It wasn't needed.)

@@ -1874,15 +1973,22 @@ static exprt instantiate(

/// Instantiates a quantified formula representing `not_contains` by
/// substituting the quantifiers and generating axioms.
/// \param stream: a message stream
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this by removing unused arguments and updating index_set (and removing current_index set).

it->second);
}
else if(ecopy.id()==ID_struct)
if(ecopy.type().id()==ID_array)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be all arrays? It used to just be character arrays. For instance, the skolem functions in not contains constraints are arrays. This code seems to assume the arrays are strings.

@LAJW LAJW force-pushed the bugfix/char-array-in-java-strings#TG-58#rebase branch 3 times, most recently from a7df645 to d8d91e9 Compare September 27, 2017 13:09
@jasigal
Copy link
Contributor

jasigal commented Sep 27, 2017

One thing I think we should address with such a big change to the string refinement process is stop the solver from treating skolem functions as strings. If I run the solver on cbmc/regression/strings-smoke-test/java_contains (so skolem functions from not contains axioms exist), it has its own index set. I have a comment above where any array seems to be treated as a string.

@LAJW LAJW force-pushed the bugfix/char-array-in-java-strings#TG-58#rebase branch 4 times, most recently from 01e6dd2 to dad57d7 Compare September 27, 2017 14:39
@romainbrenguier
Copy link
Contributor Author

@smowton

or find a way to split the huge 3000-line commit in the middle of this into stages so each can be reviewed individually.

The problem is that it's very difficult to find any significant intermediary stage in this big commit that wouldn't break all the string tests.

}
error() << "dec_solve: current index set is empty, "
<< "this should not happen" << eom;
return resultt::D_ERROR;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romainbrenguier Changed the behavior of the solver here, probably not on purpose. In a later commit to revert to what was originally there for the most part, but not completely.

const exprt arr_model=simplify_expr(*arr_model_opt, ns);
const exprt concretized_array=concretize_arrays_in_expression(
arr_model, generator.max_string_length, ns);
debug() << "get " << from_expr(ns, "", expr) << " --> "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was debugging something with Justing, and this debug output and the one below make this impossible, get is called too often. Please delete.

@romainbrenguier romainbrenguier force-pushed the bugfix/char-array-in-java-strings#TG-58#rebase branch from 2fa3500 to ef9aeaf Compare September 28, 2017 17:36
Remove duplicate code for add_axioms_for_insert_float.
Style: indentation in add_axioms_for_insert.
Remove useless comment.
Remove unused function: associate_char_array_to_char_pointers.
Rename solver in symbol_resolve.
Rename function in generate_symbol_resolution_from_eq.
Rename function get_char_array_and_concretize.

Documentation update for generate_symbol_resolution, add_lemma,
substitute_function_application, is_char_type.

Some documentation and style fixes.
Debug information for arrays of pointers

Debug information for check_axioms steps

Refactoring: put debug information out of check_axioms

Replace warning by invariant
This is necessary for the negation procedure to know the existential
bounds and unfold the quantification as a conjunction.
This is in the case where super_get does not anything about the array we
give it.
This is necessary so that the string solver returns an array of the
correct size, if the size is the only thing we know about the string.

Add warning message in the case where the symbol is unknown to super_get
This is done by looking at which arrays were associted to pointers in
the generator.
We use this in get to distinguish between strings and arrays of characters
This removes unused functions, declaration of unimplemented function,
unused constants and unused fields.
In initialisation of string objects, this prevents the created objects
from getting out of scope.
The function what was there before was not correct.
It was calling cprover_string_copy which is deprecated.
The check for a specific object number was removed so this does not
check what we want anyway.
It is difficult to make it check the original intended behaviour using
only regular expressions.
This more accurately describe what this object is.
@romainbrenguier romainbrenguier force-pushed the bugfix/char-array-in-java-strings#TG-58#rebase branch from 27c1f83 to 3ee3b25 Compare October 28, 2017 17:44
@romainbrenguier
Copy link
Contributor Author

@peterschrammel I squashed a lot of the correction commits with the corresponding original commit, I don't think there is much more to be done. For the cpp linter failure, this is due to a disagreement with what clang-format does for brackets in lamdbas.

@romainbrenguier
Copy link
Contributor Author

@peterschrammel

How come? The function seems misplaced then.

Would java_utils.cpp be appropriate for this?

Add nolint marker on line with lambdas, which are formatted by clang
in a way that is not compatible with cpp-lint.
@romainbrenguier
Copy link
Contributor Author

This should be replaced by #1539

@peterschrammel peterschrammel merged commit aa94fe8 into diffblue:develop Oct 31, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants