Skip to content

Improvements in string preprocessing to facilitate constant propagation #984

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

romainbrenguier
Copy link
Contributor

@romainbrenguier romainbrenguier commented Jun 1, 2017

This make string assignments take place in one step instead of field by field which makes constant propagation easier.
This also adds concatenation with start_index and end_index arguments in the solver.
This depends on pull requests #983 #980 #978 #906
and is part of the implementation of https://github.com/diffblue/test-gen/issues/256

romainbrenguier and others added 18 commits June 1, 2017 10:27
This is a new implementation of the string preprocessing for Java
functions. Which now takes place during java bytecode processing
instead of being a step in the goto-program treatment.

Rename code_of_function to code_for_function

Update documentation

Minor corrections on the code

Make all methods non-static

This is to allow using class variables in them.
The conversion_functiont map had to be adapted.

Use class variable for refined string type

Instead of recreating a new refined string type each time it is
requested, we add a field to hold it in java_string_library_preprocess.

Squash in Replacement

Replace code.copy_to_operands() with code.add()

Correct implementation and usage of string_length_type()
…onversion

Removing add_string_type declaration, not used anymore
Removing string_refine_preprocess from Makefile

Removing include of string_refine_preprocess
Only one instance of this class should exist for a program.
We now do this at the end of the bytecode conversion, in function
java_bytecode_languaget::final instead of the middle of the
conversion.
String classes that are present in a loaded model will generate a type
that may be different from the internal type constructed for Java
strings in cbmc. Thus we now create a String type and write it in the
table, possibly overwriting a pre-existing Java string type.
…e/test-gen#159

We now have a special way of dealing with contains in the case the
argument is a constant, which seems to be much more efficient.
Adding constraints to make sure the index_of primitive of the string
solver is precise.
Also add specific constraints for constant argument in indexOf to
improve performances in that case.
We now assign strings in one step instead of field by field.
This makes constant propagation work better in later step of the
analysis.
This adds a substring of the second argument.
This will make the encoding of StringBuilder.append(CharSequence s,
int start, int end) more efficient.
@forejtv forejtv requested review from smowton and cristina-david June 5, 2017 15:08
@smowton
Copy link
Contributor

smowton commented Jun 6, 2017

Ping me when deps merged?

@smowton
Copy link
Contributor

smowton commented Jun 7, 2017

Needs a rebase

@romainbrenguier
Copy link
Contributor Author

Actually I think we will split this pull request into several smaller ones. Once this is done we will close this one.

Copy link
Collaborator

@cristina-david cristina-david left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty long so I was wondering if there is any way it can be split? For instance, the axioms for constant strings in string_constraint_generator_main.cpp appear in a different PR #828. Should this be rebased?


// Replacing call if it is a function of the Character library,
// returning the same call otherwise
c=string_preprocess.replace_character_call(call);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this here and do all the instrumentation in java_bytecode_language.cpp in a separate pass?

{
symbolt sym=symbol_table.lookup("java::java.lang.String");
typet concrete_type=sym.type;
// TODO: Check that this is indeed the 'length' field.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be implemented?


// TODO: Manage the case where we have a non-String Object (this should
// probably be handled upstream. At any rate, the following code should be
// protected with assertions on the type of op1.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this comment be implemented?


\*******************************************************************/

typet java_string_library_preprocesst::get_length_type(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this method and the above one java_string_library_preprocesst::get_data_type be combined into one by adding an extra argument? They seem almost identical otherwise.

std::string fun_name_data=id2string(function_name)+"_data";

// Assignments
codet assign_fun_length=code_assign_function_application(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const codet &?

code_blockt code;

// A String has a field Object with @clsid = String and @lock = false:
symbolt jlo_symbol=symbol_table.lookup("java::java.lang.Object");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const symbolt &jlo_symbol?

dereference_exprt(new_array, new_array.type().subtype()), rhs.content()));

// A String has a field Object with @clsid = String and @lock = false:
symbolt jlo_symbol=symbol_table.lookup("java::java.lang.Object");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

dereference_exprt deref(rhs, deref_type);

// Fields of the string object
exprt rhs_length=get_length(deref, symbol_table);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const exprt &? Similar for next line.

code_blockt code;

// String expression that will hold the result
string_exprt string_expr1=fresh_string_expr(loc, symbol_table, code);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const string_exprt & ?

assert(arguments.size()==2);

// > string1 = arguments[1].toString()
exprt string1=fresh_string(this_obj.type(), loc, symbol_table);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const exprt &?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I now saw @romainbrenguier's comment that this will be split. So I'll look at it again once that is done.

@allredj
Copy link
Contributor

allredj commented Jun 8, 2017

@cristina-david Indeed this PR will be removed. I created #993 that partially replaces it.

@romainbrenguier
Copy link
Contributor Author

This pull request is replaced by #993, #994 and #1001 so I'm closing this one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants