-
Notifications
You must be signed in to change notification settings - Fork 273
Improvements in string preprocessing to facilitate constant propagation #984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvements in string preprocessing to facilitate constant propagation #984
Conversation
This is a new implementation of the string preprocessing for Java functions. Which now takes place during java bytecode processing instead of being a step in the goto-program treatment. Rename code_of_function to code_for_function Update documentation Minor corrections on the code Make all methods non-static This is to allow using class variables in them. The conversion_functiont map had to be adapted. Use class variable for refined string type Instead of recreating a new refined string type each time it is requested, we add a field to hold it in java_string_library_preprocess. Squash in Replacement Replace code.copy_to_operands() with code.add() Correct implementation and usage of string_length_type()
…onversion Removing add_string_type declaration, not used anymore
Removing string_refine_preprocess from Makefile Removing include of string_refine_preprocess
Only one instance of this class should exist for a program.
We now do this at the end of the bytecode conversion, in function java_bytecode_languaget::final instead of the middle of the conversion.
String classes that are present in a loaded model will generate a type that may be different from the internal type constructed for Java strings in cbmc. Thus we now create a String type and write it in the table, possibly overwriting a pre-existing Java string type.
…e/test-gen#159 We now have a special way of dealing with contains in the case the argument is a constant, which seems to be much more efficient.
Adding constraints to make sure the index_of primitive of the string solver is precise. Also add specific constraints for constant argument in indexOf to improve performances in that case.
…hat they also get concretized
We now assign strings in one step instead of field by field. This makes constant propagation work better in later step of the analysis.
This adds a substring of the second argument. This will make the encoding of StringBuilder.append(CharSequence s, int start, int end) more efficient.
Ping me when deps merged? |
Needs a rebase |
Actually I think we will split this pull request into several smaller ones. Once this is done we will close this one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty long so I was wondering if there is any way it can be split? For instance, the axioms for constant strings in string_constraint_generator_main.cpp
appear in a different PR #828. Should this be rebased?
|
||
// Replacing call if it is a function of the Character library, | ||
// returning the same call otherwise | ||
c=string_preprocess.replace_character_call(call); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this here and do all the instrumentation in java_bytecode_language.cpp in a separate pass?
{ | ||
symbolt sym=symbol_table.lookup("java::java.lang.String"); | ||
typet concrete_type=sym.type; | ||
// TODO: Check that this is indeed the 'length' field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be implemented?
|
||
// TODO: Manage the case where we have a non-String Object (this should | ||
// probably be handled upstream. At any rate, the following code should be | ||
// protected with assertions on the type of op1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this comment be implemented?
|
||
\*******************************************************************/ | ||
|
||
typet java_string_library_preprocesst::get_length_type( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this method and the above one java_string_library_preprocesst::get_data_type
be combined into one by adding an extra argument? They seem almost identical otherwise.
std::string fun_name_data=id2string(function_name)+"_data"; | ||
|
||
// Assignments | ||
codet assign_fun_length=code_assign_function_application( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const codet &
?
code_blockt code; | ||
|
||
// A String has a field Object with @clsid = String and @lock = false: | ||
symbolt jlo_symbol=symbol_table.lookup("java::java.lang.Object"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const symbolt &jlo_symbol
?
dereference_exprt(new_array, new_array.type().subtype()), rhs.content())); | ||
|
||
// A String has a field Object with @clsid = String and @lock = false: | ||
symbolt jlo_symbol=symbol_table.lookup("java::java.lang.Object"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
dereference_exprt deref(rhs, deref_type); | ||
|
||
// Fields of the string object | ||
exprt rhs_length=get_length(deref, symbol_table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const exprt &
? Similar for next line.
code_blockt code; | ||
|
||
// String expression that will hold the result | ||
string_exprt string_expr1=fresh_string_expr(loc, symbol_table, code); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const string_exprt &
?
assert(arguments.size()==2); | ||
|
||
// > string1 = arguments[1].toString() | ||
exprt string1=fresh_string(this_obj.type(), loc, symbol_table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const exprt &
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I now saw @romainbrenguier's comment that this will be split. So I'll look at it again once that is done.
@cristina-david Indeed this PR will be removed. I created #993 that partially replaces it. |
This make string assignments take place in one step instead of field by field which makes constant propagation easier.
This also adds concatenation with start_index and end_index arguments in the solver.
This depends on pull requests #983 #980 #978 #906and is part of the implementation of https://github.com/diffblue/test-gen/issues/256