-
Notifications
You must be signed in to change notification settings - Fork 274
Adding preprocessing for functions of the Java Character library #640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding preprocessing for functions of the Java Character library #640
Conversation
|
||
class character_refine_preprocesst:public messaget | ||
{ | ||
public: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No single-character indent, please (just don't indent public/private).
|
||
private: | ||
namespacet ns; | ||
goto_functionst & goto_functions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No whitespace after &
exprt (*expr_function)(exprt, typet), conversion_input &target); | ||
static exprt in_interval_expr( | ||
exprt arg, mp_integer lower_bound, mp_integer upper_bound); | ||
static exprt in_list_expr(exprt chr, std::list<mp_integer> list); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing a list by value?
}; | ||
|
||
#endif // CPROVER_GOTO_PROGRAMS_CHARACTER_REFINE_PREPROCESS_H | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No blank line at end-of-file, please.
typet type=result.type(); | ||
exprt u010000=from_integer(0x010000, type); | ||
exprt u0800=from_integer(0x0800, type); | ||
exprt u0400=from_integer(0x0400, type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments around those magic numbers would really be appreciated.
\*******************************************************************/ | ||
|
||
exprt character_refine_preprocesst::expr_of_is_whitespace( | ||
exprt expr, typet type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather random as it applies across this PR: use (const) references wherever possible, both in parameters and within the functions.
src/goto-programs/Makefile
Outdated
@@ -20,7 +20,9 @@ SRC = goto_convert.cpp goto_convert_function_call.cpp \ | |||
slice_global_inits.cpp goto_inline_class.cpp class_identifier.cpp \ | |||
show_goto_functions_json.cpp \ | |||
show_goto_functions_xml.cpp \ | |||
remove_static_init_loops.cpp remove_instanceof.cpp | |||
remove_static_init_loops.cpp remove_instanceof.cpp \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put each cpp file on a separate line to facilitate merging
Inputs: | ||
target - a position in a goto program | ||
|
||
Purpose: Converts function call to an assignement of an expression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assignment
void character_refine_preprocesst::convert_is_valid_code_point( | ||
conversion_input &target) | ||
{ | ||
// TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why this has not been implemented yet?
Inputs: | ||
target - a position in a goto program | ||
|
||
Purpose: Converts function call to an assignement of an expression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assignment (several occurrences)
void character_refine_preprocesst::convert_get_directionality_char( | ||
conversion_input &target) | ||
{ | ||
// TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please comment on why this function is not implemented (several instances)
e0cd4ee
to
b5cd96b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly minor changes - many about the use of references.
src/cbmc/cbmc_parse_options.cpp
Outdated
@@ -904,6 +905,13 @@ bool cbmc_parse_optionst::process_goto_program( | |||
status() << "Partial Inlining" << eom; | |||
goto_partial_inline(goto_functions, ns, ui_message_handler); | |||
|
|||
if(cmdline.isset("string-refine")) | |||
{ | |||
status() << "Preprocessing for java Character library" << eom; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kroening @peterschrammel @romainbrenguier I am wondering whether the invoked code should go in java_bytecode. (Yes, I have asked a similar question before.) If that folder isn't a good fit (for it introduces a dependency on goto-programs from java_bytecode), then maybe it's time for some folder like java_in-processing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right it should probably go to java_bytecode
, the dependencies in goto-programs
are not strong here. I will investigate that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some changes to the code which allows it to be called during java bytecode conversion.
So it can be moved there. See the commits 5801fbf , 848983b and 4c1986a
exprt arg=function_call.arguments()[0]; | ||
exprt result=function_call.lhs(); | ||
target->make_assignment(); | ||
typet type=result.type(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const typet &type=result.type();
If you delay the target->make_assignment()
call until after code
is constructed, you could even make all of the above const exprt &
\*******************************************************************/ | ||
|
||
exprt character_refine_preprocesst::in_interval_expr( | ||
const exprt &chr, mp_integer lower_bound, mp_integer upper_bound) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass mp_integer by (const) reference
const exprt &chr, const std::list<mp_integer> &list) | ||
{ | ||
exprt res=false_exprt(); | ||
for(auto i : list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const auto &i : list
{ | ||
exprt res=false_exprt(); | ||
for(auto i : list) | ||
res=or_exprt(res, equal_exprt(chr, from_integer(i, chr.type()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use disjunction
from util/std_expr.h to avoid a deeply nested expression.
expr - An expression of type character | ||
type - A type for the output | ||
|
||
Outputs: A character expression of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use references
expr - An expression of type character | ||
type - A type for the output | ||
|
||
Outputs: A character expression of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use references - see above
expr - An expression of type character | ||
type - A type for the output | ||
|
||
Outputs: A character expression of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this return;
statement
expr - An expression of type character | ||
type - A type for the output | ||
|
||
Outputs: A character expression of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to catch or check additional function names (think about evolution of the Java SDK)? I'm thinking along the lines of "if the function call has prefix java::java.lang.Character." then it must be in the conversion_table
or in some (to be created) conversion_not_supported
table. Would that seem reasonable to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to detect functions that we do not support, but I don't think it should stop CBMC if we encounter one, considering its result non-deterministic maybe enough for what we want to check. We could raise a warning in that case, but it's actually already done by CBMC: we get messages like **** WARNING: no body for function java::java.lang.Character.toUpperCase:(C)C
for the functions we haven't translated.
The reasonable thing in my opinion would be to update the code with the new versions of the JDK. From what I have seen, there will not be much difference with the JDK9 (the only difference I see is a new codePointOf(String)
function which is the inverse of getName
).
exprt (*expr_function)(const exprt &chr, const typet &type), | ||
conversion_input &target); | ||
static exprt in_interval_expr( | ||
const exprt &chr, mp_integer lower_bound, mp_integer upper_bound); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some mixed indent of parameters - either seems ok, but consistency would be great.
With regard to the unsupported functions: won't the warning about missing function bodies continue to show up for all functions that are natively handled by the string solver? |
636c274
to
848983b
Compare
@tautschnig the functions that we support are converted to some fake functions calls (which are understood by the solver), and these don't show up in the warnings. |
@@ -33,13 +33,15 @@ class java_bytecode_convert_methodt:public messaget | |||
bool _disable_runtime_checks, | |||
size_t _max_array_length, | |||
safe_pointer<std::vector<irep_idt> > _needed_methods, | |||
safe_pointer<std::set<irep_idt> > _needed_classes): | |||
safe_pointer<std::set<irep_idt> > _needed_classes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note: with C++11 you don't need the space in > >
anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all that I can tell from just reading code: this looks really good. Thanks a lot for all the changes and refactoring!
This needs a rebase! |
This preprocessing is meant to replace simple methods of the Character Java library by expressions on characters during bytecode conversion. About 60% of the library is supported, but for some of this methods are limited to ASCII characters. This is meant to be run in conjunction with the string refinement so it is called if the string-refine option is activated.
848983b
to
924e01b
Compare
@kroening this has been rebased. |
This preprocessing is meant to replace simple methods of the Character
Java in goto-programs by expressions on characters.
About 60% of the library is supported, but for some of this
methods are limited to ASCII characters.
This is meant to be run in conjunction with the string refinement so
it is called if the string-refine option is activated.
Note that the string-refine option is added in PR #429, but this PR can be merged independently.