Skip to content

Adding preprocessing for functions of the Java Character library #640

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

romainbrenguier
Copy link
Contributor

This preprocessing is meant to replace simple methods of the Character
Java in goto-programs by expressions on characters.
About 60% of the library is supported, but for some of this
methods are limited to ASCII characters.
This is meant to be run in conjunction with the string refinement so
it is called if the string-refine option is activated.
Note that the string-refine option is added in PR #429, but this PR can be merged independently.


class character_refine_preprocesst:public messaget
{
public:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No single-character indent, please (just don't indent public/private).


private:
namespacet ns;
goto_functionst & goto_functions;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No whitespace after &

exprt (*expr_function)(exprt, typet), conversion_input &target);
static exprt in_interval_expr(
exprt arg, mp_integer lower_bound, mp_integer upper_bound);
static exprt in_list_expr(exprt chr, std::list<mp_integer> list);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing a list by value?

};

#endif // CPROVER_GOTO_PROGRAMS_CHARACTER_REFINE_PREPROCESS_H

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blank line at end-of-file, please.

typet type=result.type();
exprt u010000=from_integer(0x010000, type);
exprt u0800=from_integer(0x0800, type);
exprt u0400=from_integer(0x0400, type);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments around those magic numbers would really be appreciated.

\*******************************************************************/

exprt character_refine_preprocesst::expr_of_is_whitespace(
exprt expr, typet type)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather random as it applies across this PR: use (const) references wherever possible, both in parameters and within the functions.

@@ -20,7 +20,9 @@ SRC = goto_convert.cpp goto_convert_function_call.cpp \
slice_global_inits.cpp goto_inline_class.cpp class_identifier.cpp \
show_goto_functions_json.cpp \
show_goto_functions_xml.cpp \
remove_static_init_loops.cpp remove_instanceof.cpp
remove_static_init_loops.cpp remove_instanceof.cpp \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put each cpp file on a separate line to facilitate merging

Inputs:
target - a position in a goto program

Purpose: Converts function call to an assignement of an expression
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assignment

void character_refine_preprocesst::convert_is_valid_code_point(
conversion_input &target)
{
// TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why this has not been implemented yet?

Inputs:
target - a position in a goto program

Purpose: Converts function call to an assignement of an expression
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assignment (several occurrences)

void character_refine_preprocesst::convert_get_directionality_char(
conversion_input &target)
{
// TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please comment on why this function is not implemented (several instances)

@romainbrenguier romainbrenguier force-pushed the java-character-library-support branch from e0cd4ee to b5cd96b Compare March 18, 2017 20:54
Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly minor changes - many about the use of references.

@@ -904,6 +905,13 @@ bool cbmc_parse_optionst::process_goto_program(
status() << "Partial Inlining" << eom;
goto_partial_inline(goto_functions, ns, ui_message_handler);

if(cmdline.isset("string-refine"))
{
status() << "Preprocessing for java Character library" << eom;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kroening @peterschrammel @romainbrenguier I am wondering whether the invoked code should go in java_bytecode. (Yes, I have asked a similar question before.) If that folder isn't a good fit (for it introduces a dependency on goto-programs from java_bytecode), then maybe it's time for some folder like java_in-processing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right it should probably go to java_bytecode, the dependencies in goto-programs are not strong here. I will investigate that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some changes to the code which allows it to be called during java bytecode conversion.
So it can be moved there. See the commits 5801fbf , 848983b and 4c1986a

exprt arg=function_call.arguments()[0];
exprt result=function_call.lhs();
target->make_assignment();
typet type=result.type();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const typet &type=result.type(); If you delay the target->make_assignment() call until after code is constructed, you could even make all of the above const exprt &

\*******************************************************************/

exprt character_refine_preprocesst::in_interval_expr(
const exprt &chr, mp_integer lower_bound, mp_integer upper_bound)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass mp_integer by (const) reference

const exprt &chr, const std::list<mp_integer> &list)
{
exprt res=false_exprt();
for(auto i : list)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const auto &i : list

{
exprt res=false_exprt();
for(auto i : list)
res=or_exprt(res, equal_exprt(chr, from_integer(i, chr.type())));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use disjunction from util/std_expr.h to avoid a deeply nested expression.

expr - An expression of type character
type - A type for the output

Outputs: A character expression of the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use references

expr - An expression of type character
type - A type for the output

Outputs: A character expression of the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use references - see above

expr - An expression of type character
type - A type for the output

Outputs: A character expression of the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this return; statement

expr - An expression of type character
type - A type for the output

Outputs: A character expression of the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to catch or check additional function names (think about evolution of the Java SDK)? I'm thinking along the lines of "if the function call has prefix java::java.lang.Character." then it must be in the conversion_table or in some (to be created) conversion_not_supported table. Would that seem reasonable to do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to detect functions that we do not support, but I don't think it should stop CBMC if we encounter one, considering its result non-deterministic maybe enough for what we want to check. We could raise a warning in that case, but it's actually already done by CBMC: we get messages like **** WARNING: no body for function java::java.lang.Character.toUpperCase:(C)C for the functions we haven't translated.
The reasonable thing in my opinion would be to update the code with the new versions of the JDK. From what I have seen, there will not be much difference with the JDK9 (the only difference I see is a new codePointOf(String) function which is the inverse of getName).

exprt (*expr_function)(const exprt &chr, const typet &type),
conversion_input &target);
static exprt in_interval_expr(
const exprt &chr, mp_integer lower_bound, mp_integer upper_bound);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some mixed indent of parameters - either seems ok, but consistency would be great.

@tautschnig
Copy link
Collaborator

With regard to the unsupported functions: won't the warning about missing function bodies continue to show up for all functions that are natively handled by the string solver?

@romainbrenguier romainbrenguier force-pushed the java-character-library-support branch 2 times, most recently from 636c274 to 848983b Compare March 21, 2017 15:06
@romainbrenguier
Copy link
Contributor Author

@tautschnig the functions that we support are converted to some fake functions calls (which are understood by the solver), and these don't show up in the warnings.
So we only see the non-supported functions in the warnings.

@@ -33,13 +33,15 @@ class java_bytecode_convert_methodt:public messaget
bool _disable_runtime_checks,
size_t _max_array_length,
safe_pointer<std::vector<irep_idt> > _needed_methods,
safe_pointer<std::set<irep_idt> > _needed_classes):
safe_pointer<std::set<irep_idt> > _needed_classes,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: with C++11 you don't need the space in > > anymore.

Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all that I can tell from just reading code: this looks really good. Thanks a lot for all the changes and refactoring!

@kroening
Copy link
Member

This needs a rebase!

This preprocessing is meant to replace simple methods of the Character
Java library by expressions on characters during bytecode conversion.
About 60% of the library is supported, but for some of this
methods are limited to ASCII characters.
This is meant to be run in conjunction with the string refinement so
it is called if the string-refine option is activated.
@romainbrenguier romainbrenguier force-pushed the java-character-library-support branch from 848983b to 924e01b Compare May 21, 2017 09:31
@romainbrenguier
Copy link
Contributor Author

@kroening this has been rebased.

@kroening kroening merged commit 9ec55a3 into diffblue:master May 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants