Skip to content

[TG-3813] Allow specifying specific methods by regex to be loaded by lazy methods #2350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 2 additions & 4 deletions jbmc/regression/jbmc/lazyloading10/test.desc
Original file line number Diff line number Diff line change
@@ -1,11 +1,9 @@
CORE symex-driven-lazy-loading-expected-failure
test.class
--lazy-methods --verbosity 10 --function test.f --lazy-methods-extra-entry-point test.sety
^EXIT=6$
^SIGNAL=0$
entry point 'test\.sety' is ambiguous between:
test\.sety:\(I\)V
test\.sety:\(F\)V
CI lazy methods: elaborate java::test\.sety:\(I\)V
CI lazy methods: elaborate java::test\.sety:\(F\)V
--
--
This doesn't work under symex-driven lazy loading because it is incompatible with --lazy-methods
2 changes: 1 addition & 1 deletion jbmc/regression/jbmc/lazyloading11/test.desc
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
CORE symex-driven-lazy-loading-expected-failure
test.class
--lazy-methods --verbosity 10 --function test.f --lazy-methods-extra-entry-point 'test.sety:(I)V' --lazy-methods-extra-entry-point 'test.sety:(F)V'
--lazy-methods --verbosity 10 --function test.f --lazy-methods-extra-entry-point 'test.sety:\(I\)V' --lazy-methods-extra-entry-point 'test.sety:\(F\)V'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure this is needed in the quotes ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the escaping the brackets? Yes as otherwise it will be used as a regex group rather than matching the brackets (this is essentially what is changing).

^EXIT=0$
^SIGNAL=0$
VERIFICATION SUCCESSFUL
Expand Down
2 changes: 1 addition & 1 deletion jbmc/regression/jbmc/lazyloading7/test.desc
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
CORE symex-driven-lazy-loading-expected-failure
test.class
--lazy-methods --verbosity 10 --function test.f --lazy-methods-extra-entry-point 'test.sety:(I)V'
--lazy-methods --verbosity 10 --function test.f --lazy-methods-extra-entry-point 'test.sety:\(I\)V'
^EXIT=0$
^SIGNAL=0$
VERIFICATION SUCCESSFUL
Expand Down
2 changes: 1 addition & 1 deletion jbmc/regression/jbmc/lazyloading8/test.desc
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
CORE symex-driven-lazy-loading-expected-failure
test.class
--lazy-methods --verbosity 10 --function test.f --lazy-methods-extra-entry-point 'test.sety:(F)V'
--lazy-methods --verbosity 10 --function test.f --lazy-methods-extra-entry-point 'test.sety:\(F\)V'
^EXIT=0$
^SIGNAL=0$
VERIFICATION SUCCESSFUL
Expand Down
1 change: 1 addition & 0 deletions jbmc/src/java_bytecode/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ SRC = bytecode_info.cpp \
java_string_literals.cpp \
java_types.cpp \
java_utils.cpp \
load_method_by_regex.cpp \
mz_zip_archive.cpp \
remove_exceptions.cpp \
remove_instanceof.cpp \
Expand Down
59 changes: 8 additions & 51 deletions jbmc/src/java_bytecode/ci_lazy_methods.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ ci_lazy_methodst::ci_lazy_methodst(
const symbol_tablet &symbol_table,
const irep_idt &main_class,
const std::vector<irep_idt> &main_jar_classes,
const std::vector<irep_idt> &lazy_methods_extra_entry_points,
const std::vector<load_extra_methodst> &lazy_methods_extra_entry_points,
java_class_loadert &java_class_loader,
const std::vector<irep_idt> &extra_instantiated_classes,
const select_pointer_typet &pointer_type_selector,
Expand Down Expand Up @@ -103,8 +103,13 @@ bool ci_lazy_methodst::operator()(

// Add any extra entry points specified; we should elaborate these in the
// same way as the main function.
std::vector<irep_idt> extra_entry_points=lazy_methods_extra_entry_points;
resolve_method_names(extra_entry_points, symbol_table);
std::vector<irep_idt> extra_entry_points;
for(const auto &extra_function_generator : lazy_methods_extra_entry_points)
{
const auto &extra_methods = extra_function_generator(symbol_table);
extra_entry_points.insert(
extra_entry_points.end(), extra_methods.begin(), extra_methods.end());
}
methods_to_convert_later.insert(
extra_entry_points.begin(), extra_entry_points.end());

Expand Down Expand Up @@ -355,54 +360,6 @@ ci_lazy_methodst::entry_point_methods(const symbol_tablet &symbol_table)
return methods_to_convert_later;
}

/// Translates the given list of method names from human-readable to
/// internal syntax.
/// Expands any wildcards (entries ending in '.*') in the given method
/// list to include all non-static methods defined on the given class.
/// \param [in, out] methods: List of methods to expand. Any wildcard entries
/// will be deleted and the expanded entries appended to the end.
/// \param symbol_table: global symbol table
void ci_lazy_methodst::resolve_method_names(
std::vector<irep_idt> &methods,
const symbol_tablet &symbol_table)
{
std::vector<irep_idt> new_methods;
for(const irep_idt &method : methods)
{
const std::string &method_str=id2string(method);
if(!has_suffix(method_str, ".*"))
{
std::string error_message;
irep_idt internal_name=
resolve_friendly_method_name(
method_str,
symbol_table,
error_message);
if(internal_name==irep_idt())
throw "entry point "+error_message;
new_methods.push_back(internal_name);
}
else
{
irep_idt classname="java::"+method_str.substr(0, method_str.length()-2);
if(!symbol_table.has_symbol(classname))
throw "wildcard entry point '"+method_str+"': unknown class";

for(const auto &name_symbol : symbol_table.symbols)
{
if(name_symbol.second.type.id()!=ID_code)
continue;
if(!to_code_type(name_symbol.second.type).has_this())
continue;
if(has_prefix(id2string(name_symbol.first), id2string(classname)))
new_methods.push_back(name_symbol.first);
}
}
}

methods=std::move(new_methods);
}

/// Build up a list of methods whose type may be passed around reachable
/// from the entry point.
/// \param entry_points: list of fully-qualified function names that
Expand Down
11 changes: 5 additions & 6 deletions jbmc/src/java_bytecode/ci_lazy_methods.h
Original file line number Diff line number Diff line change
Expand Up @@ -91,14 +91,17 @@ typedef std::function<
bool(const irep_idt &function_id, ci_lazy_methods_neededt)>
method_convertert;

typedef std::function<std::vector<irep_idt>(const symbol_tablet &)>
load_extra_methodst;

class ci_lazy_methodst:public messaget
{
public:
ci_lazy_methodst(
const symbol_tablet &symbol_table,
const irep_idt &main_class,
const std::vector<irep_idt> &main_jar_classes,
const std::vector<irep_idt> &lazy_methods_extra_entry_points,
const std::vector<load_extra_methodst> &lazy_methods_extra_entry_points,
java_class_loadert &java_class_loader,
const std::vector<irep_idt> &extra_instantiated_classes,
const select_pointer_typet &pointer_type_selector,
Expand All @@ -112,10 +115,6 @@ class ci_lazy_methodst:public messaget
const method_convertert &method_converter);

private:
void resolve_method_names(
std::vector<irep_idt> &methods,
const symbol_tablet &symbol_table);

void initialize_instantiated_classes(
const std::unordered_set<irep_idt> &entry_points,
const namespacet &ns,
Expand Down Expand Up @@ -149,7 +148,7 @@ class ci_lazy_methodst:public messaget
class_hierarchyt class_hierarchy;
irep_idt main_class;
std::vector<irep_idt> main_jar_classes;
std::vector<irep_idt> lazy_methods_extra_entry_points;
const std::vector<load_extra_methodst> &lazy_methods_extra_entry_points;
java_class_loadert &java_class_loader;
const std::vector<irep_idt> &extra_instantiated_classes;
const select_pointer_typet &pointer_type_selector;
Expand Down
10 changes: 6 additions & 4 deletions jbmc/src/java_bytecode/java_bytecode_language.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ Author: Daniel Kroening, [email protected]
#include "ci_lazy_methods.h"

#include "expr2java.h"
#include "load_method_by_regex.h"

/// Consume options that are java bytecode specific.
/// \param Command:line options
Expand Down Expand Up @@ -94,10 +95,11 @@ void java_bytecode_languaget::get_language_options(const cmdlinet &cmd)

const std::list<std::string> &extra_entry_points=
cmd.get_values("lazy-methods-extra-entry-point");
lazy_methods_extra_entry_points.insert(
lazy_methods_extra_entry_points.end(),
std::transform(
extra_entry_points.begin(),
extra_entry_points.end());
extra_entry_points.end(),
std::back_inserter(extra_methods),
build_load_method_by_regex);

if(cmd.isset("java-cp-include-files"))
{
Expand Down Expand Up @@ -815,7 +817,7 @@ bool java_bytecode_languaget::do_ci_lazy_method_conversion(
symbol_table,
main_class,
main_jar_classes,
lazy_methods_extra_entry_points,
extra_methods,
java_class_loader,
java_load_classes,
get_pointer_type_selector(),
Expand Down
8 changes: 6 additions & 2 deletions jbmc/src/java_bytecode/java_bytecode_language.h
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,10 @@ Author: Daniel Kroening, [email protected]
" --lazy-methods-extra-entry-point METHODNAME\n" /* NOLINT(*) */ \
" treat METHODNAME as a possible program entry point for\n" /* NOLINT(*) */ \
" the purpose of lazy method loading\n" /* NOLINT(*) */ \
" A '.*' wildcard is allowed to specify all class members\n"
" METHODNAME can be a regex that will be matched against\n" /* NOLINT(*) */ \
" all symbols. If missing a java:: prefix will be added\n" /* NOLINT(*) */ \
" If no descriptor is found, all overloads of a method will\n"/* NOLINT(*) */ \
" also be added." /* NOLINT(*) */
// clang-format on

class symbolt;
Expand Down Expand Up @@ -173,7 +176,6 @@ class java_bytecode_languaget:public languaget
size_t max_user_array_length; // max size for user code created arrays
method_bytecodet method_bytecode;
lazy_methods_modet lazy_methods_mode;
std::vector<irep_idt> lazy_methods_extra_entry_points;
bool string_refinement_enabled;
bool throw_runtime_exceptions;
bool assert_uncaught_exceptions;
Expand All @@ -195,6 +197,8 @@ class java_bytecode_languaget:public languaget
class_hierarchyt class_hierarchy;
// List of classes to never load
std::unordered_set<std::string> no_load_classes;

std::vector<load_extra_methodst> extra_methods;
};

std::unique_ptr<languaget> new_java_bytecode_language();
Expand Down
74 changes: 74 additions & 0 deletions jbmc/src/java_bytecode/load_method_by_regex.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs adding to a Makefile

/*******************************************************************\

Module: Java Bytecode

Author: Diffblue Ltd.

\*******************************************************************/

#include "load_method_by_regex.h"

#include <regex>

#include <util/symbol_table.h>

/// For a given user provided pattern, return a regex, having dealt with the
/// cases where the user has not prefixed with java:: or suffixed with the
/// descriptor
/// \param pattern: The user provided pattern
/// \return The regex to match with
static std::regex build_regex_from_pattern(const std::string &pattern)
{
std::string modified_pattern = pattern;
if(does_pattern_miss_descriptor(pattern))
modified_pattern += R"(:\(.*\).*)";

if(!has_prefix(pattern, "java::"))
modified_pattern = "java::" + modified_pattern;

return std::regex{modified_pattern};
}

/// Identify if a parameter includes a part that will match a descriptor. That
/// is, does it have a colon separtor.
/// \param pattern: The user provided pattern
/// \return True if no descriptor is found (that is, the only : relates to the
/// java:: prefix.
bool does_pattern_miss_descriptor(const std::string &pattern)
{
const size_t descriptor_index = pattern.rfind(':');
if(descriptor_index == std::string::npos)
return true;

static const std::string java_prefix = "java::";
return descriptor_index == java_prefix.length() - 1 &&
has_prefix(pattern, java_prefix);
}

/// Create a lambda that returns the symbols that the given pattern should be
/// loaded.If the pattern doesn't include a colon for matching the descriptor,
/// append a `:\(.*\).*` to the regex. Note this will mean all overloaded
/// methods will be marked as extra entry points for CI lazy loading.
/// If the pattern doesn't include the java:: prefix, prefix that
/// \param pattern: The user provided pattern
/// \return The lambda to execute.
std::function<std::vector<irep_idt>(const symbol_tablet &symbol_table)>
build_load_method_by_regex(const std::string &pattern)
{
std::regex regex = build_regex_from_pattern(pattern);

return [=](const symbol_tablet &symbol_table) {
std::vector<irep_idt> matched_methods;
for(const auto &symbol : symbol_table.symbols)
{
if(
symbol.second.is_function() &&
std::regex_match(id2string(symbol.first), regex))
{
matched_methods.push_back(symbol.first);
}
}
return matched_methods;
};
}
25 changes: 25 additions & 0 deletions jbmc/src/java_bytecode/load_method_by_regex.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
/*******************************************************************\

Module: Java Bytecode

Author: Diffblue Ltd.

\*******************************************************************/

/// \file
/// Process a pattern to use as a regex for selecting extra entry points for
/// ci_lazy_methodst

#ifndef CPROVER_JAVA_BYTECODE_LOAD_METHOD_BY_REGEX_H
#define CPROVER_JAVA_BYTECODE_LOAD_METHOD_BY_REGEX_H

#include <java_bytecode/ci_lazy_methods.h>

class symbol_tablet;

std::function<std::vector<irep_idt>(const symbol_tablet &symbol_table)>
build_load_method_by_regex(const std::string &pattern);

bool does_pattern_miss_descriptor(const std::string &pattern);

#endif // CPROVER_JAVA_BYTECODE_LOAD_METHOD_BY_REGEX_H
1 change: 1 addition & 0 deletions jbmc/unit/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ SRC += java_bytecode/goto-programs/class_hierarchy_output.cpp \
java_bytecode/java_types/java_generic_symbol_type.cpp \
java_bytecode/java_types/java_type_from_string.cpp \
java_bytecode/java_utils_test.cpp \
java_bytecode/load_method_by_regex.cpp \
java_bytecode/inherited_static_fields/inherited_static_fields.cpp \
pointer-analysis/custom_value_set_analysis.cpp \
solvers/refinement/string_constraint_instantiation/instantiate_not_contains.cpp \
Expand Down
Loading