Skip to content

Avoid redundant computations in get_user_specified_clinit #5078

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

romainbrenguier
Copy link
Contributor

Some computation like json parsing and iterating through the map were done at each call to get_user_specified_clinit which is called a lot of times for projects with large json file for static values.
With this pull request on such an example the execution time went from 10 seconds to 2 seconds (mostly thanks to removing the redundant json parsing).

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • [na] White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lazy map feels like it's apt to get out of date, but actually it isn't, since it only includes function and field symbols, all of which are produced early (in the convert-class phase). Suggest adding comments to the map and the lazy class clarifying that get_declaring_class doesn't include e.g. local variables, which are updated by convert_single_method and therefore would cause it to get out of date.

In the same comments clarify under what circumstances it must be purged and recalculated (i.e. if some transformation creates or deletes function or field symbols).

For example, should it be purged after lazy-methods erases a bunch of unused function symbols from the symbol table?

const std::string filename = options.get_option("static-values");
jsont tmp_json;
if(!parse_json(filename, *message_handler, tmp_json))
static_values_file = std::move(tmp_json);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static_values_file -> static_values_json

if(!tmp_json.is_object())
{
warning() << "Provided JSON file for static-values is not a JSON "
<< "object, it will be ignored." << messaget::eom;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, -> ;

static_values_file = std::move(tmp_json);
if(parse_json(filename, *message_handler, tmp_json))
{
warning() << "Provided JSON file for static-values cannot be parsed, it"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, -> ;

@@ -769,17 +769,16 @@ code_ifthenelset get_clinit_wrapper_body(

code_blockt get_user_specified_clinit_body(
const irep_idt &class_id,
const json_objectt &json,
const json_objectt &static_values_file,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Param rename snuck into commit "Remove unused message handler"

@@ -217,7 +217,7 @@ void java_bytecode_languaget::set_language_options(const optionst &options)
<< "object, it will be ignored." << messaget::eom;
}
else
static_values_file = std::move(to_json_object(tmp_json));
static_values_json = std::move(to_json_object(tmp_json));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, this does get done. Suggest squash into earlier commit that changes the type.

@romainbrenguier romainbrenguier force-pushed the refactor/user-specified-clinit branch from 2f431d4 to 0d15f12 Compare September 3, 2019 13:40
@romainbrenguier
Copy link
Contributor Author

@smowton I've tried to clarify when the map could get outdated. Can you check the new doc?

@romainbrenguier romainbrenguier added the Needs TG approval 🦉 Only merge with explicit approval from test-gen maintainers label Sep 3, 2019
@romainbrenguier romainbrenguier force-pushed the refactor/user-specified-clinit branch 2 times, most recently from 44b45b7 to 5650d05 Compare September 4, 2019 13:34
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: 5650d05).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/125909454
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

@romainbrenguier romainbrenguier removed the Needs TG approval 🦉 Only merge with explicit approval from test-gen maintainers label Sep 5, 2019
Copy link
Contributor

@antlechner antlechner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@@ -229,10 +229,10 @@ class java_bytecode_languaget:public languaget
java_string_library_preprocesst string_preprocess;
std::string java_cp_include_files;
bool nondet_static;
/// Path to a JSON file which contains initial values of static fields (right
/// JSON which contains initial values of static fields (right
/// after the static initializer of the class was run). This is read from the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 read from the ... option -> read from the file specified by the ... option?

src/util/range.h Outdated
/// Utility function to make equal_range method of multimap easier to use by
/// returning a ranget object. For instance, we can write:
/// `for(auto value : equal_range(map, key).filter(...).map(...)) {...}`.
template <typename Mapt>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❔ It looks like the convention in this file is to write typenames in small letters, e.g. typename containert above?

for(const auto &symbol_pair : symbol_table)
const auto class_to_declared_symbols_map =
class_to_declared_symbols(symbol_table);
for(const auto &symbol_pair :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 symbol_pair isn't such a good name for this any more. Maybe class_symbol_pair?

/// would make later calls return an outdated map. The
/// lazy_class_to_declared_symbols_mapt would then need to be reinitialized.
/// Similarly if some transformation creates or deletes function or field
/// symbols the map would get out of date and would need to be reinitialized.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Could specify "creates or deletes function or field symbols in the symbol table"

@@ -1131,10 +1153,13 @@ static void notify_static_method_calls(
/// \param symbol_table: global symbol table
/// \param needed_lazy_methods: optionally a collection of needed methods to
/// update with any methods touched during the conversion
/// \param class_to_declared_symbols: map classes to the symbols that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ map -> maps

@@ -201,13 +225,18 @@ class java_bytecode_languaget:public languaget
const irep_idt &function_id,
symbol_table_baset &symbol_table)
{
lazy_class_to_declared_symbols_mapt class_to_declared_symbols{};
Copy link
Contributor

@antlechner antlechner Sep 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Why does this method create its own lazy_class_to_declared_symbols_mapt? Could this cause any problems or inconsistencies?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No inconsistency because if it is redeclared everytime this preserves the old behaviour. But yes it's not optimal I will make sure it is declared by the function which calls that one on each methods.

@romainbrenguier romainbrenguier force-pushed the refactor/user-specified-clinit branch 2 times, most recently from 5450149 to dac3e16 Compare September 6, 2019 07:08
@codecov-io
Copy link

codecov-io commented Sep 6, 2019

Codecov Report

Merging #5078 into develop will decrease coverage by <.01%.
The diff coverage is 91.3%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #5078      +/-   ##
===========================================
- Coverage    69.66%   69.66%   -0.01%     
===========================================
  Files         1319     1319              
  Lines       109234   109250      +16     
===========================================
+ Hits         76102    76106       +4     
- Misses       33132    33144      +12
Impacted Files Coverage Δ
jbmc/src/java_bytecode/java_static_initializers.h 100% <ø> (ø) ⬆️
src/util/range.h 100% <ø> (ø) ⬆️
...bmc/src/java_bytecode/java_static_initializers.cpp 98.2% <100%> (+0.42%) ⬆️
jbmc/src/java_bytecode/java_bytecode_language.h 76.92% <100%> (+4.19%) ⬆️
jbmc/src/java_bytecode/java_bytecode_language.cpp 93.08% <80.95%> (-0.67%) ⬇️
src/util/symbol_table_builder.h 65.21% <0%> (-17.4%) ⬇️
src/util/journalling_symbol_table.h 78.94% <0%> (-10.53%) ⬇️
src/util/symbol_table_base.h 88.46% <0%> (-3.85%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fc0a0cd...30bc2b6. Read the comment docs.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: 5450149).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/126187118
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

No functional changes.
This is so that we can avoid recomputing it in each loop.
This avoids reparsing the whole json file for each class that needs a clinit.
Locally, on an example with a large json provided for --static-values,
execution time went from 10s to 2s.
There is no reason to delay this check, and we can provide useful
warning messages right at the start of the bytecode analysis.
No user specified static initializer should appear if the static-values
is not set anyway. So this if condition can be replaced by an invariant
at the call site.
This is no longer used now that the JSON parsing is done outside this function.
This is to make the equal_range method of multimap easier to use.
The usage is illustrated in this commit by an example in
java_static_initializers which allows use to use range-for.
This will be also useful in following commits.
This is an intermediary commit towards computing this map outside of the
get_user_specified_clinit_body function, so that we can avoid going
through the whole symbol map at each get_user_specified_clinit_body call.
This is an intermediary commit that gets us closer to avoiding
recomputing this map everytime.
This is to avoid computing the map several times and make sure it is
never computed if it is not actually needed.
@romainbrenguier romainbrenguier force-pushed the refactor/user-specified-clinit branch from dac3e16 to 30bc2b6 Compare September 6, 2019 08:17
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 30bc2b6).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/126195214

@romainbrenguier romainbrenguier merged commit 78fc2b7 into diffblue:develop Sep 6, 2019
@romainbrenguier romainbrenguier deleted the refactor/user-specified-clinit branch September 6, 2019 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants