Skip to content

Don't use temporaries in remove_instanceof #4967

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

smowton
Copy link
Contributor

@smowton smowton commented Jul 31, 2019

This replaces "if(x == null) result = false; else { tmp = x->@clsid; result = (tmp == "A" || tmp == "B" || ...)
with (x != null && (x->@clsid == "A" || x->@clsid == "B" || ...))

Advantages:

  • The ASSERT/ASSUME pair is more obvious when expressed as a single expression rather than a small if-block, so when asserts are converted to assumes (e.g. for coverage testing) then the redundant assume can be spotted and removed.
  • By operating directly on a pointer rather than using a temporary (x->@clsid == "A" rather than id = x->@clsid; id == "A") we enable symex's value-set filtering to discard no-longer-viable candidates for "x" (in the case where 'x' is a symbol, not a general expression like x->y->@clsid). This means there are more clean dereference operations (ones where there is no possibility of reading a null, invalid or incorrectly-typed object), and less pessimistic aliasing assumptions possibly causing goto-symex to explore in-fact-unreachable paths.
  • By not creating a small if-block for every instanceof operation, we spare symex from a fork/join sequence with attendant copying of data structures

Disadvantage:

  • By not using a temporary for the class-identifier, we force symex to carry out the dereference multiple times (i.e. x->@clsid == "A" || x->@clsid == "B" will become (x == &o1 ? o1 : o2).@clsid == "A" || (x == &o1 ? o1 : o2).@clsid == "B"). This could be expensive if either the alias set or the list of allowable types is long.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: d34916f).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/121427927
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

@codecov-io
Copy link

codecov-io commented Jul 31, 2019

Codecov Report

Merging #4967 into develop will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #4967      +/-   ##
===========================================
- Coverage    69.31%    69.3%   -0.01%     
===========================================
  Files         1308     1308              
  Lines       108315   108305      -10     
===========================================
- Hits         75075    75065      -10     
  Misses       33240    33240
Impacted Files Coverage Δ
jbmc/src/java_bytecode/remove_instanceof.cpp 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 639b864...d34916f. Read the comment docs.

@smowton
Copy link
Contributor Author

smowton commented Aug 8, 2019

SVCOMP evaluation: one test made faster (goes from timing out to completing in 15s), otherwise little change. Mostly no change expected for this PR, as it will only affect accesses made after a cast but before the instance type becomes clear due to a virtual function call or similar.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and amazingly well explained.
An improvement on one SV-COMP example is good, but it would be more reassuring to try this out with TG on some real-world benchmarks and see the effect there.

This replaces "if(x == null) result = false; else { tmp = x->@Clsid; result = (tmp == "A" || tmp == "B" || ...)
with (x != null && (x->@Clsid == "A" || x->@Clsid == "B" || ...))

Advantages:
* The ASSERT/ASSUME pair is more obvious when expressed as a single expression rather than a small if-block,
  so when asserts are converted to assumes (e.g. for coverage testing) then the redundant assume can be
  spotted and removed.
* By operating directly on a pointer rather than using a temporary (x->@Clsid == "A" rather than id = x->@Clsid; id == "A")
  we enable symex's value-set filtering to discard no-longer-viable candidates for "x" (in the case where 'x' is a symbol,
  not a general expression like x->y->@Clsid). This means there are more clean dereference operations (ones where there
  is no possibility of reading a null, invalid or incorrectly-typed object), and less pessimistic aliasing assumptions
  possibly causing goto-symex to explore in-fact-unreachable paths.
* By not creating a small if-block for every instanceof operation, we spare symex from a fork/join sequence with attendant
  copying of data structures

Disadvantage:
* By not using a temporary for the class-identifier, we force symex to carry out the dereference multiple times (i.e.
  x->@Clsid == "A" || x->@Clsid == "B" will become (x == &o1 ? o1 : o2).@Clsid == "A" || (x == &o1 ? o1 : o2).@Clsid == "B")
  This could be expensive if either the alias set or the list of allowable types is long.
@smowton smowton force-pushed the smowton/feature/straightforward-remove-instanceof branch from d34916f to 581540c Compare August 9, 2019 12:01
@smowton
Copy link
Contributor Author

smowton commented Aug 9, 2019

@romainbrenguier @JohnDumbell would either of you mind trying your benchmarks on this?

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: 581540c).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/122594097
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

@JohnDumbell
Copy link
Contributor

JohnDumbell commented Aug 9, 2019

Ran my benchmark against it. Lot of methods on Tika it helped, one outlier that added about a second but reductions on others far outweigh that.

Ran one run with the change, another with the commit reverted, so otherwise it was exactly the same.

@smowton
Copy link
Contributor Author

smowton commented Aug 9, 2019

Awesome! Suggestion: tweak your benchmark output so the summary data is easily paste-able? For example, X methods run, Y timeouts, Z total cpu time, ...

@JohnDumbell
Copy link
Contributor

That json was always intended to be consumed by something else to make a nice shiny report, but if I don't get around to it I'll just add a summary like that somewhere.

@smowton smowton merged commit bcaf5f6 into diffblue:develop Aug 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants