Skip to content

Symex-dereference: simplify after deref [blocks: #2574, #4056] #3725

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 7, 2019

Conversation

tautschnig
Copy link
Collaborator

@tautschnig tautschnig commented Jan 8, 2019

This is generally useful because it turns member-of-if into if-of-member, which
avoids repeatedly quoting large complex if expressions in a with_exprt, but it
is especially good for field sensitivity as it puts member expressions directly
on top of symbol expressions, which field sensitivity knows how to deal with.

Performance evaluation is required , but also this change shows that some tests newly fail with this change, which is unexpected. See #3724 for the first one of them.

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig self-assigned this Jan 8, 2019
@tautschnig tautschnig changed the title Symex-dereference: simplify after deref Symex-dereference: simplify after deref [depends-on: #3724] Jan 24, 2019
@tautschnig tautschnig force-pushed the simplify-after-deref branch from a0025af to 6f940ce Compare January 25, 2019 20:00
@tautschnig tautschnig changed the title Symex-dereference: simplify after deref [depends-on: #3724] Symex-dereference: simplify after deref [depends-on: #3724, #3953] Jan 25, 2019
@tautschnig tautschnig changed the title Symex-dereference: simplify after deref [depends-on: #3724, #3953] Symex-dereference: simplify after deref [depends-on: #3953] Jan 28, 2019
tautschnig added a commit that referenced this pull request Jan 28, 2019
Quantifier test: The second assertion does not hold [blocks: #2574, #3725, #3924]
tautschnig added a commit that referenced this pull request Jan 30, 2019
Do not misuse symex dereferencing code for finding array objects [blocks: #3725]
@tautschnig tautschnig force-pushed the simplify-after-deref branch from 6f940ce to 743e988 Compare January 30, 2019 14:16
@tautschnig tautschnig added Needs data This PR claims improvements that require further data to substantiate the claims. and removed dependent - do not merge work in progress labels Jan 30, 2019
@tautschnig tautschnig changed the title Symex-dereference: simplify after deref [depends-on: #3953] Symex-dereference: simplify after deref Jan 30, 2019
@tautschnig tautschnig force-pushed the simplify-after-deref branch 2 times, most recently from fd0f92d to 95ae2b9 Compare February 5, 2019 20:58
@tautschnig tautschnig changed the title Symex-dereference: simplify after deref Symex-dereference: simplify after deref [depends-on: #2068] Feb 8, 2019
@tautschnig
Copy link
Collaborator Author

This requires the byte_update fixes from #2068 to make the SMT solver happy (adding the simplify step changes a nested byte_extract into a byte_update).

@tautschnig tautschnig changed the title Symex-dereference: simplify after deref [depends-on: #2068] Symex-dereference: simplify after deref [depends-on: #2068, #3770] Feb 18, 2019
// when all we need is
// s1=s1 with member:=X [and guard b]
// s2=s2 with member:=X [and guard !b]
do_simplify(expr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious to see what the impact on performance this change has.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will benchmark (and I hope @romainbrenguier can do his benchmarking as well) once the dependencies #2068 and #3770 are merged so that we have a precise analysis.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could maybe be made faster with a simplify method which would focus on the conversions that are mentioned above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is certainly one option - are there particular patterns that affect Java? (@smowton maybe?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: this appears to be causing performance degradation; we may need to look into a specialised simplifier to achieve this.

// make sure simplify has not re-introduced any dereferencing that
// had previously been cleaned away
INVARIANT(
!has_subexpr(expr, ID_dereference), "simplify re-introduced dereferencing");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems quite heavy to run on each call to dereference. Do we yet have a way to avoid evaluating invariants (and other assertions) in deployed binaries of CBMC?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I have now wrapped it in if(symex_config.run_validation_checks).

// applying field sensitivity, which can turn such field-of-symbol expressions
// into atomic SSA expressions, but would have to rewrite all of 'o1'
// otherwise.
// make the structure of the lhs as simple as possible to avoid,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line seems redundant

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment cleaned up.

@allredj
Copy link
Contributor

allredj commented Feb 28, 2019

Thanks for running those benchmarks. Is there any evidence that this helps for Java?

@smowton
Copy link
Contributor

smowton commented Mar 1, 2019

Summary here: this isn't expected to cause a performance increase by itself (and indeed will definitely cost a little more). It is primarily a pre-requisite for field-sensitive symex, because of the member-of-if -> if-of-member transformation mentioned in the comments added by this PR.

I don't see a clear way to restrict simplify's activities except to keep track of expressions which are already maximally simplified and suppress repeated simplification. I've seen a couple of drafts by @tautschnig which attempt this, either by the simplifier tracking ireps it has seen and simplified, or by attaching comments to simplified ireps, or by adding an extra boolean to irept to track is-simplified state.

@romainbrenguier
Copy link
Contributor

Summary here: this isn't expected to cause a performance increase by itself (and indeed will definitely cost a little more). It is primarily a pre-requisite for field-sensitive symex, because of the member-of-if -> if-of-member transformation mentioned in the comments added by this PR.

I guess that's fine, but in that case it could have been better for it to be part of the field-sensitive PR.

@tautschnig
Copy link
Collaborator Author

I've seen a couple of drafts by @tautschnig which attempt this, either by the simplifier tracking ireps it has seen and simplified, or by attaching comments to simplified ireps, or by adding an extra boolean to irept to track is-simplified state.

FWIW, tautschnig@ef00e0a is the most serious of those. It does seem to work well, but @kroening rightfully remarked that it opens the door to everyone asking for their little pet flag. But we'll get there when the other performance-optimising PRs have been reviewed/merged so that I can start posting those changes as PRs. Yet, as @smowton said, it's one more reason I'm not at all worried about performance cost here.

I have no issue merging this as part of #2574 as @romainbrenguier suggested, but I think it was important to debug and evaluate it separately as this has lead to a lot of type-related bug fixes.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 9abfc97).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/102907891

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: f1ce452).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/103024631
Status will be re-evaluated on next push.
Common spurious failures:

  • the cbmc commit has disappeared in the mean time (e.g. in a force-push)

  • the author is not in the list of contributors (e.g. first-time contributors).

  • the compatibility was already broken by an earlier merge.

@tautschnig
Copy link
Collaborator Author

@smowton will add some benchmarking data to confirm that this is necessary for #4314 to go ahead.

@tautschnig tautschnig force-pushed the simplify-after-deref branch from f1ce452 to 427df5c Compare March 4, 2019 14:13
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: 427df5c).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/103062461
Status will be re-evaluated on next push.
Common spurious failures:

  • the cbmc commit has disappeared in the mean time (e.g. in a force-push)

  • the author is not in the list of contributors (e.g. first-time contributors).

  • the compatibility was already broken by an earlier merge.

@smowton
Copy link
Contributor

smowton commented Mar 6, 2019

I benchmarked 133 Tika methods and found total runtime = 4818s with this PR + #4314, 4860s with just #4314, 5974s with just develop. Most of that reduction comes from 3 tests that were previously timeouts (timeout = 300s) that now complete in 30 seconds; the rest comes from 10-15s reductions in many tests that previously took 10-20s to analyse.

I expected to find a major penalty for not including this PR alongside #4314, because #4314 causes value_set_dereferencet to more rigorously use byte_extract operations when generic types differ, but no such effect was observed. This may be because constant-propagation was rarely succeeding in these cases and without it the byte-extract op is not actually that expensive, or it may be because the associated cost is only exposed when field-sensitivity is enabled, which by failing to promote member-of-symbol to an atomic SSA identifier forces a more-expensive composite struct operation.

@smowton
Copy link
Contributor

smowton commented Mar 6, 2019

4314 + this + develop vs. develop:
perf_out

@smowton
Copy link
Contributor

smowton commented Mar 6, 2019

Successfully reproduced a micro-benchmark exposing the kind of problem I was expecting:

Test program:

public class Test {

  public static void main() {

    Generic<Integer> g = new GenericSub<Integer>();

    int x = 0;
    for(int i = 0; i < 3000; ++i)
      x += g.get();

    assert x == 0;

  }

}

class Generic<T> {

  T key; // Only here to make sure the compiler doesn't throw <T> away
  int x;

  public int get() { return 0; }

  public Generic() {
    key = null;
    x = 5;
  }

}

class GenericSub<S> extends Generic<S> {

  public int get() { return x; }

}

Times for --show-vcc:

Develop: 1.3s
#4314: 9.4s
#4314 + #3725: 1.5s

@smowton
Copy link
Contributor

smowton commented Mar 6, 2019

Excerpt from --show-vcc illustrating the problem:

-{-14} symex_dynamic::dynamic_object1#3 = { { { "java::GenericSub" }, NULL, 0 } }
-{-15} symex_dynamic::dynamic_object1#4 = { { { "java::GenericSub" }, NULL, 5 } }
+{-14} symex_dynamic::dynamic_object1#3 = byte_extract_little_endian(with(byte_extract_little_endian({ { "java::GenericSub" }, NULL, 0 }, 0, struct java::Generic), member_name component_name="key", NULL), 0, struct java::GenericSub)
+{-15} symex_dynamic::dynamic_object1#4 = byte_extract_little_endian(with(byte_extract_little_endian(symex_dynamic::dynamic_object1#3.@Generic, 0, struct java::Generic), member_name component_name="x", 5), 0, struct java::GenericSub)

@smowton
Copy link
Contributor

smowton commented Mar 6, 2019

VCCs for develop and #4314 + #3725 are identical; those for #4314 alone are ~25000 lines longer due to constant prop failure

smowton and others added 2 commits March 6, 2019 22:57
This is generally useful because it turns member-of-if into if-of-member, which
avoids repeatedly quoting large complex if expressions in a with_exprt, but it
is especially good for field sensitivity as it puts member expressions directly
on top of symbol expressions, which field sensitivity knows how to deal with.
This was fine with develop with and without simplification after dereferencing,
but would break if removing base_type_eq without adding the simplification step.
@tautschnig tautschnig force-pushed the simplify-after-deref branch from 427df5c to 0b09ffa Compare March 6, 2019 23:09
@tautschnig
Copy link
Collaborator Author

Thanks a lot @smowton for the detailed analysis! I have added the test that you provided as a second commit.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 0b09ffa).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/103459224

@tautschnig tautschnig merged commit 98103ff into diffblue:develop Mar 7, 2019
@tautschnig tautschnig deleted the simplify-after-deref branch March 7, 2019 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants