Skip to content

merge_irept: insert once and then const_cast to edit in place #4155

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 12, 2019

Conversation

tautschnig
Copy link
Collaborator

@tautschnig tautschnig commented Feb 11, 2019

This avoids multiple lookups in the hash set, each of which requires a number of
operator== and hash() calls. The use of const_cast is acceptable, because we do
not actually change the object in a way that would affect ordering (objects will
continue to have the same hash value and still compare equal).

We furthermore ensure that operator== can use pointer equality when submitting
an already-shared irept to merge_irept. On SV-COMP/ReachSafety-ECA this reduces
the number of recursive operator== invocations from 5,814,764,546 (generated by
4,241,423,614 invocations) to 2,232,563,047 (generated from 4,326,016,698
invocations). That is, despite ~100M additional invocations (because the
performance improvement permits more symbolic execution runs) around 3.6B
recursive calls are avoided.

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

irep_storet::const_iterator entry=irep_store.find(irep);
if(entry!=irep_store.end())
return *entry;

bool same_irep = true;
irept new_irep(irep.id());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about starting with new_irep = irep (cheap refcounted copy), then selectively overwriting sub / named_sub? That way we only make a CoW break if merged does some merging and we actually update either sub or named_sub, whereas at present we'll build and then discard new_irep in that case.

Alternatively if you take my suggestion above then new_irep will just be irept &new_irep = *irep_store.insert(irep).first

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a follow-up change that will avoid discarding new_irep (well, in fact it will never build new_irep in the first place), but I wanted to benchmark these independently (as they optimise on different aspects, this here reduces the number of operator== calls, while the other will reduce the number of detach calls). Let's look into this on that other PR (yet to come).

@@ -145,18 +145,26 @@ void merge_irept::operator()(irept &irep)

const irept &merge_irept::merged(const irept &irep)
{
if(irep.get_sub().empty() && irep.get_named_sub().empty())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we start with an unconditional insert regardless of get_sub() or get_named_sub(), then alter it in place if any of the child merges do anything?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's assume irept A was previously inserted into the store. Now we have an irept B. If the unconditional insert of B is unsuccessful (because A==B), then each of the child merges would result in a detach, meaning that sharing between A and its stored copy is now broken. At the end, B would now be the shared copy of the store, and so it would continue every time you call this method. We would actually not have any sharing at all anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh? If the insert is unsuccessful then only read-only operations are performed (operator== and hash), after which a reference to A is returned. You then return that, like the find call currently immediately below the new insert

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, I didn't get that you still wanted to return immediately. Let me try to see how that would perform.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constness of the inserted element gets in the way I'm afraid: the store is a std::unordered_set, so elements once inserted cannot be modified in place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd say this is an ideal case for use of const_cast, since we're making changes that don't affect its ==-ness or its hash code. Also it's surely worth it to avoid the find then insert that we get for ireps not currently in the store?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got an implementation that does what you proposed, but this now highlights how broken our use of parametert is: sometimes two parametert with different ID_C_identifier should be considered distinct (when using them to assign to arguments), while otherwise we would want them to be the same (while type checking declarations vs definitions). I'll see what I can do...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the parametert business should still be cleaned up, it was actually me being stupid. I should have benchmarking results in a few hours.

@smowton
Copy link
Contributor

smowton commented Feb 11, 2019

Also is the description accurate? It looks like the number of stores will be the same, but we'll save a find in the case where the irep has no sub or named-sub?

@tautschnig
Copy link
Collaborator Author

Re description: indeed, it should say irept::dt instead of just irept.

@tautschnig tautschnig force-pushed the merge-irept-skip-same branch from de358fe to d394c49 Compare February 11, 2019 14:42
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫
This PR failed Diffblue compatibility checks (cbmc commit: de358fe).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/100454419
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.

Common spurious failures:

  • the cbmc commit has disappeared in the mean time (e.g. in a force-push)
  • the author is not in the list of contributors (e.g. first-time contributors).

The incompatibility may have been introduced by an earlier PR. In that case merging this
PR should be avoided unless it fixes the current incompatibility.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: d394c49).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/100473986

@tautschnig tautschnig force-pushed the merge-irept-skip-same branch 2 times, most recently from 6dbc155 to fd7c02d Compare February 12, 2019 12:52
This avoids multiple lookups in the hash set, each of which requires a number of
operator== and hash() calls. The use of const_cast is acceptable, because we do
not actually change the object in a way that would affect ordering (objects will
continue to have the same hash value and still compare equal).

We furthermore ensure that operator== can use pointer equality when submitting
an already-shared irept to merge_irept. On SV-COMP/ReachSafety-ECA this reduces
the number of recursive operator== invocations from 5,814,764,546 (generated by
4,241,423,614 invocations) to 2,232,563,047 (generated from 4,326,016,698
invocations). That is, despite ~100M additional invocations (because the
performance improvement permits more symbolic execution runs) around 3.6B
recursive calls are avoided.
@tautschnig tautschnig force-pushed the merge-irept-skip-same branch from fd7c02d to 3b95d0a Compare February 12, 2019 12:53
@tautschnig tautschnig changed the title merge_irept: do not store fresh irept unless necessary merge_irept: insert once and then const_cast to edit in place Feb 12, 2019
@tautschnig
Copy link
Collaborator Author

@smowton Done, finally ready for re-review. The results are quite a bit better!

@smowton
Copy link
Contributor

smowton commented Feb 12, 2019

Update the description with them?

@tautschnig
Copy link
Collaborator Author

Update the description with them?

That was done already - my initial attempt saved 1.1B calls, now we're saving 3.6B calls.

Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice :D

@tautschnig tautschnig self-assigned this Feb 12, 2019
@tautschnig tautschnig merged commit 2f1514e into diffblue:develop Feb 12, 2019
@tautschnig tautschnig deleted the merge-irept-skip-same branch February 12, 2019 16:37
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 3b95d0a).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/100638750

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants