-
Notifications
You must be signed in to change notification settings - Fork 274
merge_irept: insert once and then const_cast to edit in place #4155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/util/merge_irep.cpp
Outdated
irep_storet::const_iterator entry=irep_store.find(irep); | ||
if(entry!=irep_store.end()) | ||
return *entry; | ||
|
||
bool same_irep = true; | ||
irept new_irep(irep.id()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about starting with new_irep = irep
(cheap refcounted copy), then selectively overwriting sub
/ named_sub
? That way we only make a CoW break if merged
does some merging and we actually update either sub
or named_sub
, whereas at present we'll build and then discard new_irep
in that case.
Alternatively if you take my suggestion above then new_irep
will just be irept &new_irep = *irep_store.insert(irep).first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got a follow-up change that will avoid discarding new_irep
(well, in fact it will never build new_irep
in the first place), but I wanted to benchmark these independently (as they optimise on different aspects, this here reduces the number of operator==
calls, while the other will reduce the number of detach
calls). Let's look into this on that other PR (yet to come).
src/util/merge_irep.cpp
Outdated
@@ -145,18 +145,26 @@ void merge_irept::operator()(irept &irep) | |||
|
|||
const irept &merge_irept::merged(const irept &irep) | |||
{ | |||
if(irep.get_sub().empty() && irep.get_named_sub().empty()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we start with an unconditional insert
regardless of get_sub()
or get_named_sub()
, then alter it in place if any of the child merges do anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's assume irept
A was previously inserted into the store. Now we have an irept
B. If the unconditional insert of B is unsuccessful (because A==B), then each of the child merges would result in a detach
, meaning that sharing between A and its stored copy is now broken. At the end, B would now be the shared copy of the store, and so it would continue every time you call this method. We would actually not have any sharing at all anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh? If the insert is unsuccessful then only read-only operations are performed (operator==
and hash
), after which a reference to A is returned. You then return that, like the find
call currently immediately below the new insert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, I didn't get that you still wanted to return immediately. Let me try to see how that would perform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constness of the inserted element gets in the way I'm afraid: the store is a std::unordered_set
, so elements once inserted cannot be modified in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I'd say this is an ideal case for use of const_cast
, since we're making changes that don't affect its ==
-ness or its hash code. Also it's surely worth it to avoid the find
then insert
that we get for ireps not currently in the store?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got an implementation that does what you proposed, but this now highlights how broken our use of parametert
is: sometimes two parametert
with different ID_C_identifier
should be considered distinct (when using them to assign to arguments), while otherwise we would want them to be the same (while type checking declarations vs definitions). I'll see what I can do...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the parametert
business should still be cleaned up, it was actually me being stupid. I should have benchmarking results in a few hours.
Also is the description accurate? It looks like the number of stores will be the same, but we'll save a |
Re description: indeed, it should say |
de358fe
to
d394c49
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫
This PR failed Diffblue compatibility checks (cbmc commit: de358fe).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/100454419
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.
Common spurious failures:
- the cbmc commit has disappeared in the mean time (e.g. in a force-push)
- the author is not in the list of contributors (e.g. first-time contributors).
The incompatibility may have been introduced by an earlier PR. In that case merging this
PR should be avoided unless it fixes the current incompatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: d394c49).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/100473986
6dbc155
to
fd7c02d
Compare
This avoids multiple lookups in the hash set, each of which requires a number of operator== and hash() calls. The use of const_cast is acceptable, because we do not actually change the object in a way that would affect ordering (objects will continue to have the same hash value and still compare equal). We furthermore ensure that operator== can use pointer equality when submitting an already-shared irept to merge_irept. On SV-COMP/ReachSafety-ECA this reduces the number of recursive operator== invocations from 5,814,764,546 (generated by 4,241,423,614 invocations) to 2,232,563,047 (generated from 4,326,016,698 invocations). That is, despite ~100M additional invocations (because the performance improvement permits more symbolic execution runs) around 3.6B recursive calls are avoided.
fd7c02d
to
3b95d0a
Compare
@smowton Done, finally ready for re-review. The results are quite a bit better! |
Update the description with them? |
That was done already - my initial attempt saved 1.1B calls, now we're saving 3.6B calls. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: 3b95d0a).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/100638750
This avoids multiple lookups in the hash set, each of which requires a number of
operator== and hash() calls. The use of const_cast is acceptable, because we do
not actually change the object in a way that would affect ordering (objects will
continue to have the same hash value and still compare equal).
We furthermore ensure that operator== can use pointer equality when submitting
an already-shared irept to merge_irept. On SV-COMP/ReachSafety-ECA this reduces
the number of recursive operator== invocations from 5,814,764,546 (generated by
4,241,423,614 invocations) to 2,232,563,047 (generated from 4,326,016,698
invocations). That is, despite ~100M additional invocations (because the
performance improvement permits more symbolic execution runs) around 3.6B
recursive calls are avoided.