-
Notifications
You must be signed in to change notification settings - Fork 273
[TG-8996] Avoid redundant expressions when dereferencing #4977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TG-8996] Avoid redundant expressions when dereferencing #4977
Conversation
Our dereferencing code currently special-cases nested dereferences, allowing us to turn **x directly into something like (x == &o1 ? 1 : 2) in the best case where 'x' has two possible aliases, rather than going via an intermediary like '*(x == &o1 ? o1 : o2)' ==> '((x == &o1 ? o1 : o2) == &o3 ? 1 : 2)'. However, this can lead to redundnacy when o1 and o2 have overlapping alias sets, such as (x == &o1 ? (o1 == &o3 ? o3 : o4) : (o2 == &o3 ? o3 : o4)) In this case it is better to use a temporary: derefd_pointer = x == &o1 ? o1 : o2 result = derefd_pointer == &o3 ? o3 : o4 This restructuring of value_set_dereferencet::dereference attempts to keep the best of both worlds by dividing the algorithm into two stages: one which determines the alias sets that would be created, using placeholder symbols to mark where they occur in the overall expression, followed by a commit stage which decides whether to commit the tentative result or to combine the alias sets and use an intermediate temporary symbol instead. The result should be that when alias sets overlap we use an intermediate variable, but when they are disjoint we apply the double-dereference in-place.
970421c
to
83575f5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR failed Diffblue compatibility checks (cbmc commit: 970421c).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/121755751
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.
Codecov Report
@@ Coverage Diff @@
## develop #4977 +/- ##
===========================================
+ Coverage 69.24% 69.26% +0.01%
===========================================
Files 1309 1309
Lines 108453 108512 +59
===========================================
+ Hits 75096 75157 +61
+ Misses 33357 33355 -2
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: 83575f5).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/121761390
@@ -1,7 +1,7 @@ | |||
CORE | |||
double_deref_with_pointer_arithmetic_single_alias.c | |||
--show-vcc | |||
\{1\} \(derefd_pointer::derefd_pointer!0#1 = address_of\(symex_dynamic::dynamic_object2\) \? main::argc!0@1#1 = 1 : symex::invalid_object!0#0 = main::argc!0@1#1\) | |||
\{1\} \(symex_dynamic::dynamic_object1#2\[cast\(mod #source_location=""\(main::argc!0@1#1, 2\), signedbv\[64\]\)\] = address_of\(symex_dynamic::dynamic_object2\) \? main::argc!0@1#1 = 1 : symex::invalid_object!0#0 = main::argc!0@1#1\) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 64
will only apply to certain architectures. I'd suggest using \d+
instead.
Waiting for performance benchmark on this. |
SVCOMP/C benchmark results: on my 2.5-hour subsample, negligible difference |
@smowton Is it worth all that boilerplate then? |
@tautschnig can you confirm it improves things for your problematic example? Perhaps the case where it helps just isn't exercised in SVCOMP / takes longer than the timeout? |
So encouragingly that suggests this is generally neutral at least; I'll also have a go at recreating pathological cases where this is helpful |
If people end up unhappy with this btw then one possibility is to never use the recursive-dereference case, i.e. when an expression contains multiple nested deref operations always introduce a temporary. That would be more verbose, e.g. This would mean simpler code, somewhat more verbose VCCs, and the same avoidance of redundant nested derefs as we get with this PR. |
I would advocate for cleaner code (which we care about) over smaller VCCs (which we don't really care about). |
If this doesn't degrade performance then this seems preferable to me - both, in terms of simpler implementation and more readable VCCs (I wouldn't say nested conditionals are easy to read). |
@smowton Would you be able to find time to look into this one as well? |
@tautschnig looked at this one, but the rebase is difficult and people weren't happy with the implementation anyhow, so I'd encourage someone with more time to work on CBMC to take another pass at this starting with the basic idea: In a double-deref situation where the points-to graph has at least one convergence (I recall you sent me a pathalogical case where the It would be better to generate:
This is particularly bad when the deref occurs on the LHS (e.g. Using some heuristic or other, I recommend noticing when value_set_dereferencet is about to / has already generated either an expression containing such redundancy or just any expression above some size and introducing temporaries for the first deref instead of pushing the second deref in. As a first step, you should dig up your circular-list pathological case that stimulated this work to begin with. |
Closing as #5494 provides a different implementation towards the same goal. |
Our dereferencing code currently special-cases nested dereferences, allowing us to turn
**x
directly into something like(x == &o1 ? 1 : 2)
in the best case where 'x' has twopossible aliases, rather than going via an intermediary like
*(x == &o1 ? o1 : o2)
==>
((x == &o1 ? o1 : o2) == &o3 ? 1 : 2)
.However, this can lead to redundnacy when o1 and o2 have overlapping alias sets, such as
(x == &o1 ? (o1 == &o3 ? o3 : o4) : (o2 == &o3 ? o3 : o4))
In this case it is better to use a temporary:
This restructuring of
value_set_dereferencet::dereference
attempts to keep the best of bothworlds by dividing the algorithm into two stages: one which determines the alias sets that
would be created, using placeholder symbols to mark where they occur in the overall expression,
followed by a commit stage which decides whether to commit the tentative result or to combine
the alias sets and use an intermediate temporary symbol instead.
The result should be that when alias sets overlap we use an intermediate variable, but when they
are disjoint we apply the double-dereference in-place.