Skip to content

[TG-8996] Avoid redundant expressions when dereferencing #4977

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

smowton
Copy link
Contributor

@smowton smowton commented Aug 2, 2019

Our dereferencing code currently special-cases nested dereferences, allowing us to turn
**x directly into something like (x == &o1 ? 1 : 2) in the best case where 'x' has two
possible aliases, rather than going via an intermediary like *(x == &o1 ? o1 : o2)
==> ((x == &o1 ? o1 : o2) == &o3 ? 1 : 2).

However, this can lead to redundnacy when o1 and o2 have overlapping alias sets, such as
(x == &o1 ? (o1 == &o3 ? o3 : o4) : (o2 == &o3 ? o3 : o4))

In this case it is better to use a temporary:

derefd_pointer = x == &o1 ? o1 : o2
result = derefd_pointer == &o3 ? o3 : o4

This restructuring of value_set_dereferencet::dereference attempts to keep the best of both
worlds by dividing the algorithm into two stages: one which determines the alias sets that
would be created, using placeholder symbols to mark where they occur in the overall expression,
followed by a commit stage which decides whether to commit the tentative result or to combine
the alias sets and use an intermediate temporary symbol instead.

The result should be that when alias sets overlap we use an intermediate variable, but when they
are disjoint we apply the double-dereference in-place.

Our dereferencing code currently special-cases nested dereferences, allowing us to turn
**x directly into something like (x == &o1 ? 1 : 2) in the best case where 'x' has two
possible aliases, rather than going via an intermediary like '*(x == &o1 ? o1 : o2)'
==> '((x == &o1 ? o1 : o2) == &o3 ? 1 : 2)'.

However, this can lead to redundnacy when o1 and o2 have overlapping alias sets, such as
(x == &o1 ? (o1 == &o3 ? o3 : o4) : (o2 == &o3 ? o3 : o4))

In this case it is better to use a temporary:
derefd_pointer = x == &o1 ? o1 : o2
result = derefd_pointer == &o3 ? o3 : o4

This restructuring of value_set_dereferencet::dereference attempts to keep the best of both
worlds by dividing the algorithm into two stages: one which determines the alias sets that
would be created, using placeholder symbols to mark where they occur in the overall expression,
followed by a commit stage which decides whether to commit the tentative result or to combine
the alias sets and use an intermediate temporary symbol instead.

The result should be that when alias sets overlap we use an intermediate variable, but when they
are disjoint we apply the double-dereference in-place.
@smowton smowton force-pushed the smowton/feature/avoid-redundant-derefs branch from 970421c to 83575f5 Compare August 2, 2019 13:23
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: 970421c).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/121755751
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

@codecov-io
Copy link

codecov-io commented Aug 2, 2019

Codecov Report

Merging #4977 into develop will increase coverage by 0.01%.
The diff coverage is 97.82%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #4977      +/-   ##
===========================================
+ Coverage    69.24%   69.26%   +0.01%     
===========================================
  Files         1309     1309              
  Lines       108453   108512      +59     
===========================================
+ Hits         75096    75157      +61     
+ Misses       33357    33355       -2
Impacted Files Coverage Δ
src/pointer-analysis/value_set_dereference.h 100% <100%> (ø) ⬆️
src/pointer-analysis/value_set_dereference.cpp 95.27% <97.77%> (+0.96%) ⬆️
src/goto-symex/symex_dereference_state.cpp 73.07% <0%> (+11.53%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cf7b9bf...83575f5. Read the comment docs.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 83575f5).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/121761390

@@ -1,7 +1,7 @@
CORE
double_deref_with_pointer_arithmetic_single_alias.c
--show-vcc
\{1\} \(derefd_pointer::derefd_pointer!0#1 = address_of\(symex_dynamic::dynamic_object2\) \? main::argc!0@1#1 = 1 : symex::invalid_object!0#0 = main::argc!0@1#1\)
\{1\} \(symex_dynamic::dynamic_object1#2\[cast\(mod #source_location=""\(main::argc!0@1#1, 2\), signedbv\[64\]\)\] = address_of\(symex_dynamic::dynamic_object2\) \? main::argc!0@1#1 = 1 : symex::invalid_object!0#0 = main::argc!0@1#1\)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This 64 will only apply to certain architectures. I'd suggest using \d+ instead.

@allredj
Copy link
Contributor

allredj commented Aug 6, 2019

Waiting for performance benchmark on this.

@smowton
Copy link
Contributor Author

smowton commented Aug 8, 2019

SVCOMP/C benchmark results: on my 2.5-hour subsample, negligible difference

@allredj
Copy link
Contributor

allredj commented Aug 8, 2019

@smowton Is it worth all that boilerplate then?

@smowton
Copy link
Contributor Author

smowton commented Aug 8, 2019

@tautschnig can you confirm it improves things for your problematic example? Perhaps the case where it helps just isn't exercised in SVCOMP / takes longer than the timeout?

@smowton
Copy link
Contributor Author

smowton commented Aug 8, 2019

So encouragingly that suggests this is generally neutral at least; I'll also have a go at recreating pathological cases where this is helpful

@smowton smowton changed the title Avoid redundant expressions when dereferencing [TG-8996] Avoid redundant expressions when dereferencing Aug 12, 2019
@smowton
Copy link
Contributor Author

smowton commented Aug 14, 2019

If people end up unhappy with this btw then one possibility is to never use the recursive-dereference case, i.e. when an expression contains multiple nested deref operations always introduce a temporary. That would be more verbose, e.g. derefd_pointer = x == &o1 ? o1 : o2; y = 5 in the optimal case where *o1 and *o2 both resolve to the same constant, or y = derefd_pointer == &o3 ? 1 : 2 in the case where its referees are constant but distinct (cf. the cleaner expression y = x == &o1 ? 1 : 2).

This would mean simpler code, somewhat more verbose VCCs, and the same avoidance of redundant nested derefs as we get with this PR.

@allredj
Copy link
Contributor

allredj commented Aug 14, 2019

I would advocate for cleaner code (which we care about) over smaller VCCs (which we don't really care about).

@peterschrammel
Copy link
Member

one possibility is to never use the recursive-dereference case

If this doesn't degrade performance then this seems preferable to me - both, in terms of simpler implementation and more readable VCCs (I wouldn't say nested conditionals are easy to read).

@peterschrammel peterschrammel removed their assignment Nov 18, 2020
@tautschnig
Copy link
Collaborator

@smowton Would you be able to find time to look into this one as well?

@smowton
Copy link
Contributor Author

smowton commented Dec 26, 2020

@tautschnig looked at this one, but the rebase is difficult and people weren't happy with the implementation anyhow, so I'd encourage someone with more time to work on CBMC to take another pass at this starting with the basic idea:

In a double-deref situation where the points-to graph has at least one convergence (I recall you sent me a pathalogical case where the next pointer of a circular list is followed twice and we weren't certain where in the list we were starting, so each follow may access any list cell), then we might generate a redundant expression like x == &o1 ? (y == &o3 ? result3 : result4) : (y == &o3 ? result3 : result4).

It would be better to generate:

tmp = x == &o1 ? &o3 : &o4
tmp == &o3 ? result3 : result4

This is particularly bad when the deref occurs on the LHS (e.g. node->next->prev = node), because this can lead to many more conditional-updates than are really necessary.

Using some heuristic or other, I recommend noticing when value_set_dereferencet is about to / has already generated either an expression containing such redundancy or just any expression above some size and introducing temporaries for the first deref instead of pushing the second deref in.

As a first step, you should dig up your circular-list pathological case that stimulated this work to begin with.

@tautschnig
Copy link
Collaborator

Closing as #5494 provides a different implementation towards the same goal.

@tautschnig tautschnig closed this Apr 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants