Performance improvement and cleanup of local bitvector analysis #6528
Conversation
Force-pushed 08d2339 to c25f97b
Commit message: "enqueuing successor nodes" -> "dequeuing successor nodes"
By using a set instead of a stack as work queue we reduce the number of visits to each node in two ways: 1) the work queue will not contain duplicate nodes; 2) when dequeuing successor nodes, back edges will be given preference over forward ones. Once the later (in program order) node is visited, more information has already been merged into the abstract state. On a goto function with 160822 instructions, this reduces processing time (just the local bitvector analysis) from 1700 seconds down to 72 seconds. The number of invocations of `local_bitvector_analysist::merge` previously was 32815499 (204 times the number of nodes), and is now 997083 (6 times the number of nodes).
This avoids a jump. On a goto function of 160822 instructions, the total time required by local bitvector analysis decreases from 72 seconds to 65 seconds (averaged over three runs; the difference between the slowest and fastest run is below 0.2 seconds).
The correct type is std::size_t in all cases. Use `auto` where the type can be inferred, and the typedef'd name where it's `local_cfgt`'s business to choose the type.
Force-pushed c25f97b to 8834dc5
Err, of course, thank you!
Codecov Report
@@           Coverage Diff            @@
##           develop    #6528   +/-   ##
========================================
  Coverage    75.98%   75.98%
========================================
  Files         1578     1578
  Lines       180910   180919    +9
========================================
+ Hits        137467   137476    +9
  Misses       43443    43443
20x speed-up due to a change in data-structure! I love it! Classic computer science. Would it be possible to have another PR that s/stack/set/ in local_may_aliast too?
@@ -242,8 +242,8 @@ void local_bitvector_analysist::build()
  if(cfg.nodes.empty())
    return;

  work_queuet work_queue;
This one has a friend who would also like to become a set:
cbmc/src/analyses/local_may_alias.h, line 61 in eeddb3f:
typedef std::stack<local_cfgt::node_nrt> work_queuet;
Good call, I'll do so in a follow-up PR.
-    if(a[i].merge(b[i]))
-      result=true;
-  }
+    result |= a[i].merge(b[i]);
I would add a comment that the bitwise or is used instead of the logical or to prevent the short-circuit semantics.
-  work_queuet work_queue;
-  work_queue.push(0);
+  std::set<local_cfgt::node_nrt> work_queue;
+  work_queue.insert(0);
Queues are usually ordered (especially in the UK). Is the set really helping performance? If so, I'd rename it to "work_set".
Can this result in a different fixed point being reached? What are the implications for the precision of the analysis? Are there any iteration strategies that can result in more precise results? An acceptable perf tradeoff that favors precision might be more useful for users of this class.
Can this result in a different fixed point being reached?
In the general case, yes. In this specific analysis, I think not,
because it is so simple and does not do any of the obvious "exploration
order matters" things like interprocedural analysis.
What are the implications for the precision of the analysis ?
I don't think there are any. I would be interested in being proven
wrong. It should be possible to easily test for significant
differences by:
1. Patching the "remove from set" to remove a random element.
2. Asking @tautschnig to run the analysis a bunch of times with different
random seeds and seeing what happens.
Are there any iteration strategies that can result in more precise
results?
Given a program, there is a best order. It is not clear that there is
one order which works for all programs (and I think I have examples
somewhere showing that there is not).
An acceptable perf tradeoff that favors precision might be more useful
for users of this class.
I think in this specific case it doesn't matter, plus, this is supposed
to be a very light-weight analysis.
If you want to experiment, `ai_historyt` has control over the order of
exploration, via:
https://github.com/diffblue/cbmc/blob/47c1c7e70cb17e3315ebacde99ef5b1f4a69eb91/src/analyses/ai_history.h#L123
Some of the existing implementations are ... to put it bluntly, kinda
crude:
https://github.com/diffblue/cbmc/blob/47c1c7e70cb17e3315ebacde99ef5b1f4a69eb91/src/analyses/ai_history.h#L199
So this could be improved.
If you want to do something more sophisticated, then write a new `ai_historyt`,
or inherit from `ai_baset` and change how `fixpoint()` works:
https://github.com/diffblue/cbmc/blob/47c1c7e70cb17e3315ebacde99ef5b1f4a69eb91/src/analyses/ai.cpp#L224
HTH
Cheers,
- Martin
Please review commit-by-commit. The commit messages include performance evaluation data.