-
Notifications
You must be signed in to change notification settings - Fork 273
Use more efficient algorithm to calculate dominators #1787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use more efficient algorithm to calculate dominators #1787
Conversation
Would it be possible to split this into two commits (still all in the one PR), with the first commit being the change to Lengauer&Tarjan's algorithm, and then the second commit being the changes around the use of std::set ? That might make it a little clearer to review the changes? |
two regression tests fail due to core dumps
caused by |
out of interest, what is the performance improvement ? |
out of interest, what is the performance improvement ?
And the benchmarks? My guess is that it is substantial for larger programs.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to dig in my emails as some years ago I had changed the implementation according to some paper, but completely failed to document this in the source (and git log
doesn't seem to be helpful either). Using dominator trees is definitively a major step forward, though.
src/analyses/cfg_dominators.h
Outdated
enum class Constness | ||
{ | ||
Constant, | ||
NotConstant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No CaMeLCase, please.
src/analyses/cfg_dominators.h
Outdated
bool changed=false; | ||
typename cfgt::nodet &node=cfg[cfg.entry_map[current]]; | ||
if(node.dominators.empty()) | ||
void dfs(node_indext root) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not depth-first search on a grapht
? If it isn't, please add comments. If it is, please move the implementation to util/graph.h.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is dfs on a grapht
, yes. How would you like it moved
to graph.h? Like
template<typename Graph, typename Function> void depth_first_search(const Graph& graph, typename graph::index_t start_node, Function function);
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't it become a member of grapht
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, what I meant was if you wanted it to take a "per node work" function as a parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, your suggestion is good, except I’d recommend making it a member function.
unit/analyses/cfg_dominators.cpp
Outdated
REQUIRE(log_doms.find(dominators_data.data[4]) != log_doms.end()); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing end-of-line.
So here's what I found in my emails: the present implementation follows Cooper et al. "A Simple, Fast Dominance Algorithm", which claims to be faster in practice than Lengauer&Tarjan (even though worst-case complexity is incomparable). The changes to follow Cooper et al. were done in e6f36d9. The move to Cooper et al. was motivated by poor performance observed while running my Debian experiments. I'd thus echo the request to provide details on benchmarking that has been done. |
@mgudemann W.r.t the regression test failure, unfortunately I have a bit of a case of "doesn't happen on my machine here". This error is raised by my set iterators when iterators created from different sets are compared against each other; If this error is raised either my logic when creating these iterators is screwy or somewhere in the code iterators are used in an unsound manner. I'll investigate. |
@tautschnig As to the Cooper algorithm: The current version is not a correct implementation of it: they use a special data structure to perform fast set intersections using some knowledge about how the graph is traversed, while we do not. This data structure is the main reason the algorithm runs fast on small graphs. Second, the problem this PR is addressing (or at least trying to start to address) is the performance with large graphs, which Cooper suggests will be worse than Tarjan; Cooper suggests their algorithm runs better on practical graphs, where they consider "practical" to mean with < 1000 nodes. The graphs we were having troubles with are significantly larger than that. |
@tautschnig @martin-cs @mgudemann Regarding benchmarks, I've created some artificial test cases looking like this:
Where the middle line is repeated (with different numbers) N times. I've tested with N=100, 1000 and 10000 (called small, medium and large respectively). I've run
As you can see, with large graphs the difference is quite substantial (overall 3 times faster in the v2/large case). I've also used callgrind to get some more detailed information, which is where I noticed that in the tarjan case we spend nearly all the time in the last step building the sets, which is how I got the idea to use lazy sets (my tarjan without lazy sets version is almost identical, but uses normal std::sets in assign_dominators). Here are the results of running the v2/large set with callgrind. Notice that the current version spends nearly 70% of the time in the fixedpoint step, with the tarjan+lazy set version it's only 0.1% even though the overall running time is significantly lower. This performance gained at that point is partially paid back in parts of the code using the dominators because of the linear lookup time, but the differential in my test cases implies it's still worth doing. Of course there's a risk that these test files happen to be pathological cases, @martin-cs suggested I try to run these against small/medium open source projects which I'm going to do tomorrow. Please let me know if there's any other statistic you're interested in. |
For benchmarking, would you be able to play with packages from http://debian-fortification.net/? The files in the various goto-binaries.tar.bz2 should do. |
As the current interface at http://debian-fortification.net/ is very cumbersome I am attaching a list of URLs of >9000 goto-binary archives: all-binaries-latest.txt |
@tautschnig Thanks, I'm looking into one of those right now. |
@tautschnig For general context it might be useful to know the original
purpose of this change was to handle very large goto-programs. For
example up to 50,000 instructions. These occur when you inline large
sections of programs (long story; can explain offline). Thus we need to
think not only about the practical speed for the usual use cases but
also about the asymptotic behaviour.
|
@martin-cs Understood, and you’ll find code of this kind in the collection of binaries/Debian packages - it’s really been 10,000+ instruction functions that prompted me to first debug the scalability failures and then implement Cooper et al. |
@tautschnig Do you remember one or two in particular that were causing problems? |
I'm afraid I don't, and I don't think I would have any logs dating back >5 years. But anyway don't focus on that too much: just run your experiments on as many of those binaries, which in turn yields real-world data in addition to the data obtained on your artificially crafted code. |
I've been checking some of the examples you gave me, here are my results
Methodology as before, "inlined" means that the binary was produced from the original with
|
Thanks a lot for undertaking these experiments!!! If you have produced any scripts that could be of repeated value, feel free to drop them in the scripts/ folder (duplicate work is to be avoided...). As all CI tasks are failing, could you please address this? Then a proper review on my part is due... |
@tautschnig The travis and AppVeyor tasks are failing for different reasons. I know why the AppVeyor one is failing and it's easy enough to address, I'm still investigating the travis one. |
Running in gdb (when building using |
8e85f41
to
6c7382a
Compare
Not quite "done done", but this should only require minor changes to pass the remaining tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the author's own admission, this code is "ununderstandable" - that needs to be fixed before a serious review can be conducted.
I'm all for improving the performance, but replacing a rather simple code by something "ununderstandable" to handle some edge case (which is what I consider 10k+-statement functions) doesn't seem to be justified.
src/analyses/cfg_dominators.h
Outdated
@@ -16,20 +16,163 @@ Author: Georg Weissenbacher, [email protected] | |||
#include <list> | |||
#include <map> | |||
#include <iosfwd> | |||
#include <cassert> | |||
#include <algorithm> | |||
#include <fstream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would that be necessary here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover from debugging, removing
src/analyses/cfg_dominators.h
Outdated
template<typename T, typename node_indext> | ||
struct dominatorst { | ||
using datat = dominators_datat<T, node_indext>; | ||
static constexpr node_indext npos = std::numeric_limits<node_indext>::max(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I recall that doesn't work with Visual Studio.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does work with VS2015+; In this case this can just be changed to const
but there was some discussion on moving the minimum requirement to VS2015.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that can be done for AppVeyor then surely that's fine - but for now the build fails at that line. Or is it just a missing #include <limits>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AppVeyor was failing because it didn't recognize constexpr, but is indeed missing, thanks for spotting this
src/analyses/cfg_dominators.h
Outdated
|
||
datat *dominators_data; | ||
node_indext node_index; | ||
mutable std::size_t cached_distance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be a class with those three being made private
? Certainly true for a pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no reason for it not to be, so sure I can change this.
src/analyses/cfg_dominators.h
Outdated
// | ||
INVARIANT(data == other.data, | ||
"iterators from different sets are not comparable"); | ||
return data != other.data || current_index != other.current_index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check suggests that the INVARIANT above could be violated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, fixing this.
src/analyses/cfg_dominators.h
Outdated
struct fixedpointt { | ||
explicit fixedpointt(cfg_dominators_templatet &cfg_dominators) | ||
: cfg_dominators(cfg_dominators), | ||
dfsCounter(0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No caMelCaSe, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
src/analyses/cfg_dominators.h
Outdated
P &program) { | ||
// Dominator Tree according to Lengauer and Tarjan | ||
// "A fast algorithm for finding dominators in a flow graph" | ||
// This is ununderstandable without reading the paper! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not a good job done, then.
What should happen: surely it is possible to give the gist of the algorithm in a few sentences. Each of the key steps of the algorithm should then be commented on below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see if/how I can clear this up, this will take a while though. This also relates to the eval/compress etc functions algorithms
src/analyses/cfg_dominators.h
Outdated
cfg_dominators.cfg.entry_map[cfg_dominators.entry_node] + 1; | ||
dfsCounter = 0; | ||
dfs(root); | ||
for (node_indext i = dfsCounter; i >= 2; --i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it 2
? I'm mainly asking as the original paper might assume indices starting from 1 (and I don't understand the code so I can't tell what it is supposed to do).
src/analyses/cfg_dominators.h
Outdated
void dominators_pretty_print_node(const T &node, std::ostream &out) | ||
{ | ||
template<class T> | ||
void dominators_pretty_print_node(const T &node, std::ostream &out) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No whitespace games, please.
src/analyses/cfg_dominators.h
Outdated
template<class P, class T, bool post_dom> | ||
void cfg_dominators_templatet<P, T, post_dom>::output(std::ostream &out) const { | ||
for (const auto &node : cfg.entry_map) { | ||
auto n = node.first; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above: don't change what doesn't need changing. In particular: whitespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, that was an accident
src/analyses/cfg_dominators.h
Outdated
// stored in this set are passed to find. | ||
// The Debug libstdc++ will (correctly!) run into an assertion failure | ||
// using std::find. std::less for some reason doesn't trigger this assertion | ||
// failure, so we use this as an ugly workaround until that code is fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is fixing the root cause more problematic?
bc18e85
to
dac13b4
Compare
a40aa2b
to
026bd5b
Compare
With regard to the linter: I see little risk in adding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we've got the code reformatted, but the "ununderstandable" bit remains, which equals the absence of comments and documentation.
src/analyses/cfg_dominators.h
Outdated
label(cfg_dominators.cfg.size() + 1), | ||
semi(cfg_dominators.cfg.size() + 1), | ||
size(cfg_dominators.cfg.size() + 1), | ||
bucket(cfg_dominators.cfg.size() + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what's the magic behind the + 1
?
src/analyses/cfg_dominators.h
Outdated
} | ||
|
||
// compute intersection of predecessors | ||
for(const auto &edge : (post_dom ? node.out : node.in)) | ||
void compress(node_indext v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had feared as much: pretty much all of my previous comments are gone as they got killed by reformatting. Those comments were: where is the documentation?
src/analyses/cfg_dominators.h
Outdated
const goto_programt::targett& target, | ||
std::ostream& out) | ||
const goto_programt::targett &target, | ||
std::ostream &out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, but unnecessary.
src/analyses/cfg_dominators.h
Outdated
typedef cfg_dominators_templatet<const goto_programt, | ||
goto_programt::const_targett, | ||
true> | ||
cfg_post_dominatorst; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are no changes here.
Sticking close to the paper is great, and I wouldn't want you to diverge from that. But adding comments in between should still be feasible? I'm not asking for long essays, just a single sentence describing the steps. Maybe that's just me, but what I'm doing in symex_other.cpp of #1800 is of that nature: a paragraph upfront and then link the individual steps to that overall story. |
@tautschnig I think that is a good idea, I have been looking into the paper again to see if I can give a good summary of the steps. |
I would be in favor of keeping the original dominator implementation around as well. Then users could either hardcode which one to use, or we could make it switchable via commandline (as suggested by @chrisr-diffblue). Making it switchable would probably require an abstract dominator factory as users of the dominator computation need the ability to create dominator classes. We could add a commandline option with which to select the dominator computation implementation, and then based on that instantiate a concrete dominator factory. We would then pass it along to all sites where dominator objects are currently created, and the dominator objects would be created via the factory. Any thoughts? |
@tautschnig @danpoe Has provided some documentation, would you mind taking a look at it? |
Will review in due course; just one remark: I'd appreciate if "Merge pull request #1 from danpoe/tmp/develop_dominators_fake_set" were removed so that the commit history brought in doesn't have merge commits. |
I will change history |
ca4d25c
to
0bff1f3
Compare
const_iterator find(const T &node) const | ||
{ | ||
std::less<T> less; | ||
// FIXME This works around a bug in other parts of the code in particular, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I would still like to understand why that's being worked around here rather than fixing it at the root of the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Bug" is maybe the wrong word here. The "T" here is going to be a const_iterator
of a std::list
(this is how const_targett
in goto_program_template.h
is defined). Normally this doesn't have an ordering, only equality which is only defined on iterators of different lists. In other parts of the code a custom ordering for list is defined (based on the address of the thing the iterator is pointing to). We can't redefine equality like that though, because the iterator already has a defined equality.
"Fixing the root issue" in this case would mean rewriting all the code that uses iterators to identify locations in the goto code to no longer do that (and use something else instead). Regardless of whether or not that's a good idea, this would be a major change in the code and should therefore be in a different PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What raised my eyebrows was the part below: "in particular, dependence_graph.cpp, where iterators to different lists than those that
are stored in this set are passed to find." That sounds like a problem that's avoidable, as opposed to what you are saying above, which seems inherent in the design of the data structures?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding the dependence_graph.cpp correctly, then this 'property' of comparing items from different iterators is a fairly fundamental aspect of how dependence_graph works with goto programs, in that it is keeping around sets of const_targett
where they may be from different goto-functions (and hence, different iterators) in the case of cross-function dependencies. It seems that dependence_graph.cpp already uses a similar pattern to @hannes-steffenhagen-diffblue in various places, e.g. dep_graph_domaint::merge
is using <
for comparisons when merging dependencies (OK, so that's partly just the merge algorithm but also means it tolerates const_targett
s in different functions/iterators).
I think any attempt to fixup dependence_graph.cpp would definitely be outside the scope of this particular PR, but we can always raise an issue to cover that and come back to that in a different PR if people feel strongly enough that it should be fixed (of course, any out-of-tree code that builds on dependence_graph.cpp might end up also having to be fixed up in that case too). As it is, I think this PR is probably at least in keeping with the current implementation of dependence_graph.cpp.
However, as @martin-cs has mentioned, it's probably worth holding off merging this though until we find out if our end user gets any benefit in their use-case.
src/analyses/cfg_dominators.h
Outdated
template <class P, class T, bool post_dom> | ||
void cfg_dominators_templatet<P, T, post_dom>::output(std::ostream &out) const | ||
{ | ||
for(const auto &node : cfg.entry_map) | ||
{ | ||
auto n=node.first; | ||
auto n = node.first; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unnecessary.
src/analyses/cfg_dominators.h
Outdated
typedef cfg_dominators_templatet<const goto_programt, | ||
goto_programt::const_targett, | ||
true> | ||
cfg_post_dominatorst; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unnecessary to change.
src/analyses/cfg_dominators.h
Outdated
template <class T> | ||
void dominators_pretty_print_node(const T &node, std::ostream &out) | ||
{ | ||
out << node; | ||
} | ||
|
||
/// Pretty-print a single node. | ||
/// \param node: node to print |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually called target
(and thus trips up the doxygen check)
Can we wait merging this until I've heard definitively from the end
user? I'm not sure if this fully solves their problem.
Cheers,
- Martin
|
Marking do-not-merge per @martin-cs' request. |
9a4a0ec
to
e0ba995
Compare
Use Lengauer&Tarjan's algorithm to calculate the tree of immediate dominators instead of the previous algorithm, which gives us a significant performance improvement. Also, instead of storing the dominators in a std::set we now just store the immediate dominators and make lookups in there instead. This is a significant performance improvement as long as the dominators set are only iterated over or only queried infrequently, as is currently the case, and also saves a lot of memory in the case of large functions (previous memory usage was quadratic with function size, now it is linear). In cases where a lot of queries are made against the same set of dominators, they can still be copied to a local std::set prior to that.
This is a workaround to a bug that gets triggered in dependence_graph.cpp - in the function dep_graph_domaint::control_dependencies, find is called on cfg_dominators_templatet::target_sett, with an iterator parameter called "from". It is not ensured that this iterator is from the same list as the iterators within the dominators set, this is a problem according to C++ Draft Standard N3960: 24.2.1 An iterator j is called reachable from an iterator i if and only if there is a finite sequence of applications of the expression ++i that makes i == j. If j is reachable from i, they refer to elements of the same sequence. 24.2.5 The domain of == for forward iterators is that of iterators over the same underlying sequence.
e0ba995
to
4f89a2b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggestions; let's get this in as dominators vs. dominator trees is a major win memory-wise (and apparently time-wise too)
dfs_counter(0), | ||
// Data structures have size cfg.size() + 1 as node indices are 1-based | ||
// to match the paper of Lengauer/Tarjan. | ||
parent(cfg_dominators.cfg.size() + 1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace these 9 vectors with a vector-of-structs
explicit fixedpointt(cfg_dominators_templatet &cfg_dominators) | ||
: cfg_dominators(cfg_dominators), | ||
dfs_counter(0), | ||
// Data structures have size cfg.size() + 1 as node indices are 1-based |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it important that the reserved 0 is lower than all the allocated indices (e.g. so that the null node <= the root which is <= all other nodes?) If not suggest just using 0-based indices and note the deviation from the paper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smowton I think not, IIRC I just got fed up trying to translate the logic from the paper and eventually just opted to just replicate their logic exactly, including indexing.
link(parent[w], w); | ||
|
||
// Step 3 | ||
// Implicitely define immediate dominator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spelling
// ugly workaround until that code is fixed. | ||
|
||
// NOLINTNEXTLINE | ||
return std::find_if(cbegin(), cend(), [&](const T &other_node) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just comparing by location number? Those should be unique (if not, the caller must run goto_programt::update()
first)
Or, for other T, require that the supplied T provide some accessor that yields a unique ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might work, I don't think I was aware of location numbers at the time or that they were meant to be unique.
|
||
\*******************************************************************/ | ||
|
||
#include <string> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing: how about retaining the existing dominator algorithm, which is simple enough to check by inspection, randomly generating a couple of hundred medium-sized graphs and checking they give identical results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smowton That's more or less what I did manually when testing this IIRC, but doing this as part of automated testing of course makes a lot of sense
@tautschnig : this should not be blocked on my account any more. I no longer have responsibility for the end-user application that originally started this. Maybe @chrisr-diffblue can give a more up-to-date comment. IIRC this patch didn't solve the problem and I'm not sure if I knew why. If anyone really cares I can try to find the original correspondence on the issue. |
Use Lengauer&Tarjan's algorithm to calculate the tree of
immediate dominators instead of the previous algorithm,
which gives us a significant performance improvement.
Also, instead of storing the dominators in a std::set we
now just store the immediate dominators and make lookups
in there instead. This is a significant performance improvement
as long as the dominators set are only iterated over or only
queried infrequently, as is currently the case, and also
saves a lot of memory in the case of large functions (previous
memory usage was quadratic with function size, now it is linear).
In cases where a lot of queries are made against the same set of dominators,
they can still be copied to a local std::set prior to that.