Skip to content

Use more efficient algorithm to calculate dominators #1787

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

hannes-steffenhagen-diffblue
Copy link
Contributor

Use Lengauer&Tarjan's algorithm to calculate the tree of
immediate dominators instead of the previous algorithm,
which gives us a significant performance improvement.

Also, instead of storing the dominators in a std::set we
now just store the immediate dominators and make lookups
in there instead. This is a significant performance improvement
as long as the dominators set are only iterated over or only
queried infrequently, as is currently the case, and also
saves a lot of memory in the case of large functions (previous
memory usage was quadratic with function size, now it is linear).

In cases where a lot of queries are made against the same set of dominators,
they can still be copied to a local std::set prior to that.

@chrisr-diffblue
Copy link
Contributor

Would it be possible to split this into two commits (still all in the one PR), with the first commit being the change to Lengauer&Tarjan's algorithm, and then the second commit being the changes around the use of std::set ? That might make it a little clearer to review the changes?

@mgudemann
Copy link
Contributor

mgudemann commented Feb 1, 2018

two regression tests fail due to core dumps

  Running full_slice1/test.descbash: line 1: 12676 Aborted                 (core dumped) ../../../src/cbmc/cbmc --full-slice --property main.assertion.1 --unwind 1 'main.c' > 'test.out' 2>&1
  [FAILED]
  Running full_slice2/test.descbash: line 1: 12681 Aborted                 (core dumped) ../../../src/cbmc/cbmc --full-slice --property main.assertion.2 --unwind 1 'main.c' > 'test.out' 2>&1
  [FAILED]

caused by Error: attempt to compare iterators from different sequences.

@mgudemann
Copy link
Contributor

out of interest, what is the performance improvement ?

@martin-cs
Copy link
Collaborator

martin-cs commented Feb 1, 2018 via email

Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to dig in my emails as some years ago I had changed the implementation according to some paper, but completely failed to document this in the source (and git log doesn't seem to be helpful either). Using dominator trees is definitively a major step forward, though.

enum class Constness
{
Constant,
NotConstant
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No CaMeLCase, please.

bool changed=false;
typename cfgt::nodet &node=cfg[cfg.entry_map[current]];
if(node.dominators.empty())
void dfs(node_indext root)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not depth-first search on a grapht? If it isn't, please add comments. If it is, please move the implementation to util/graph.h.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is dfs on a grapht, yes. How would you like it moved to graph.h? Like

template<typename Graph, typename Function> void depth_first_search(const Graph& graph, typename graph::index_t start_node, Function function);?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't it become a member of grapht?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, what I meant was if you wanted it to take a "per node work" function as a parameter.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, your suggestion is good, except I’d recommend making it a member function.

REQUIRE(log_doms.find(dominators_data.data[4]) != log_doms.end());
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing end-of-line.

@tautschnig
Copy link
Collaborator

tautschnig commented Feb 1, 2018

So here's what I found in my emails: the present implementation follows Cooper et al. "A Simple, Fast Dominance Algorithm", which claims to be faster in practice than Lengauer&Tarjan (even though worst-case complexity is incomparable). The changes to follow Cooper et al. were done in e6f36d9.

The move to Cooper et al. was motivated by poor performance observed while running my Debian experiments. I'd thus echo the request to provide details on benchmarking that has been done.

@hannes-steffenhagen-diffblue
Copy link
Contributor Author

@mgudemann W.r.t the regression test failure, unfortunately I have a bit of a case of "doesn't happen on my machine here". This error is raised by my set iterators when iterators created from different sets are compared against each other; If this error is raised either my logic when creating these iterators is screwy or somewhere in the code iterators are used in an unsound manner. I'll investigate.

@hannes-steffenhagen-diffblue
Copy link
Contributor Author

@tautschnig As to the Cooper algorithm: The current version is not a correct implementation of it: they use a special data structure to perform fast set intersections using some knowledge about how the graph is traversed, while we do not. This data structure is the main reason the algorithm runs fast on small graphs.

Second, the problem this PR is addressing (or at least trying to start to address) is the performance with large graphs, which Cooper suggests will be worse than Tarjan; Cooper suggests their algorithm runs better on practical graphs, where they consider "practical" to mean with < 1000 nodes. The graphs we were having troubles with are significantly larger than that.

@hannes-steffenhagen-diffblue
Copy link
Contributor Author

hannes-steffenhagen-diffblue commented Feb 1, 2018

@tautschnig @martin-cs @mgudemann

Regarding benchmarks, I've created some artificial test cases looking like this:

// v1
#include <assert.h>
int main(void) {
  int x;
  x ^= -253;
  assert(x > 0);
}
// v2
#include <assert.h>
int main(void) {
  int x;
  if (x < 0) { x = x ^ -965; } else { x = x ^ -189; }
  assert(x > 0);
}

Where the middle line is repeated (with different numbers) N times. I've tested with N=100, 1000 and 10000 (called small, medium and large respectively). I've run time goto-analyzer $file --dependence-graph --show on each file with the current code, with the tarjan algorithm and with the tarjan algorithm + the lazy set version, 10 times each collecting the mean and standard deviation. All times in seconds.

File Original Tarjan Algorithm Tarjan with lazy set
v1/small.c mean=0.371, sd=0.01370320 mean=0.370, sd=0.01490712 mean=0.353, sd=0.008232726
v1/medium.c mean=0.560, sd=0.02000000 mean=0.571, sd=0.02424413 mean=0.520, sd=0.011547005
v1/large.c mean=6.291, sd=0.74570548 mean=5.609, sd=0.32188507 mean=2.729, sd=0.019119507
v2/small.c mean=0.425, sd=0.02415229 mean=0.426, sd=0.01955050 mean=0.408, sd=0.006324555
v2/medium.c mean=2.018, sd=0.01549193 mean=1.571, sd=0.08556349 mean=1.259, sd=0.011972190
v2/large.c mean=169.184, sd=28.12850203 mean=96.477, sd=14.38272424 mean=55.728, sd=0.396226647

As you can see, with large graphs the difference is quite substantial (overall 3 times faster in the v2/large case). I've also used callgrind to get some more detailed information, which is where I noticed that in the tarjan case we spend nearly all the time in the last step building the sets, which is how I got the idea to use lazy sets (my tarjan without lazy sets version is almost identical, but uses normal std::sets in assign_dominators).

Here are the results of running the v2/large set with callgrind. Notice that the current version spends nearly 70% of the time in the fixedpoint step, with the tarjan+lazy set version it's only 0.1% even though the overall running time is significantly lower. This performance gained at that point is partially paid back in parts of the code using the dominators because of the linear lookup time, but the differential in my test cases implies it's still worth doing.

Current version:
develop_profiling

With tarjan algorithm:
develop_dominators_tarjan

With tarjan + lazy set:
develop_dominators_fake_set

Of course there's a risk that these test files happen to be pathological cases, @martin-cs suggested I try to run these against small/medium open source projects which I'm going to do tomorrow. Please let me know if there's any other statistic you're interested in.

@tautschnig
Copy link
Collaborator

For benchmarking, would you be able to play with packages from http://debian-fortification.net/? The files in the various goto-binaries.tar.bz2 should do.

@tautschnig
Copy link
Collaborator

As the current interface at http://debian-fortification.net/ is very cumbersome I am attaching a list of URLs of >9000 goto-binary archives: all-binaries-latest.txt

@hannes-steffenhagen-diffblue
Copy link
Contributor Author

@tautschnig Thanks, I'm looking into one of those right now.

@martin-cs
Copy link
Collaborator

martin-cs commented Feb 2, 2018 via email

@tautschnig
Copy link
Collaborator

@martin-cs Understood, and you’ll find code of this kind in the collection of binaries/Debian packages - it’s really been 10,000+ instruction functions that prompted me to first debug the scalability failures and then implement Cooper et al.

@hannes-steffenhagen-diffblue
Copy link
Contributor Author

@tautschnig Do you remember one or two in particular that were causing problems?

@tautschnig
Copy link
Collaborator

@tautschnig Do you remember one or two in particular that were causing problems?

I'm afraid I don't, and I don't think I would have any logs dating back >5 years. But anyway don't focus on that too much: just run your experiments on as many of those binaries, which in turn yields real-world data in addition to the data obtained on your artificially crafted code.

@hannes-steffenhagen-diffblue
Copy link
Contributor Author

@tautschnig

I've been checking some of the examples you gave me, here are my results

file Original Tarjan Algorithm Tarjan with lazy set
gzip mean=1.864, sd=0.0275681 mean=1.863, sd=0.05498485 mean=1.892, sd=0.01549193
gzip (inlined) OOM OOM ~20 minutes
dmenu mean=0.217, sd=0.009486833 mean=0.217, sd=0.006749486 mean=0.216, sd=0.00843274
dmenu (inlined) mean=14.019, sd=0.08646772 mean=9.381, sd=0.05646041 mean=7.972, sd=0.03994441
grep mean=9.603, sd=0.1028537 mean=9.448, sd=0.1350555 mean=9.448, sd=0.08689713
git mean=67.886, sd=0.6079876 mean=66.864, sd=0.2416241 mean=67.231, sd=0.580181
nc mean=0.401, sd=0.0218327 mean=0.402, sd=0.02440401 mean=0.393, sd=0.01888562
nc (inlined) mean=1.151, sd=0.01197219 mean=1.083, sd=0.008232726 mean=1.057, sd=0.0105935

Methodology as before, "inlined" means that the binary was produced from the original with goto-instrument --function-inline main (see table below for size differences - note especially that inlined gzip has a 100 fold size increase over the original, which gives an explanation for the huge difference in runtime). I didn't inline all of them as even with the smaller binaries I've been running into memory problems, but I think this much testing should be sufficient to notice a trend. In general, for "normal" binaries with mostly small functions we can see that the difference in overall runtime between the 3 versions is too small to conclusively say that one is better than the other, but for the inlined versions which have huge functions we can see some significant performance differences favouring the new version.

file original inlined
gzip 2.2M 225M
dmenu 356K 2.0M
nc 496K 608K

@tautschnig
Copy link
Collaborator

Thanks a lot for undertaking these experiments!!! If you have produced any scripts that could be of repeated value, feel free to drop them in the scripts/ folder (duplicate work is to be avoided...).

As all CI tasks are failing, could you please address this? Then a proper review on my part is due...

@hannes-steffenhagen-diffblue
Copy link
Contributor Author

@tautschnig The travis and AppVeyor tasks are failing for different reasons. I know why the AppVeyor one is failing and it's easy enough to address, I'm still investigating the travis one.

@tautschnig
Copy link
Collaborator

Running in gdb (when building using -D_GLIBCXX_DEBUG) makes debugging the iterator comparison problems pretty easy, but I suppose that's the obvious part.

@hannes-steffenhagen-diffblue
Copy link
Contributor Author

Not quite "done done", but this should only require minor changes to pass the remaining tests.

Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the author's own admission, this code is "ununderstandable" - that needs to be fixed before a serious review can be conducted.

I'm all for improving the performance, but replacing a rather simple code by something "ununderstandable" to handle some edge case (which is what I consider 10k+-statement functions) doesn't seem to be justified.

@@ -16,20 +16,163 @@ Author: Georg Weissenbacher, [email protected]
#include <list>
#include <map>
#include <iosfwd>
#include <cassert>
#include <algorithm>
#include <fstream>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would that be necessary here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover from debugging, removing

template<typename T, typename node_indext>
struct dominatorst {
using datat = dominators_datat<T, node_indext>;
static constexpr node_indext npos = std::numeric_limits<node_indext>::max();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I recall that doesn't work with Visual Studio.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does work with VS2015+; In this case this can just be changed to const but there was some discussion on moving the minimum requirement to VS2015.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that can be done for AppVeyor then surely that's fine - but for now the build fails at that line. Or is it just a missing #include <limits>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppVeyor was failing because it didn't recognize constexpr, but is indeed missing, thanks for spotting this


datat *dominators_data;
node_indext node_index;
mutable std::size_t cached_distance;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a class with those three being made private? Certainly true for a pointer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no reason for it not to be, so sure I can change this.

//
INVARIANT(data == other.data,
"iterators from different sets are not comparable");
return data != other.data || current_index != other.current_index;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check suggests that the INVARIANT above could be violated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, fixing this.

struct fixedpointt {
explicit fixedpointt(cfg_dominators_templatet &cfg_dominators)
: cfg_dominators(cfg_dominators),
dfsCounter(0),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No caMelCaSe, please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

P &program) {
// Dominator Tree according to Lengauer and Tarjan
// "A fast algorithm for finding dominators in a flow graph"
// This is ununderstandable without reading the paper!
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not a good job done, then.

What should happen: surely it is possible to give the gist of the algorithm in a few sentences. Each of the key steps of the algorithm should then be commented on below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if/how I can clear this up, this will take a while though. This also relates to the eval/compress etc functions algorithms

cfg_dominators.cfg.entry_map[cfg_dominators.entry_node] + 1;
dfsCounter = 0;
dfs(root);
for (node_indext i = dfsCounter; i >= 2; --i) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it 2? I'm mainly asking as the original paper might assume indices starting from 1 (and I don't understand the code so I can't tell what it is supposed to do).

void dominators_pretty_print_node(const T &node, std::ostream &out)
{
template<class T>
void dominators_pretty_print_node(const T &node, std::ostream &out) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No whitespace games, please.

template<class P, class T, bool post_dom>
void cfg_dominators_templatet<P, T, post_dom>::output(std::ostream &out) const {
for (const auto &node : cfg.entry_map) {
auto n = node.first;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above: don't change what doesn't need changing. In particular: whitespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, that was an accident

// stored in this set are passed to find.
// The Debug libstdc++ will (correctly!) run into an assertion failure
// using std::find. std::less for some reason doesn't trigger this assertion
// failure, so we use this as an ugly workaround until that code is fixed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is fixing the root cause more problematic?

@hannes-steffenhagen-diffblue hannes-steffenhagen-diffblue force-pushed the develop_dominators_fake_set branch 2 times, most recently from bc18e85 to dac13b4 Compare February 13, 2018 14:01
@tautschnig
Copy link
Collaborator

With regard to the linter: I see little risk in adding // NOLINTNEXTLINE(whitespace/braces) as that wouldn't really silence more than what it is expected to silence. If that still seems too much of a concern, then someone needs to sit down and fix the linter.

Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we've got the code reformatted, but the "ununderstandable" bit remains, which equals the absence of comments and documentation.

label(cfg_dominators.cfg.size() + 1),
semi(cfg_dominators.cfg.size() + 1),
size(cfg_dominators.cfg.size() + 1),
bucket(cfg_dominators.cfg.size() + 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what's the magic behind the + 1?

}

// compute intersection of predecessors
for(const auto &edge : (post_dom ? node.out : node.in))
void compress(node_indext v)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had feared as much: pretty much all of my previous comments are gone as they got killed by reformatting. Those comments were: where is the documentation?

const goto_programt::targett& target,
std::ostream& out)
const goto_programt::targett &target,
std::ostream &out)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, but unnecessary.

typedef cfg_dominators_templatet<const goto_programt,
goto_programt::const_targett,
true>
cfg_post_dominatorst;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are no changes here.

@tautschnig
Copy link
Collaborator

The code was deliberately written to be as close to the paper as possible, to allow easy cross checking against it. I'm not opposed to clearing it up as much as it is possible to do so without destroying that connection, but I'm not sure if the readability would be substantially improved that way.

Sticking close to the paper is great, and I wouldn't want you to diverge from that. But adding comments in between should still be feasible? I'm not asking for long essays, just a single sentence describing the steps. Maybe that's just me, but what I'm doing in symex_other.cpp of #1800 is of that nature: a paragraph upfront and then link the individual steps to that overall story.

@hannes-steffenhagen-diffblue
Copy link
Contributor Author

@tautschnig I think that is a good idea, I have been looking into the paper again to see if I can give a good summary of the steps.

@danpoe
Copy link
Contributor

danpoe commented Feb 15, 2018

I would be in favor of keeping the original dominator implementation around as well. Then users could either hardcode which one to use, or we could make it switchable via commandline (as suggested by @chrisr-diffblue).

Making it switchable would probably require an abstract dominator factory as users of the dominator computation need the ability to create dominator classes. We could add a commandline option with which to select the dominator computation implementation, and then based on that instantiate a concrete dominator factory. We would then pass it along to all sites where dominator objects are currently created, and the dominator objects would be created via the factory. Any thoughts?

@hannes-steffenhagen-diffblue
Copy link
Contributor Author

@tautschnig @danpoe Has provided some documentation, would you mind taking a look at it?

@tautschnig
Copy link
Collaborator

Will review in due course; just one remark: I'd appreciate if "Merge pull request #1 from danpoe/tmp/develop_dominators_fake_set" were removed so that the commit history brought in doesn't have merge commits.

@hannes-steffenhagen-diffblue
Copy link
Contributor Author

I will change history

const_iterator find(const T &node) const
{
std::less<T> less;
// FIXME This works around a bug in other parts of the code in particular,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I would still like to understand why that's being worked around here rather than fixing it at the root of the problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Bug" is maybe the wrong word here. The "T" here is going to be a const_iterator of a std::list (this is how const_targett in goto_program_template.h is defined). Normally this doesn't have an ordering, only equality which is only defined on iterators of different lists. In other parts of the code a custom ordering for list is defined (based on the address of the thing the iterator is pointing to). We can't redefine equality like that though, because the iterator already has a defined equality.

"Fixing the root issue" in this case would mean rewriting all the code that uses iterators to identify locations in the goto code to no longer do that (and use something else instead). Regardless of whether or not that's a good idea, this would be a major change in the code and should therefore be in a different PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What raised my eyebrows was the part below: "in particular, dependence_graph.cpp, where iterators to different lists than those that
are stored in this set are passed to find." That sounds like a problem that's avoidable, as opposed to what you are saying above, which seems inherent in the design of the data structures?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding the dependence_graph.cpp correctly, then this 'property' of comparing items from different iterators is a fairly fundamental aspect of how dependence_graph works with goto programs, in that it is keeping around sets of const_targett where they may be from different goto-functions (and hence, different iterators) in the case of cross-function dependencies. It seems that dependence_graph.cpp already uses a similar pattern to @hannes-steffenhagen-diffblue in various places, e.g. dep_graph_domaint::merge is using < for comparisons when merging dependencies (OK, so that's partly just the merge algorithm but also means it tolerates const_targetts in different functions/iterators).

I think any attempt to fixup dependence_graph.cpp would definitely be outside the scope of this particular PR, but we can always raise an issue to cover that and come back to that in a different PR if people feel strongly enough that it should be fixed (of course, any out-of-tree code that builds on dependence_graph.cpp might end up also having to be fixed up in that case too). As it is, I think this PR is probably at least in keeping with the current implementation of dependence_graph.cpp.

However, as @martin-cs has mentioned, it's probably worth holding off merging this though until we find out if our end user gets any benefit in their use-case.

template <class P, class T, bool post_dom>
void cfg_dominators_templatet<P, T, post_dom>::output(std::ostream &out) const
{
for(const auto &node : cfg.entry_map)
{
auto n=node.first;
auto n = node.first;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems unnecessary.

typedef cfg_dominators_templatet<const goto_programt,
goto_programt::const_targett,
true>
cfg_post_dominatorst;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems unnecessary to change.

template <class T>
void dominators_pretty_print_node(const T &node, std::ostream &out)
{
out << node;
}

/// Pretty-print a single node.
/// \param node: node to print
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually called target (and thus trips up the doxygen check)

@martin-cs
Copy link
Collaborator

martin-cs commented Feb 16, 2018 via email

@tautschnig
Copy link
Collaborator

Marking do-not-merge per @martin-cs' request.

@hannes-steffenhagen-diffblue hannes-steffenhagen-diffblue force-pushed the develop_dominators_fake_set branch 6 times, most recently from 9a4a0ec to e0ba995 Compare February 16, 2018 14:53
Use Lengauer&Tarjan's algorithm to calculate the tree of
immediate dominators instead of the previous algorithm,
which gives us a significant performance improvement.

Also, instead of storing the dominators in a std::set we
now just store the immediate dominators and make lookups
in there instead. This is a significant performance improvement
as long as the dominators set are only iterated over or only
queried infrequently, as is currently the case, and also
saves a lot of memory in the case of large functions (previous
memory usage was quadratic with function size, now it is linear).

In cases where a lot of queries are made against the same set of dominators,
they can still be copied to a local std::set prior to that.
This is a workaround to a bug that gets triggered in
dependence_graph.cpp - in the function
dep_graph_domaint::control_dependencies,
find is called on cfg_dominators_templatet::target_sett, with
an iterator parameter called "from".

It is not ensured that this iterator is from the same list
as the iterators within the dominators set, this is a problem
according to C++ Draft Standard N3960:

24.2.1

An iterator j is called reachable from an iterator i if and only if there
is a finite sequence of applications of the expression ++i that makes i == j.
If j is reachable from i, they refer to elements of the same sequence.

24.2.5

The domain of == for forward iterators is that of iterators over the same
underlying sequence.
Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions; let's get this in as dominators vs. dominator trees is a major win memory-wise (and apparently time-wise too)

dfs_counter(0),
// Data structures have size cfg.size() + 1 as node indices are 1-based
// to match the paper of Lengauer/Tarjan.
parent(cfg_dominators.cfg.size() + 1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace these 9 vectors with a vector-of-structs

explicit fixedpointt(cfg_dominators_templatet &cfg_dominators)
: cfg_dominators(cfg_dominators),
dfs_counter(0),
// Data structures have size cfg.size() + 1 as node indices are 1-based
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it important that the reserved 0 is lower than all the allocated indices (e.g. so that the null node <= the root which is <= all other nodes?) If not suggest just using 0-based indices and note the deviation from the paper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smowton I think not, IIRC I just got fed up trying to translate the logic from the paper and eventually just opted to just replicate their logic exactly, including indexing.

link(parent[w], w);

// Step 3
// Implicitely define immediate dominator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling

// ugly workaround until that code is fixed.

// NOLINTNEXTLINE
return std::find_if(cbegin(), cend(), [&](const T &other_node) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just comparing by location number? Those should be unique (if not, the caller must run goto_programt::update() first)

Or, for other T, require that the supplied T provide some accessor that yields a unique ID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might work, I don't think I was aware of location numbers at the time or that they were meant to be unique.


\*******************************************************************/

#include <string>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing: how about retaining the existing dominator algorithm, which is simple enough to check by inspection, randomly generating a couple of hundred medium-sized graphs and checking they give identical results?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smowton That's more or less what I did manually when testing this IIRC, but doing this as part of automated testing of course makes a lot of sense

@martin-cs
Copy link
Collaborator

@tautschnig : this should not be blocked on my account any more. I no longer have responsibility for the end-user application that originally started this. Maybe @chrisr-diffblue can give a more up-to-date comment. IIRC this patch didn't solve the problem and I'm not sure if I knew why. If anyone really cares I can try to find the original correspondence on the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants