Skip to content

Call graph enhancements #1498

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

smowton
Copy link
Contributor

@smowton smowton commented Oct 19, 2017

Two enhancements to the call graph, again both used in the security analyser but probably useful to others:

  1. Optionally record callsites while building the graph, such that on seeing an A -> B edge in the callgraph it's easy to find out the calling instruction.
  2. Support export to a grapht-derived data structure, such that standard graph algorithms can be used on the call graph.

@smowton smowton force-pushed the smowton/feature/call_graph_improvements branch from f7eb504 to f6ddabc Compare October 20, 2017 13:36
@smowton smowton requested a review from martin-cs October 23, 2017 14:04
@polgreen
Copy link
Contributor

Hi Chris,

I've just had open source approval on some code I've written that rewrites the call_grapht class to inherit from grapht, because I wanted to use some of the base functions of grapht on the call graph.
It also:

  • computes the shortest path on the call graph and return the list of functions
  • computes the functions reachable within N function calls from a given function.

I'll create the pull request for it later today, and it might make sense to try to combine the two together

I think my pull request, would probably cover the need to export grapht from call_grapht, and using the grapht class makes it easier to get a callee from an edge (I think, without knowing how you are using that). The implementation of the invert function is, however, not as nice when using grapht.

@smowton
Copy link
Contributor Author

smowton commented Oct 23, 2017

The callsites would be an edge attribute, i.e. don't use empty_edget. It should probably be optional if there's a risk of bloating the structure massively for programs with lots of parallel call-graph edges.

@smowton
Copy link
Contributor Author

smowton commented Oct 23, 2017

(Also out of curiosity what is the open source approval process?)

@polgreen
Copy link
Contributor

Ah, I've used the nodes to store the function name irep_idt, and used empty_edget. I guess that storing the function name in the node and the fact that the nodes in a grapht have a vector of in edges, might eliminate the need for storing the caller name in the edge? Or storing the caller name in the edge could be used to eliminate the need to store the caller name in the node, but having both is probably not necessary.

Pull request here:
#1509

It may need breaking down into several smaller pull requests.

@smowton
Copy link
Contributor Author

smowton commented Nov 30, 2017

ACK, need to do more unification of @polgreen's changes and mine. Perhaps a Christmas holiday task for me :)

@smowton smowton force-pushed the smowton/feature/call_graph_improvements branch from f6ddabc to 9f25b0a Compare December 10, 2017 13:56
@smowton smowton force-pushed the smowton/feature/call_graph_improvements branch from 9f25b0a to 335b9ba Compare December 10, 2017 14:00
@smowton
Copy link
Contributor Author

smowton commented Dec 10, 2017

@tautschnig @polgreen this is now ready for re-review. All features from @polgreen's branch have been integrated AFAIK. It now consists of quite a few pieces so I've divided it up into individual commits, which are best reviewed individually.

@tautschnig
Copy link
Collaborator

@smowton I'm afraid doxygen isn't happy at the moment.

const irep_idt &callee=to_symbol_expr(function_expr).get_identifier();
add(function, callee);
if(collect_callsites)
callsites[{function, callee}].insert(i_it);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this not use the newly introduced add(caller, callee, callsite) function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does -- fixed in a later commit

explicit call_grapht(const goto_modelt &);
explicit call_grapht(const goto_functionst &);
explicit call_grapht(bool collect_callsites=false);
explicit call_grapht(const goto_modelt &, bool collect_callsites=false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safe to drop explicit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not so -- with a default argument there are one-arg versions of both constructors.

std::string call_grapht::format_callsites(const edget &edge) const
{
PRECONDITION(collect_callsites, "collect_callsites should be enabled");
PRECONDITION(collect_callsites);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit pick: this change is not in the right commit.

src/util/graph.h Outdated
@@ -469,6 +471,37 @@ void grapht<N>::visit_reachable(node_indext src)
}
}

template<class N>
std::vector<typename N::node_indext>
grapht<N>::get_reachable(node_indext src, bool forwards) const
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like the fact that this basically does the same as visit_reachable, but just returns the information in a different format -- yet neither is implemented in terms of the other.

/// \param graph: call graph
/// \param function: function to query
/// \return vector of called functions
std::vector<irep_idt> get_successors(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to use std::set or std::unordered_set -- otherwise it's not clear whether these will be unique or not. (They are not, but the ordering is completely arbitrary, which makes the result rather questionable.)

@smowton smowton force-pushed the smowton/feature/call_graph_improvements branch 2 times, most recently from 3c0d56c to ecea9e2 Compare December 11, 2017 17:25
@smowton
Copy link
Contributor Author

smowton commented Dec 11, 2017

@tautschnig all other changes applied. I implemented the non-const tagging visitor in terms of the const get-reachable, at the cost of some efficiency.

}
}

void call_grapht::output_xml(std::ostream &out) const
{
// Note I don't implement callsite output here; I'll leave that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps rather than a comment a warning() if(collect_callsites)

@@ -54,6 +54,7 @@ SCENARIO("call_graph",
// {
// A();
// B();
// B();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems dangerous to modify the unit test in the same commit since the behaviour of the unit test shouldn't change (since we are using the default value of not collecting callsites). (That is I think adding extra callsites is not a good idea as part of this PR. Extending the unit test to verify the callsites are indeed empty is very diligent 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent here is to highlight the difference between the multimap representation, which includes duplicate key-value pairs when there are parallel edges, and the grapht representation which does not.

const goto_programt &body)
static void forall_callsites(
const goto_programt &body,
std::function<void(goto_programt::const_targett, const irep_idt &)> callback)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you name callback something more descriptive e.g. callsite_task? And ideally add doxygen comments.

@@ -25,6 +25,17 @@ class call_grapht
explicit call_grapht(const goto_modelt &, bool collect_callsites=false);
explicit call_grapht(const goto_functionst &, bool collect_callsites=false);

// These two constructors build a call graph restricted to functions
// reachable from the given root.
call_grapht(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have a lot of constructors - it might be worth considering a collection of named static functions instead? create_callgraph_from_root_node? Maybe even a builder pattern so the common parts can be setup first (e.g. collect_callsites).


#include "call_graph_helpers.h"

static std::set<irep_idt> get_neighbours(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: document (along with the rest of the functions in here).

Also, controversial suggestion, but neighbours doesn't mean much in the context of a callgraph, presumably it is the functions called by the specified function? Done using function successor and predecessor

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose actually it is either the called functions, or the callsites, depending on the value of forwards - this should definitely be documented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see docs are in the header file. Personally prefer this, though I think we've been putting them in the source file - don't know if a decision has been reached @peterschrammel ?

return result;
}

std::set<irep_idt> get_successors(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider calling this get_called_functions since we are specifically dealing with a call graph?

Copy link
Contributor

@chrisr-diffblue chrisr-diffblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from comments that have already been made by others, I'm happy to say this LGTM

This makes it easier for call-graph users to discover the particular instruction responsible
for a call-graph edge.

Ideally these would be recorded in-line with the map (i.e. change the graph type to
std::multimap<irep_idt, std::pair<locationt, irep_idt>>), but here we introduce an optional
auxiliary map to avoid breaking existing users.
Add export from call_grapht to grapht, such that generic graph algorithms
can be applied to the call graph.
This simply performs a depth-first search of the graph starting from
a nominated node and returns a vector of reachable indices.
These start from a nominated node and walk outwards, rather than build
the entire call-graph as usual.
These translate from function name to node ID and back, thus giving
more concise ways of running very simple graph algorithms over the
call graph.
No functional changes
No behavioural change intended; should simply replace a graph walk
with one that the call graph code can now perform by itself.
@smowton smowton force-pushed the smowton/feature/call_graph_improvements branch from ecea9e2 to 7a98e15 Compare December 13, 2017 11:30
@smowton smowton merged commit 7b06a00 into diffblue:develop Dec 16, 2017
smowton added a commit to smowton/cbmc that referenced this pull request May 9, 2018
67ec6f2 Merge remote-tracking branch 'upstream/develop' into pull-support-20180104
fabc99e Merge pull request diffblue#1563 from NathanJPhillips/feature/lazy-loading
2d67e42 Merge pull request diffblue#1692 from NathanJPhillips/feature/numbering-at
5266ba2 Merge commit 'ac4756fc3bb0e853f04de2b69f300d65cfbfc553' into pull-support-20171212
4f4a9c7 Add at method to template_numberingt
d9cc0c0 Merge pull request diffblue#1685 from peterschrammel/remove-existing-coverage-goals
f13c871 Update a comment in instrument_cover_goals
6c39443 Remove existing coverage goals
a2cf16d Store symbol type instead of followed type when clean casting
f96efb4 Change template parameter name to not clash with existing type
b0cd57b Encapsulate lazy_goto_modelt::goto_functions
ef02f4d Update test to handle changed symbol creation order
441d269 Reset counter used by get_fresh_aux_symbol in unit tests
f759f25 Fix new test to run remove_java_new pass
cb09d8e Fix new tests to use lazy loading
166563f Do lowering of java_new as a function-level pass
3779782 Always convert the main function from bytecode into the symbol_table
7e52e22 Always allow the main function not to have a body
c938b08 Encapsulate maps in language_filest
87c6b28 Don't erase symbol in java_bytecode_convert_methodt::convert
e3e44c8 Do type-checking as a function pass
2ba1364 Update load_all_functions
5f06e36 Recreate initialize in final
aa5440b Moved call to final to lazy_goto_modelt::finalize
a659bc0 Add lazy_goto_functions_mapt
7e1ecc9 Allow post-processing functions to run on a function-by-function basis
0c05be6 Allow convert_lazy_method on functions that don't have a lazy body
05a8da2 Make goto_convert_functions not add directly to function map
c078858 Introduce lazy_goto_modelt
a22dd1c Merge pull request diffblue#1671 from thk123/bugfix/TG-1278/correct-access-level
5b6eb00 Merge pull request diffblue#1668 from romainbrenguier/bugfix/string-index-of#TG-1846
9062853 Tidied up the generic specalised class copying the base class visibility
f934ca3 String classes should be public
7b06a00 Merge pull request diffblue#1498 from smowton/smowton/feature/call_graph_improvements
f3b4c9b Test for verification of the indexOf method
9a7fa2d Correct lower bound in String indexOf
682cd1a Use a symbol generator to avoid name clashes
28a2ada Access in empty array should be unconstrained
d41403f Merge pull request diffblue#1665 from romainbrenguier/bugfix/string-out-of-bound#TG-1808
ac7e649 Use CProverString function in jbmc tests
22dc353 Add CProverString class for jbmc-strings tests
ef610b3 Use CProverString functions in strings-smoke-tests
5c716c1 Add a CProverString class for string-smoke-tests
6b619e0 Deactivate preprocessing of charAt and substring
bcfaaa4 Merge pull request diffblue#1664 from svorenova/refactoring_tg1190
c2a8bae Refactoring in specialization of generic classes
7a98e15 Rename call graph constructors
6228ed3 Slice global inits: use improved call graph
d136bbc Expose limited call graph via goto-instrument
9c29bee Improve call graph unit tests
8ed3ccb Add documentation to call graph
8f6f429 Add call graph helpers
3b06a16 Call graph: add constructors that only include reachable functions
9b65862 grapht: add get_reachable function
aaa8513 Call graph -> grapht transformation
8115248 Call graph: optionally record callsites

git-subtree-dir: cbmc
git-subtree-split: 67ec6f2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants