More precise function pointer removal [depends-on: #2504] #2620

polgreen · 2018-07-26T08:16:32Z

Currently the function pointer removal replaces each function pointer with a case split between all functions that have the same signature and have had their address taken.

This PR is the beginning of work to refine that. Submitting as PR for early feedback.

This PR:

scans the goto program to find all assigns to function pointers
builds map of function pointers -> sets of functions
uses these maps when replacing the function pointers

Currently only works for:

standalone function pointers
function pointers 1 layer deep in a struct

If a function pointer is written to with anything other than a straight forward address of a function, default to original behaviour.

Current unsoundness:

if a function pointer is part of a struct, and the struct is written to with a struct, this needs to be detected.

polgreen · 2018-07-26T08:17:15Z

src/goto-programs/remove_function_pointers.cpp

-      goto_functions);
-
-  return rfp.remove_function_pointers(goto_program);
+    std::cout << "Statistics:\n"


I can't make mstream work here, and I have no idea why.

martin-cs · 2018-07-26T08:23:35Z

It's great that someone is working on this. @NlightNFotis did some work on separating the function pointer removal into "generate candidate lists" and "remove function pointers"; it would be great if that could get picked up / encorporated into this commit as I suspect you are doing something similar.

The goal of that work was to get the abstract interpreter to be able to handle programs with function pointers, so that we had a cheap route to finding things like this. It turns out there is one small subtlety that prevents this from working at the moment. If better function pointer removal is something you need then I'd be happy to discuss.

polgreen · 2018-07-26T08:31:11Z

Hi Martin, would be good to discuss, do you have time for a Skype chat at some point?

martin-cs · 2018-07-26T08:46:43Z

On Thu, 2018-07-26 at 01:31 -0700, E Polgreen wrote: Hi Martin, would be good to discuss, do you have time for a Skype chat at some point?

Yes; will organise "off-list".

smowton

Seems like a good thing to do -- main concern is this is getting towards writing a lowish precision value-set analysis. @peterschrammel @martin-cs do you think any of the current VSAs have the right sort of precision level, to the point that we could parameterise one of those with a predicate expressing "only pay attention to code-typed things, and never bother following pointers" and get a similar result? Possibly value_set_analysis_fit, but I'm not too familiar with how it works?

smowton · 2018-07-26T08:51:10Z

src/goto-programs/remove_function_pointers.cpp

@@ -90,6 +107,12 @@ class remove_function_pointerst:public messaget
  }
 };

+static irep_idt get_member_identifier(const member_exprt &member)
+{
+  return id2string(member.symbol().get_identifier()) +


Note member.symbol() throws an invariant if this member_exprt is not ultimately based on a symbol (e.g. member-of-dereference-of... will fail). Therefore you probably need to implement your own recursion here.

smowton · 2018-07-26T08:51:52Z

src/goto-programs/remove_function_pointers.cpp

+  /// stored as irep_idts.
+  /// The map is stored as variable in remove_function_pointerst
+  /// \param goto_functions goto functions to scan through
+  void get_all_writes_to_pointers(goto_functionst &goto_functions);


gather_all_writes_to_function_pointers (get would usually return something)

smowton · 2018-07-26T08:52:09Z

src/goto-programs/remove_function_pointers.cpp

@@ -255,6 +281,7 @@ void remove_function_pointerst::fix_return_type(
    old_lhs, typecast_exprt(tmp_symbol_expr, old_lhs.type()));
 }

+


Accidental change?

smowton · 2018-07-26T08:59:24Z

src/goto-programs/remove_function_pointers.cpp

+
+    if(function_pointer_map.find(identifier) != function_pointer_map.end())
+    {
+      status() << "Replacing function pointer from map of possible assigns\n";


"\n" --> eom

smowton · 2018-07-26T09:02:20Z

src/goto-programs/remove_function_pointers.cpp

+      identifier = get_member_identifier(to_member_expr(pointer));
+    }
+
+    if(function_pointer_map.find(identifier) != function_pointer_map.end())


Instead of find followed by [] use a construct like:

auto find_identifier = function_pointer_map.find(identifier); if(find_identifier != function_pointer_map.end()) { functionst &map_entry = find_identifier->second; ...

smowton · 2018-07-26T09:11:02Z

src/goto-programs/remove_function_pointers.cpp

+
+      if(lhs.type().id() == ID_pointer && lhs.type().subtype().id() == ID_code)
+      {
+        irep_idt lhs_irep;


lhs_identifier?

smowton · 2018-07-26T09:16:11Z

src/goto-programs/remove_function_pointers.cpp

+static irep_idt get_member_identifier(const member_exprt &member)
+{
+  return id2string(member.symbol().get_identifier()) +
+      id2string(member.get_component_name());


Suggest indexing this map by exprts rather than strings, since it will surely be useful sooner or later to decompose them again

smowton · 2018-07-26T09:17:29Z

src/goto-programs/remove_function_pointers.cpp

+
+        if(rhs.id() == ID_address_of && rhs.type().subtype().id() == ID_code)
+        {
+          irep_idt rhs_irep =


function_identifier?

smowton · 2018-07-26T09:18:00Z

src/goto-programs/remove_function_pointers.cpp

+        {
+          irep_idt rhs_irep =
+            to_symbol_expr(to_address_of_expr(rhs).object()).get_identifier();
+          if(function_pointer_map.find(lhs_irep) != function_pointer_map.end())


As above avoid repeated lookups (find then [] etc) if possible

smowton · 2018-07-26T09:19:02Z

src/goto-programs/remove_function_pointers.cpp

+          irep_idt rhs_irep =
+            to_symbol_expr(to_address_of_expr(rhs).object()).get_identifier();
+          if(function_pointer_map.find(lhs_irep) != function_pointer_map.end())
+            (function_pointer_map[lhs_irep]).insert(rhs_irep);


This if block currently redundant (function_pointer_map[lhs_irep].insert(rhs_irep) does both cases, no need for the = {{rhs_rep}} special case)

polgreen · 2018-07-26T10:56:17Z

I've tried the default value set analysis but that was just miles and miles off the kind of scalability that I need; happy to try others if someone has suggestions for what might work.

smowton · 2018-07-26T11:00:25Z

Plain value_set_analysist? If so agreed that's way too heavy. AFAIK the _fi variant is more vague, but I'm not that familiar. Hoping either @peterschrammel or @martin-cs knows more about those cruder-but-quicker versions.

smowton · 2018-07-26T11:05:21Z

To clarify the level of precision wanted btw, as I understand:

// somewhere
global_ptr = &f;
global_struct.field1 = &h;
global_struct.field2 = &x;

// somewhere else
global_ptr = &g;
global_struct.field1 = &i;
global_struct.field2 = &y;

// still somewhere else
global_struct.field2 = *something->complicated;

// And finally:
something->else.complicated = &abc;

Desired result = "global_ptr could be f or g, global_struct.field1 could be h or i, global_struct.field2 could be anything"? That is, we intentionally ignore the assignment of function abc that could get anywhere?

martin-cs · 2018-07-26T11:09:30Z

Seems like a good thing to do -- main concern is this is getting towards writing a lowish precision value-set analysis.

Yes; that's the situation we got to with a major automotive customer. @thk123 did some great work on getting a number of their special cases to work (so that ```dispatch_table[3]``` would resolve when ```dispatch_table``` was a global constant) but really, once you are doing anything that requires understanding the context of the code / the rest of the program, you are slowly heading towards writing an abstract interpreter. It would be good if we can avoid having a fourth abstract interpretation implementation (although I suspect that there may be more lurking in goto-instrument).

@peterschrammel @martin-cs do you think any of the current VSAs have the right sort of precision level, to the point that we could parameterise one of those with a predicate expressing "only pay attention to code-typed things, and never bother following pointers" and get a similar result? Possibly `value_set_analysis_fit`, but I'm not too familiar with how it works?

I would hope so. To recap the situation with abstract interpretation implementations in CPROVER. ai.h : ait -- the future, where all of the scalability improvements are / will be happening. At the moment this is fixed to be location sensitive (i.e. one domain per location) but I'm in the process of merging patches to change this. Designed to be run after function pointer removal and return removal. It is non-trivial to change this but at some point we should. static_analysis.h : static_analysist -- the older framework, deprecated in favour of ai.h. Uses one domain per location. The one advantage is that it should be able to handle function pointers and returns. At the moment this has one user, value_set_analysist, which is only (in the main branch!) used in goto-instrument in a context where function pointers have been removed. Is a likely candidate for porting into ait. flow_insensitive_analysis.h : flow_insensitive_analysist -- the oldest of the frameworks I think. Uses one domain, total! Should handle returns and function pointers -- as much as that is meaningful. Used for value_set_analysis_{fi,fivr,fivrns}t (depending on if you want validity regions and sharing). _fi is used by reaching definitions (and thus dependency graph and thus slicing). [ None of this should be confused with value_sett, which is how goto- symex handles pointers and value_setst which is a generic interface for getting pointers. ] So, things to try: 1. value_set_analysist without doing function pointer removal. But you had said this doesn't scale. 2. implement as a flow_insensitive_abstract_domain_baset and use flow_insensitive_analysist . 3. get involved in merging the context sensitivity and variable sensitivity domains.

polgreen · 2018-07-26T12:17:23Z

To clarify: basically I do want a low precision value set analysis, that is flow insensitive and that only considers function pointers. As far as I can tell the existing value set analysis variants don't currently do that.

The issue:
Running goto-instrument --remove_function-pointers replaces all function pointers with in some cases 1000 candidate functions. We need to start our analysis somewhere in the middle of our program, i.e., without fully analysing all the initialisation code that may set these function pointers, so this huge case split is a problem for us.

The end goal:
That goto-instrument --remove-function-pointers would be replace function pointers with fewer candidate functions, because we believe that this is a blocking point for us running CBMC on linux/xen.

Level of precision

To that end, the level of precision required is "something better than what currently exists, and that scales well enough to be able to run an analysis on linux/xen".

The precision that I am aiming for, described by @smowton above, we think would give enough of an improvement to make running linux/xen. This would be only for function pointers, I don't care about any other pointers, or any other variables.

smowton · 2018-07-26T12:21:04Z

*nod
If value-set-analysis-fi was sensible, then it might suffice to parameterise it with a predicate (here set to has-function-pointer-type) to advise it what sorts of things to track. Something like

domaint::apply_assignment(...)
{
  if(!user_interested_in_lhs(assign.lhs()) return;
}

polgreen · 2018-07-26T13:01:57Z

Hmm any insight into how big that "if" is though?

smowton · 2018-07-26T13:36:02Z

Medium sized. Martin's comment above sounds promising -- it maintains one domain, not one domain per program point, which matches the analysis you wrote for this PR.

Suggested steps to finding out:

Add a show-value-sets-fi option to goto-instrument, which is just like --show-value-sets except it uses value_set_analysis_fi
Hack it to ignore assigns that don't target function pointer typed things by altering src/pointer-analysis/value_set_domain_fi.cpp to check that before calling value_set.apply_code
Run --show-value-sets-fi on Xen and see if it finishes in some acceptable time

polgreen · 2018-07-27T07:11:13Z

I spoke to @kroening and apparently value_set_analaysis and value_set_analysis_fi are deprecated code (or soon to be deprecated) and I should avoid building dependencies on them

polgreen · 2018-07-27T08:13:12Z

I thought I'd try it anyway. So I ran the value_set_analysis_fi on a small linux thing with 332923 effective lines of code (as counted by goto-instrument), and it ran out of memory after 271s

 Maximum resident set size (kbytes): 61120980

The output suggests the flow insensitive value set analysis is building a value set for each function, but the value set appears to be the same for each function. Is it possible that value set is for some reason building the full value set for the entire program, and then repeating that for every function?

The reason I suspect that is because I did successfully run this on Xen for long enough for it to produce the value set for ~10 different functions, and the value set for all functions was identical, and every single function contained the value set for x86_emulate::ops, including "vsnprintf".

If value_set_analysis_fi really is doing this, that would increase the memory consumption quite a lot

smowton · 2018-07-27T08:32:58Z

That is a bit peculiar. I'll have a mess with it over the weekend. Is the big Xen GBF image available anywhere for testing purposes?

smowton · 2018-07-27T08:33:57Z

(Dependency wise I'm not sure I agree -- if they're useful then they're useful, and we should patch up their shortcomings rather than throw them away then write largely the same code over again)

smowton · 2018-07-27T08:37:52Z

(btw was this with limiting to function pointer values? I think that would vastly reduce the amount of information it records, and therefore its memory footprint)

martin-cs · 2018-07-27T09:11:31Z

@smowton : w.r.t. to depreciated code, I'd been thinking (time willing) of doing the following:

(*. At some point... work out a good way of allowing ait to be used before function pointer and return removal. This would be generally A Good Thing.)

port value_set_analysist to use ait. This would temporarily loose the ability to run this pre-function pointer removal but we're not actually using it so...
Move the functionality of --show-value-sets from goto-instrument to goto-analyzer.
Fully deprecate static_analysist.
Add a "one domain total" option to ait (this requires a little more infrastructure that is in my next-but-one patch to be merged).
Port value_set_analysis_fi([a-z]*) to use it. Again, this would temporarily loose the ability to run this pre-function pointer removal but the one use-case of this doesn't actually do this so...
Fully deprecate flow_insensitive_analysist.

How does that fit with your plans / what you are supporting?

martin-cs · 2018-07-27T09:13:09Z

@polgreen : One option would be to take the flow_insensitive_analysist and implement an instance of
flow_insensitive_abstract_domain_baset . This would give you options and an upgrade path.

polgreen · 2018-07-27T09:18:52Z

@smowton: This was limited to function pointer values, yes; I used track_optionst::TRACK_FUNCTION_POINTERS which appears to then only consider function pointers. I tried adding a manual check for function pointers, but that seems to do exactly the same as simply using TRACK_FUNCTION_POINTERS.

I can't give you the small linux thing to recreate the out-of-memory problem, but it's basically linux. So, if you build linux with noconfig, that would be about the right kind of thing [disclaimer: I haven't tried this, and building the small linux thing took 90 minutes].

Any reasonably large program will be enough to check the duplicate value sets.

The version of goto-cc in develop is currently broken for building Xen.
You can build the xen-syms binary using this branch: https://github.com/nmanthey/xen/tree/gotocc
and this version of goto-cc
https://github.com/tautschnig/cbmc/tree/xen-combined
and using the set up in this docker file:
#2504.

polgreen · 2018-07-27T09:57:23Z

@martin-cs

One option would be to take the flow_insensitive_analysist and implement an instance of
flow_insensitive_abstract_domain_baset . This would give you options and an upgrade path.

So, as I understand:

flow_insensitive_abstract_domain_baset is the statet used by flow_insensitive_analysis_baset.
flow_insensitive_analysist inherits from flow_insensitive_analyis_baset
value_set_analysis_fit inherits from flow_insensitive_analsysist?

I don't then understand why would I want a new instance of flow_insensitive_abstract_domain_baset? The existing set up provides the behaviour that I want, i.e., to only track function pointers by setting track_options in value_set_analysis_fit to be "TRACK_FUNCTION_POINTERS ".

It just has a high memory consumption and I don't see how I could change the memory consumption by changing flow_insensitive_abstract_domain_baset?

polgreen · 2018-07-31T07:31:10Z

@smowton, did you by any chance look at the value_set_analysis_fi?

smowton · 2018-07-31T09:01:45Z

No time yet sorry. Will try to do it this afternoon.

smowton · 2018-07-31T17:50:25Z

OK looked over this just now. Conclusions:

flow_insensitive_analysist appears to track exactly one state, not one state per function, so I'm unsure why you were seeing repetitions. Would you mind posting your driver code (the goto-instrument "show-value-sets"-esque code that I assume was written to test this)?
value_set_fit and cousins are pretty involved; it's probably more trouble than its worth bending them fit this use case
Since this can never need to reconsider a particular instruction, it's probably overkill to involve flow_insensitive_analysist at this stage. local_safe_pointerst is in a similar situation, being able to do its job in one pass.

Therefore I withdraw my objection to "yet another abstract interpreter." However this new behaviour probably should be configurable -- presumably there are use cases where the fragment of a program we're looking at doesn't include the function-pointer-assigning glue logic, so we really do want to consider everything type-compatible.

polgreen · 2018-08-01T09:05:29Z

The code I used to output the value_set_analsysis_fit is

    if(cmdline.isset("show-value-set-fi"))
    {
      namespacet ns(goto_model.symbol_table);
      value_set_analysis_fit value_set(
        ns, value_set_analysis_fit::track_optionst::TRACK_FUNCTION_POINTERS);

      value_set(goto_model.goto_functions);
      value_set.output(goto_model.goto_functions, std::cout);
      return CPROVER_EXIT_SUCCESS;
    }

On closer inspection it looks like the output function just outputs the same state N times, where N is the number of functions in the program then? Any reason why? I wondered if this is just a weird way of forcing function signatures to match for the inheritance, but it doesn't seem to be?

This is the output function, which iterates over the goto_functions
https://github.com/diffblue/cbmc/blob/develop/src/analyses/flow_insensitive_analysis.cpp#L67-L77
And calls this for every goto_function. This function then ignores the goto_program and identifier arguments and outputs the state?
https://github.com/diffblue/cbmc/blob/develop/src/analyses/flow_insensitive_analysis.cpp#L79-L85

smowton · 2018-08-01T13:42:23Z

Hah, that is pretty silly. I guess nobody ever used that output function!

To summarise the above btw if it wasn't clear: I'm happy to go ahead with this PR's approach once it's tidied a bit and has some tests -- no need to involve value-set-analysis.

Scans the goto program for assigns to function pointers and builds a map of assigns to those function pointers. Then uses only these functions to replace the function pointer with. If a function pointer is ever written to with anything more complicated than a straight forward address_of a function, defaults to normal behaviour. Only works for: - standalone function pointers - function pointers that are 1 level deep in a struct Current unsound-ness to be fixed: - if a function pointer is part of a struct, and the struct is assigned to in total, this is not detected.

tautschnig · 2019-02-25T21:04:16Z

@smowton @polgreen I have addressed the feedback, but what's really needed is further benchmarking. I will try to work on this (presumably via #2504).

tautschnig · 2022-10-10T11:21:13Z

Closing as #5436 ended up as the cleaner solution.

polgreen requested review from kroening, peterschrammel, pkesseli, smowton and tautschnig as code owners July 26, 2018 08:16

polgreen commented Jul 26, 2018

View reviewed changes

smowton reviewed Jul 26, 2018

View reviewed changes

tautschnig self-assigned this Nov 10, 2018

tautschnig force-pushed the function_pointers branch from 9da9979 to bde2536 Compare February 25, 2019 21:02

tautschnig changed the title ~~More precise function pointer removal~~ More precise function pointer removal [depends-on: #2504] Feb 25, 2019

tautschnig added the dependent - do not merge label Feb 25, 2019

tautschnig mentioned this pull request Feb 26, 2019

Compiling Xen with CBMC: docker file "expensive regression test" #2504

Merged

tautschnig closed this Oct 10, 2022

		@@ -255,6 +281,7 @@ void remove_function_pointerst::fix_return_type(
		old_lhs, typecast_exprt(tmp_symbol_expr, old_lhs.type()));
		}

More precise function pointer removal [depends-on: #2504] #2620

More precise function pointer removal [depends-on: #2504] #2620

Conversation

polgreen commented Jul 26, 2018

Choose a reason for hiding this comment

martin-cs commented Jul 26, 2018

polgreen commented Jul 26, 2018

martin-cs commented Jul 26, 2018 via email

smowton left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

polgreen commented Jul 26, 2018

smowton commented Jul 26, 2018

smowton commented Jul 26, 2018

martin-cs commented Jul 26, 2018 via email

polgreen commented Jul 26, 2018

smowton commented Jul 26, 2018

polgreen commented Jul 26, 2018

smowton commented Jul 26, 2018

polgreen commented Jul 27, 2018

polgreen commented Jul 27, 2018

smowton commented Jul 27, 2018

smowton commented Jul 27, 2018

smowton commented Jul 27, 2018

martin-cs commented Jul 27, 2018

martin-cs commented Jul 27, 2018

polgreen commented Jul 27, 2018

polgreen commented Jul 27, 2018

polgreen commented Jul 31, 2018

smowton commented Jul 31, 2018

smowton commented Jul 31, 2018

polgreen commented Aug 1, 2018

smowton commented Aug 1, 2018

tautschnig commented Feb 25, 2019

tautschnig commented Oct 10, 2022