Skip to content

More precise function pointer removal [depends-on: #2504] #2620

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

polgreen
Copy link
Contributor

Currently the function pointer removal replaces each function pointer with a case split between all functions that have the same signature and have had their address taken.

This PR is the beginning of work to refine that. Submitting as PR for early feedback.

This PR:

  • scans the goto program to find all assigns to function pointers
  • builds map of function pointers -> sets of functions
  • uses these maps when replacing the function pointers

Currently only works for:

  • standalone function pointers
  • function pointers 1 layer deep in a struct

If a function pointer is written to with anything other than a straight forward address of a function, default to original behaviour.

Current unsoundness:

  • if a function pointer is part of a struct, and the struct is written to with a struct, this needs to be detected.

goto_functions);

return rfp.remove_function_pointers(goto_program);
std::cout << "Statistics:\n"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't make mstream work here, and I have no idea why.

@martin-cs
Copy link
Collaborator

It's great that someone is working on this. @NlightNFotis did some work on separating the function pointer removal into "generate candidate lists" and "remove function pointers"; it would be great if that could get picked up / encorporated into this commit as I suspect you are doing something similar.

The goal of that work was to get the abstract interpreter to be able to handle programs with function pointers, so that we had a cheap route to finding things like this. It turns out there is one small subtlety that prevents this from working at the moment. If better function pointer removal is something you need then I'd be happy to discuss.

@polgreen
Copy link
Contributor Author

Hi Martin, would be good to discuss, do you have time for a Skype chat at some point?

@martin-cs
Copy link
Collaborator

martin-cs commented Jul 26, 2018 via email

Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good thing to do -- main concern is this is getting towards writing a lowish precision value-set analysis. @peterschrammel @martin-cs do you think any of the current VSAs have the right sort of precision level, to the point that we could parameterise one of those with a predicate expressing "only pay attention to code-typed things, and never bother following pointers" and get a similar result? Possibly value_set_analysis_fit, but I'm not too familiar with how it works?

@@ -90,6 +107,12 @@ class remove_function_pointerst:public messaget
}
};

static irep_idt get_member_identifier(const member_exprt &member)
{
return id2string(member.symbol().get_identifier()) +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note member.symbol() throws an invariant if this member_exprt is not ultimately based on a symbol (e.g. member-of-dereference-of... will fail). Therefore you probably need to implement your own recursion here.

/// stored as irep_idts.
/// The map is stored as variable in remove_function_pointerst
/// \param goto_functions goto functions to scan through
void get_all_writes_to_pointers(goto_functionst &goto_functions);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gather_all_writes_to_function_pointers (get would usually return something)

@@ -255,6 +281,7 @@ void remove_function_pointerst::fix_return_type(
old_lhs, typecast_exprt(tmp_symbol_expr, old_lhs.type()));
}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidental change?


if(function_pointer_map.find(identifier) != function_pointer_map.end())
{
status() << "Replacing function pointer from map of possible assigns\n";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"\n" --> eom

identifier = get_member_identifier(to_member_expr(pointer));
}

if(function_pointer_map.find(identifier) != function_pointer_map.end())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of find followed by [] use a construct like:

auto find_identifier = function_pointer_map.find(identifier);
if(find_identifier != function_pointer_map.end())
{
  functionst &map_entry = find_identifier->second;
  ...


if(lhs.type().id() == ID_pointer && lhs.type().subtype().id() == ID_code)
{
irep_idt lhs_irep;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lhs_identifier?

static irep_idt get_member_identifier(const member_exprt &member)
{
return id2string(member.symbol().get_identifier()) +
id2string(member.get_component_name());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest indexing this map by exprts rather than strings, since it will surely be useful sooner or later to decompose them again


if(rhs.id() == ID_address_of && rhs.type().subtype().id() == ID_code)
{
irep_idt rhs_irep =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function_identifier?

{
irep_idt rhs_irep =
to_symbol_expr(to_address_of_expr(rhs).object()).get_identifier();
if(function_pointer_map.find(lhs_irep) != function_pointer_map.end())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above avoid repeated lookups (find then [] etc) if possible

irep_idt rhs_irep =
to_symbol_expr(to_address_of_expr(rhs).object()).get_identifier();
if(function_pointer_map.find(lhs_irep) != function_pointer_map.end())
(function_pointer_map[lhs_irep]).insert(rhs_irep);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if block currently redundant (function_pointer_map[lhs_irep].insert(rhs_irep) does both cases, no need for the = {{rhs_rep}} special case)

@polgreen
Copy link
Contributor Author

I've tried the default value set analysis but that was just miles and miles off the kind of scalability that I need; happy to try others if someone has suggestions for what might work.

@smowton
Copy link
Contributor

smowton commented Jul 26, 2018

Plain value_set_analysist? If so agreed that's way too heavy. AFAIK the _fi variant is more vague, but I'm not that familiar. Hoping either @peterschrammel or @martin-cs knows more about those cruder-but-quicker versions.

@smowton
Copy link
Contributor

smowton commented Jul 26, 2018

To clarify the level of precision wanted btw, as I understand:

// somewhere
global_ptr = &f;
global_struct.field1 = &h;
global_struct.field2 = &x;

// somewhere else
global_ptr = &g;
global_struct.field1 = &i;
global_struct.field2 = &y;

// still somewhere else
global_struct.field2 = *something->complicated;

// And finally:
something->else.complicated = &abc;

Desired result = "global_ptr could be f or g, global_struct.field1 could be h or i, global_struct.field2 could be anything"? That is, we intentionally ignore the assignment of function abc that could get anywhere?

@martin-cs
Copy link
Collaborator

martin-cs commented Jul 26, 2018 via email

@polgreen
Copy link
Contributor Author

To clarify: basically I do want a low precision value set analysis, that is flow insensitive and that only considers function pointers. As far as I can tell the existing value set analysis variants don't currently do that.

The issue:
Running goto-instrument --remove_function-pointers replaces all function pointers with in some cases 1000 candidate functions. We need to start our analysis somewhere in the middle of our program, i.e., without fully analysing all the initialisation code that may set these function pointers, so this huge case split is a problem for us.

The end goal:
That goto-instrument --remove-function-pointers would be replace function pointers with fewer candidate functions, because we believe that this is a blocking point for us running CBMC on linux/xen.

Level of precision

To that end, the level of precision required is "something better than what currently exists, and that scales well enough to be able to run an analysis on linux/xen".

The precision that I am aiming for, described by @smowton above, we think would give enough of an improvement to make running linux/xen. This would be only for function pointers, I don't care about any other pointers, or any other variables.

@smowton
Copy link
Contributor

smowton commented Jul 26, 2018

*nod
If value-set-analysis-fi was sensible, then it might suffice to parameterise it with a predicate (here set to has-function-pointer-type) to advise it what sorts of things to track. Something like

domaint::apply_assignment(...)
{
  if(!user_interested_in_lhs(assign.lhs()) return;
}

@polgreen
Copy link
Contributor Author

Hmm any insight into how big that "if" is though?

@smowton
Copy link
Contributor

smowton commented Jul 26, 2018

Medium sized. Martin's comment above sounds promising -- it maintains one domain, not one domain per program point, which matches the analysis you wrote for this PR.

Suggested steps to finding out:

  1. Add a show-value-sets-fi option to goto-instrument, which is just like --show-value-sets except it uses value_set_analysis_fi
  2. Hack it to ignore assigns that don't target function pointer typed things by altering src/pointer-analysis/value_set_domain_fi.cpp to check that before calling value_set.apply_code
  3. Run --show-value-sets-fi on Xen and see if it finishes in some acceptable time

@polgreen
Copy link
Contributor Author

I spoke to @kroening and apparently value_set_analaysis and value_set_analysis_fi are deprecated code (or soon to be deprecated) and I should avoid building dependencies on them

@polgreen
Copy link
Contributor Author

I thought I'd try it anyway. So I ran the value_set_analysis_fi on a small linux thing with 332923 effective lines of code (as counted by goto-instrument), and it ran out of memory after 271s

 Maximum resident set size (kbytes): 61120980

The output suggests the flow insensitive value set analysis is building a value set for each function, but the value set appears to be the same for each function. Is it possible that value set is for some reason building the full value set for the entire program, and then repeating that for every function?

The reason I suspect that is because I did successfully run this on Xen for long enough for it to produce the value set for ~10 different functions, and the value set for all functions was identical, and every single function contained the value set for x86_emulate::ops, including "vsnprintf".

If value_set_analysis_fi really is doing this, that would increase the memory consumption quite a lot

@smowton
Copy link
Contributor

smowton commented Jul 27, 2018

That is a bit peculiar. I'll have a mess with it over the weekend. Is the big Xen GBF image available anywhere for testing purposes?

@smowton
Copy link
Contributor

smowton commented Jul 27, 2018

(Dependency wise I'm not sure I agree -- if they're useful then they're useful, and we should patch up their shortcomings rather than throw them away then write largely the same code over again)

@smowton
Copy link
Contributor

smowton commented Jul 27, 2018

(btw was this with limiting to function pointer values? I think that would vastly reduce the amount of information it records, and therefore its memory footprint)

@martin-cs
Copy link
Collaborator

@smowton : w.r.t. to depreciated code, I'd been thinking (time willing) of doing the following:

(*. At some point... work out a good way of allowing ait to be used before function pointer and return removal. This would be generally A Good Thing.)

  1. port value_set_analysist to use ait. This would temporarily loose the ability to run this pre-function pointer removal but we're not actually using it so...

  2. Move the functionality of --show-value-sets from goto-instrument to goto-analyzer.

  3. Fully deprecate static_analysist.

  4. Add a "one domain total" option to ait (this requires a little more infrastructure that is in my next-but-one patch to be merged).

  5. Port value_set_analysis_fi([a-z]*) to use it. Again, this would temporarily loose the ability to run this pre-function pointer removal but the one use-case of this doesn't actually do this so...

  6. Fully deprecate flow_insensitive_analysist.

How does that fit with your plans / what you are supporting?

@martin-cs
Copy link
Collaborator

@polgreen : One option would be to take the flow_insensitive_analysist and implement an instance of
flow_insensitive_abstract_domain_baset . This would give you options and an upgrade path.

@polgreen
Copy link
Contributor Author

@smowton: This was limited to function pointer values, yes; I used track_optionst::TRACK_FUNCTION_POINTERS which appears to then only consider function pointers. I tried adding a manual check for function pointers, but that seems to do exactly the same as simply using TRACK_FUNCTION_POINTERS.

I can't give you the small linux thing to recreate the out-of-memory problem, but it's basically linux. So, if you build linux with noconfig, that would be about the right kind of thing [disclaimer: I haven't tried this, and building the small linux thing took 90 minutes].

Any reasonably large program will be enough to check the duplicate value sets.

The version of goto-cc in develop is currently broken for building Xen.
You can build the xen-syms binary using this branch: https://github.com/nmanthey/xen/tree/gotocc
and this version of goto-cc
https://github.com/tautschnig/cbmc/tree/xen-combined
and using the set up in this docker file:
#2504.

@polgreen
Copy link
Contributor Author

@martin-cs

One option would be to take the flow_insensitive_analysist and implement an instance of
flow_insensitive_abstract_domain_baset . This would give you options and an upgrade path.

So, as I understand:

  • flow_insensitive_abstract_domain_baset is the statet used by flow_insensitive_analysis_baset.
  • flow_insensitive_analysist inherits from flow_insensitive_analyis_baset
  • value_set_analysis_fit inherits from flow_insensitive_analsysist?

I don't then understand why would I want a new instance of flow_insensitive_abstract_domain_baset? The existing set up provides the behaviour that I want, i.e., to only track function pointers by setting track_options in value_set_analysis_fit to be "TRACK_FUNCTION_POINTERS ".

It just has a high memory consumption and I don't see how I could change the memory consumption by changing flow_insensitive_abstract_domain_baset?

@polgreen
Copy link
Contributor Author

@smowton, did you by any chance look at the value_set_analysis_fi?

@smowton
Copy link
Contributor

smowton commented Jul 31, 2018

No time yet sorry. Will try to do it this afternoon.

@smowton
Copy link
Contributor

smowton commented Jul 31, 2018

OK looked over this just now. Conclusions:

  1. flow_insensitive_analysist appears to track exactly one state, not one state per function, so I'm unsure why you were seeing repetitions. Would you mind posting your driver code (the goto-instrument "show-value-sets"-esque code that I assume was written to test this)?
  2. value_set_fit and cousins are pretty involved; it's probably more trouble than its worth bending them fit this use case
  3. Since this can never need to reconsider a particular instruction, it's probably overkill to involve flow_insensitive_analysist at this stage. local_safe_pointerst is in a similar situation, being able to do its job in one pass.

Therefore I withdraw my objection to "yet another abstract interpreter." However this new behaviour probably should be configurable -- presumably there are use cases where the fragment of a program we're looking at doesn't include the function-pointer-assigning glue logic, so we really do want to consider everything type-compatible.

@polgreen
Copy link
Contributor Author

polgreen commented Aug 1, 2018

The code I used to output the value_set_analsysis_fit is

    if(cmdline.isset("show-value-set-fi"))
    {
      namespacet ns(goto_model.symbol_table);
      value_set_analysis_fit value_set(
        ns, value_set_analysis_fit::track_optionst::TRACK_FUNCTION_POINTERS);

      value_set(goto_model.goto_functions);
      value_set.output(goto_model.goto_functions, std::cout);
      return CPROVER_EXIT_SUCCESS;
    }

On closer inspection it looks like the output function just outputs the same state N times, where N is the number of functions in the program then? Any reason why? I wondered if this is just a weird way of forcing function signatures to match for the inheritance, but it doesn't seem to be?

@smowton
Copy link
Contributor

smowton commented Aug 1, 2018

Hah, that is pretty silly. I guess nobody ever used that output function!

To summarise the above btw if it wasn't clear: I'm happy to go ahead with this PR's approach once it's tidied a bit and has some tests -- no need to involve value-set-analysis.

@tautschnig tautschnig self-assigned this Nov 10, 2018
Scans the goto program for assigns to function pointers and builds a map of
assigns to those function pointers. Then uses only these functions to replace
the function pointer with.

If a function pointer is ever written to with anything more complicated than a
straight forward address_of a function, defaults to normal behaviour.

Only works for:
- standalone function pointers
- function pointers that are 1 level deep in a struct

Current unsound-ness to be fixed:
- if a function pointer is part of a struct, and the struct is assigned to in
  total, this is not detected.
@tautschnig
Copy link
Collaborator

@smowton @polgreen I have addressed the feedback, but what's really needed is further benchmarking. I will try to work on this (presumably via #2504).

@tautschnig tautschnig changed the title More precise function pointer removal More precise function pointer removal [depends-on: #2504] Feb 25, 2019
@tautschnig
Copy link
Collaborator

Closing as #5436 ended up as the cleaner solution.

@tautschnig tautschnig closed this Oct 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants