-
Notifications
You must be signed in to change notification settings - Fork 273
More precise function pointer removal [depends-on: #2504] #2620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
goto_functions); | ||
|
||
return rfp.remove_function_pointers(goto_program); | ||
std::cout << "Statistics:\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't make mstream work here, and I have no idea why.
It's great that someone is working on this. @NlightNFotis did some work on separating the function pointer removal into "generate candidate lists" and "remove function pointers"; it would be great if that could get picked up / encorporated into this commit as I suspect you are doing something similar. The goal of that work was to get the abstract interpreter to be able to handle programs with function pointers, so that we had a cheap route to finding things like this. It turns out there is one small subtlety that prevents this from working at the moment. If better function pointer removal is something you need then I'd be happy to discuss. |
Hi Martin, would be good to discuss, do you have time for a Skype chat at some point? |
On Thu, 2018-07-26 at 01:31 -0700, E Polgreen wrote:
Hi Martin, would be good to discuss, do you have time for a Skype
chat at some point?
Yes; will organise "off-list".
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a good thing to do -- main concern is this is getting towards writing a lowish precision value-set analysis. @peterschrammel @martin-cs do you think any of the current VSAs have the right sort of precision level, to the point that we could parameterise one of those with a predicate expressing "only pay attention to code-typed things, and never bother following pointers" and get a similar result? Possibly value_set_analysis_fit
, but I'm not too familiar with how it works?
@@ -90,6 +107,12 @@ class remove_function_pointerst:public messaget | |||
} | |||
}; | |||
|
|||
static irep_idt get_member_identifier(const member_exprt &member) | |||
{ | |||
return id2string(member.symbol().get_identifier()) + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note member.symbol()
throws an invariant if this member_exprt
is not ultimately based on a symbol (e.g. member-of-dereference-of... will fail). Therefore you probably need to implement your own recursion here.
/// stored as irep_idts. | ||
/// The map is stored as variable in remove_function_pointerst | ||
/// \param goto_functions goto functions to scan through | ||
void get_all_writes_to_pointers(goto_functionst &goto_functions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gather_all_writes_to_function_pointers
(get
would usually return something)
@@ -255,6 +281,7 @@ void remove_function_pointerst::fix_return_type( | |||
old_lhs, typecast_exprt(tmp_symbol_expr, old_lhs.type())); | |||
} | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidental change?
|
||
if(function_pointer_map.find(identifier) != function_pointer_map.end()) | ||
{ | ||
status() << "Replacing function pointer from map of possible assigns\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"\n" --> eom
identifier = get_member_identifier(to_member_expr(pointer)); | ||
} | ||
|
||
if(function_pointer_map.find(identifier) != function_pointer_map.end()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of find
followed by []
use a construct like:
auto find_identifier = function_pointer_map.find(identifier);
if(find_identifier != function_pointer_map.end())
{
functionst &map_entry = find_identifier->second;
...
|
||
if(lhs.type().id() == ID_pointer && lhs.type().subtype().id() == ID_code) | ||
{ | ||
irep_idt lhs_irep; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lhs_identifier
?
static irep_idt get_member_identifier(const member_exprt &member) | ||
{ | ||
return id2string(member.symbol().get_identifier()) + | ||
id2string(member.get_component_name()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest indexing this map by exprt
s rather than strings, since it will surely be useful sooner or later to decompose them again
|
||
if(rhs.id() == ID_address_of && rhs.type().subtype().id() == ID_code) | ||
{ | ||
irep_idt rhs_irep = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function_identifier
?
{ | ||
irep_idt rhs_irep = | ||
to_symbol_expr(to_address_of_expr(rhs).object()).get_identifier(); | ||
if(function_pointer_map.find(lhs_irep) != function_pointer_map.end()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above avoid repeated lookups (find
then []
etc) if possible
irep_idt rhs_irep = | ||
to_symbol_expr(to_address_of_expr(rhs).object()).get_identifier(); | ||
if(function_pointer_map.find(lhs_irep) != function_pointer_map.end()) | ||
(function_pointer_map[lhs_irep]).insert(rhs_irep); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if block currently redundant (function_pointer_map[lhs_irep].insert(rhs_irep)
does both cases, no need for the = {{rhs_rep}}
special case)
I've tried the default value set analysis but that was just miles and miles off the kind of scalability that I need; happy to try others if someone has suggestions for what might work. |
Plain |
To clarify the level of precision wanted btw, as I understand:
Desired result = "global_ptr could be f or g, global_struct.field1 could be h or i, global_struct.field2 could be anything"? That is, we intentionally ignore the assignment of function |
Seems like a good thing to do -- main concern is this is getting
towards writing a lowish precision value-set analysis.
Yes; that's the situation we got to with a major automotive customer.
@thk123 did some great work on getting a number of their special cases
to work (so that ```dispatch_table[3]``` would resolve when
```dispatch_table``` was a global constant) but really, once you are
doing anything that requires understanding the context of the code /
the rest of the program, you are slowly heading towards writing an
abstract interpreter. It would be good if we can avoid having a fourth
abstract interpretation implementation (although I suspect that there
may be more lurking in goto-instrument).
@peterschrammel @martin-cs do you think any of the current VSAs have
the right sort of precision level, to the point that we could
parameterise one of those with a predicate expressing "only pay
attention to code-typed things, and never bother following pointers"
and get a similar result? Possibly `value_set_analysis_fit`, but I'm
not too familiar with how it works?
I would hope so. To recap the situation with abstract interpretation
implementations in CPROVER.
ai.h : ait -- the future, where all of the scalability improvements are
/ will be happening. At the moment this is fixed to be location
sensitive (i.e. one domain per location) but I'm in the process of
merging patches to change this. Designed to be run after function
pointer removal and return removal. It is non-trivial to change this
but at some point we should.
static_analysis.h : static_analysist -- the older framework, deprecated
in favour of ai.h. Uses one domain per location. The one advantage is
that it should be able to handle function pointers and returns. At the
moment this has one user, value_set_analysist, which is only (in the
main branch!) used in goto-instrument in a context where function
pointers have been removed. Is a likely candidate for porting into
ait.
flow_insensitive_analysis.h : flow_insensitive_analysist -- the oldest
of the frameworks I think. Uses one domain, total! Should handle
returns and function pointers -- as much as that is meaningful. Used
for value_set_analysis_{fi,fivr,fivrns}t (depending on if you want
validity regions and sharing). _fi is used by reaching definitions
(and thus dependency graph and thus slicing).
[ None of this should be confused with value_sett, which is how goto-
symex handles pointers and value_setst which is a generic interface for
getting pointers. ]
So, things to try:
1. value_set_analysist without doing function pointer removal. But you
had said this doesn't scale.
2. implement as a flow_insensitive_abstract_domain_baset and use
flow_insensitive_analysist .
3. get involved in merging the context sensitivity and variable
sensitivity domains.
|
To clarify: basically I do want a low precision value set analysis, that is flow insensitive and that only considers function pointers. As far as I can tell the existing value set analysis variants don't currently do that. The issue: The end goal: Level of precision To that end, the level of precision required is "something better than what currently exists, and that scales well enough to be able to run an analysis on linux/xen". The precision that I am aiming for, described by @smowton above, we think would give enough of an improvement to make running linux/xen. This would be only for function pointers, I don't care about any other pointers, or any other variables. |
*nod
|
Hmm any insight into how big that "if" is though? |
Medium sized. Martin's comment above sounds promising -- it maintains one domain, not one domain per program point, which matches the analysis you wrote for this PR. Suggested steps to finding out:
|
I spoke to @kroening and apparently value_set_analaysis and value_set_analysis_fi are deprecated code (or soon to be deprecated) and I should avoid building dependencies on them |
I thought I'd try it anyway. So I ran the value_set_analysis_fi on a small linux thing with 332923 effective lines of code (as counted by goto-instrument), and it ran out of memory after 271s
The output suggests the flow insensitive value set analysis is building a value set for each function, but the value set appears to be the same for each function. Is it possible that value set is for some reason building the full value set for the entire program, and then repeating that for every function? The reason I suspect that is because I did successfully run this on Xen for long enough for it to produce the value set for ~10 different functions, and the value set for all functions was identical, and every single function contained the value set for x86_emulate::ops, including "vsnprintf". If value_set_analysis_fi really is doing this, that would increase the memory consumption quite a lot |
That is a bit peculiar. I'll have a mess with it over the weekend. Is the big Xen GBF image available anywhere for testing purposes? |
(Dependency wise I'm not sure I agree -- if they're useful then they're useful, and we should patch up their shortcomings rather than throw them away then write largely the same code over again) |
(btw was this with limiting to function pointer values? I think that would vastly reduce the amount of information it records, and therefore its memory footprint) |
@smowton : w.r.t. to depreciated code, I'd been thinking (time willing) of doing the following: (*. At some point... work out a good way of allowing ait to be used before function pointer and return removal. This would be generally A Good Thing.)
How does that fit with your plans / what you are supporting? |
@polgreen : One option would be to take the flow_insensitive_analysist and implement an instance of |
@smowton: This was limited to function pointer values, yes; I used track_optionst::TRACK_FUNCTION_POINTERS which appears to then only consider function pointers. I tried adding a manual check for function pointers, but that seems to do exactly the same as simply using TRACK_FUNCTION_POINTERS. I can't give you the small linux thing to recreate the out-of-memory problem, but it's basically linux. So, if you build linux with noconfig, that would be about the right kind of thing [disclaimer: I haven't tried this, and building the small linux thing took 90 minutes]. Any reasonably large program will be enough to check the duplicate value sets. The version of goto-cc in develop is currently broken for building Xen. |
So, as I understand:
I don't then understand why would I want a new instance of It just has a high memory consumption and I don't see how I could change the memory consumption by changing flow_insensitive_abstract_domain_baset? |
@smowton, did you by any chance look at the value_set_analysis_fi? |
No time yet sorry. Will try to do it this afternoon. |
OK looked over this just now. Conclusions:
Therefore I withdraw my objection to "yet another abstract interpreter." However this new behaviour probably should be configurable -- presumably there are use cases where the fragment of a program we're looking at doesn't include the function-pointer-assigning glue logic, so we really do want to consider everything type-compatible. |
The code I used to output the value_set_analsysis_fit is
On closer inspection it looks like the output function just outputs the same state N times, where N is the number of functions in the program then? Any reason why? I wondered if this is just a weird way of forcing function signatures to match for the inheritance, but it doesn't seem to be?
|
Hah, that is pretty silly. I guess nobody ever used that output function! To summarise the above btw if it wasn't clear: I'm happy to go ahead with this PR's approach once it's tidied a bit and has some tests -- no need to involve value-set-analysis. |
Scans the goto program for assigns to function pointers and builds a map of assigns to those function pointers. Then uses only these functions to replace the function pointer with. If a function pointer is ever written to with anything more complicated than a straight forward address_of a function, defaults to normal behaviour. Only works for: - standalone function pointers - function pointers that are 1 level deep in a struct Current unsound-ness to be fixed: - if a function pointer is part of a struct, and the struct is assigned to in total, this is not detected.
9da9979
to
bde2536
Compare
Closing as #5436 ended up as the cleaner solution. |
Currently the function pointer removal replaces each function pointer with a case split between all functions that have the same signature and have had their address taken.
This PR is the beginning of work to refine that. Submitting as PR for early feedback.
This PR:
Currently only works for:
If a function pointer is written to with anything other than a straight forward address of a function, default to original behaviour.
Current unsoundness: