-
Notifications
You must be signed in to change notification settings - Fork 274
Memory snapshot support for pointers and structs [blocks: #4482] #4438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory snapshot support for pointers and structs [blocks: #4482] #4438
Conversation
const symbolt &symbol = | ||
(symbol_table.lookup(snapshot_symbol.base_name) != nullptr) | ||
? snapshot_symbol | ||
: fresh_symbol_copy(snapshot_symbol, symbol_table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The memory-analyzer
produces a symtab with tmp
variables that the malloc(ed) pointers point to. These new variables do not exists in the "local" symbol table. So I add them here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment in the code is not so clear on this point, although it mentions it.
I would prefer some text similar to the above specifically mentioning this within the context of memory-analyzer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look sensible enough to me
src/util/std_types.h
Outdated
inline bool is_constant(const typet &type) | ||
{ | ||
return type.id() == ID_constant; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes no sense. If this ever were to be true, we'd have messed something up. Maybe return type.get_bool(ID_C_constant);
?
src/util/std_types.h
Outdated
inline bool is_signed_or_unsigned_bitvector(const typet &type) | ||
{ | ||
return type.id() == ID_signedbv || type.id() == ID_unsignedbv; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By now I'd suggest to make this consistent with integer_bitvector_type
, doing something like is_integer_bitvector_type
.
src/memory-analyzer/Makefile
Outdated
SRC = analyze_symbol.cpp\ | ||
gdb_api.cpp \ | ||
memory_analyzer_main.cpp \ | ||
memory_analyzer_parse_options.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a mix of tabs and spaces?
@@ -117,6 +117,10 @@ class memory_snapshot_harness_generatort : public goto_harness_generatort | |||
/// \param goto_model: Model where the modification takes place | |||
void add_init_section(goto_modelt &goto_model) const; | |||
|
|||
const symbolt &fresh_symbol_copy( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tautschnig ok, but the above comments should probably be reflected on the prerequisite PR #4261.
\[main.assertion.3\] line [0-9]+ assertion p != q: SUCCESS | ||
\[main.assertion.4\] line [0-9]+ assertion p == q: SUCCESS | ||
\[main.assertion.3\] line [0-9]+ assertion p \!= q: SUCCESS | ||
\[main.assertion.4\] line [0-9]+ assertion p == q-\>next: SUCCESS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unrelated but now all tests are green.
for(size_t i = 0; i < struct_type.components().size(); ++i) | ||
{ | ||
const struct_typet::componentt &component = struct_type.components()[i]; | ||
const auto &bv_type = to_bitvector_type(component.type()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously won't work if there is a non-bitvector member. But at least it will fail gracefully.
@@ -145,6 +145,11 @@ class symbol_analyzert | |||
const exprt &zero_expr, | |||
const source_locationt &location); | |||
|
|||
exprt get_pointer_to_member_value( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs documentation, especially w.r.t. to the struct+offset thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
} | ||
|
||
exprt symbol_analyzert::get_pointer_to_member_value( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you actually make use of get_subexpression_at_offset
to simplify this code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is actually useful for unions as well.
8f5bb64
to
945a02b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks OK for now but can we get #4261 merged first?
{ | ||
symbolt &tmp_symbol = get_fresh_aux_symbol( | ||
snapshot_symbol.type, | ||
"", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
???
9cdb632
to
f92516a
Compare
2369a9b
to
24587e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR failed Diffblue compatibility checks (cbmc commit: 2369a9b).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/109067689
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR failed Diffblue compatibility checks (cbmc commit: 24587e6).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/109068469
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.
const auto &value_symbol_expr = value_pair.second; | ||
if( | ||
struct_name == | ||
id2string(to_symbol_expr(value_symbol_expr).get_identifier())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need id2string
here, but I could be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to swap the operands, but yes. Done.
{ | ||
const typet target_type = expr.type().subtype(); | ||
|
||
symbol_exprt dummy(expr.type()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing name (this constructor is deprecated)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
|
||
values[memory_location] = new_symbol; | ||
|
||
const auto &struct_symbol = values.find(memory_location); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use insert
or emplace
in the preceding instruction and then pick up the iterator being returned in order to avoid the repeated log-time lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if(maybe_member_expr.has_value()) | ||
return *maybe_member_expr; | ||
|
||
UNREACHABLE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instead use CHECK_RETURN
above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
bool gdb_value_extractort::points_to_member( | ||
const pointer_valuet &pointer_value) const | ||
{ | ||
if(pointer_value.pointee.find("+") != std::string::npos) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the exact performance difference is, but intuitively I'd use find('+')
instead of the string version to avoid any need for computing string lengths or the likes. Worth reviewing the entire PR for, this does occur multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean std::find
or something internal? I use it twice so I added a wrapper method into pointer_valuet
with a nice name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right here I was thinking about string::find
, but I could be wrong about the other cases. With string::find
I'd just (when possible) actually use single characters rather than one-character strings.
const symbolt *pointee_symbol = symbol_table.lookup(pointer_value.pointee); | ||
if(pointee_symbol == nullptr) | ||
return false; | ||
const auto pointee_type = pointee_symbol->type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const auto &
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pointee_type.id() == ID_array || pointee_type.id() == ID_struct || | ||
pointee_type.id() == ID_union) | ||
return true; | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just use return pointee_type.id() == ID_struct_tag || ...;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
const union_tag_typet &union_tag_type = to_union_tag_type(expr.type()); | ||
const union_typet union_type = ns.follow_tag(union_tag_type); | ||
|
||
for(size_t i = 0; i < new_expr.operands().size(); ++i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A union expression/union zero-initializer must always have exactly one operand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I check that there is one and skip the loop.
@@ -294,7 +294,9 @@ exprt gdb_value_extractort::get_non_char_pointer_value( | |||
} | |||
else | |||
{ | |||
return it->second; | |||
const symbol_exprt typed_symbol_value = symbol_exprt{ | |||
to_symbol_expr(it->second).get_identifier(), expr.type().subtype()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly related comment on another commit or other PR (sorry, been reviewing too many PRs in too little time :-) ): can't we just store the symbol with the proper type so that it->second
is fully typed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I guess that part of the problem is that we initialise the values
map before we know the types of the symbol expressions we store there. I only introduced this change because these initial symbol expressions do not have to have the right type. I guess we could get the right type during initialisation from symbol table but that change should probably take place in #4261.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, not a big deal, was just wondering.
24587e6
to
0ed7b17
Compare
0a98b90
to
a68b109
Compare
This is needed for example when referring to malloc(ed) variables. These are build as pointing to `tmp` variables.
to pointer_valuet.
Change the memory-analyze to produce correct member expressions. Iterate over the offset from gdb, e.g. `st+4` means jump to the second member component inside `st`.
Reuse get_struct_member for unions (copy-paste).
GDB does not know (the CBMC) types but the resulting symbol table requires them. We only do this for non-nil ireps (nil ireps can result from getting pointer values for dynamically allocated memory -- will be addressed in later PRs). Also prevents catch-by-value error when compiled with gcc-8.
but I think is a bit more meaningful now. We do not actually see the value of `p->next` but at least it is consistent (before `p->next` was different from `tmp.next`).
These call memory-analyzer to produce the JSON snapshots and then build the harness similarly to how goto-harness tests work. We currently tests structs, unions, arrays and pointers to all of those.
a68b109
to
e961c89
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR failed Diffblue compatibility checks (cbmc commit: a68b109).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/112107536
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: e961c89).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/112111854
This PR extends the memory snapshot harness to correctly work with pointers and structs (and unions). This requires two functionalities to cooperate: 1) the
memory-analyzer
should produce a JSON file containing all information needed for; 2) thegoto-harness
should be able to read the pointer/struct symbols and correctly instantiate them. Regression tests automate the generation of JSON symbol tables for the harness so that they do not have to be included. The first commit is a squash of the prerequisite PR #4261 (no need to review it here).