Skip to content

Memory snapshot support for pointers and structs [blocks: #4482] #4438

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 16, 2019

Conversation

xbauch
Copy link
Contributor

@xbauch xbauch commented Mar 26, 2019

This PR extends the memory snapshot harness to correctly work with pointers and structs (and unions). This requires two functionalities to cooperate: 1) the memory-analyzer should produce a JSON file containing all information needed for; 2) the goto-harness should be able to read the pointer/struct symbols and correctly instantiate them. Regression tests automate the generation of JSON symbol tables for the harness so that they do not have to be included. The first commit is a squash of the prerequisite PR #4261 (no need to review it here).

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

const symbolt &symbol =
(symbol_table.lookup(snapshot_symbol.base_name) != nullptr)
? snapshot_symbol
: fresh_symbol_copy(snapshot_symbol, symbol_table);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The memory-analyzer produces a symtab with tmp variables that the malloc(ed) pointers point to. These new variables do not exists in the "local" symbol table. So I add them here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment in the code is not so clear on this point, although it mentions it.

I would prefer some text similar to the above specifically mentioning this within the context of memory-analyzer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look sensible enough to me

inline bool is_constant(const typet &type)
{
return type.id() == ID_constant;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes no sense. If this ever were to be true, we'd have messed something up. Maybe return type.get_bool(ID_C_constant);?

inline bool is_signed_or_unsigned_bitvector(const typet &type)
{
return type.id() == ID_signedbv || type.id() == ID_unsignedbv;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By now I'd suggest to make this consistent with integer_bitvector_type, doing something like is_integer_bitvector_type.

SRC = analyze_symbol.cpp\
gdb_api.cpp \
memory_analyzer_main.cpp \
memory_analyzer_parse_options.cpp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a mix of tabs and spaces?

@@ -117,6 +117,10 @@ class memory_snapshot_harness_generatort : public goto_harness_generatort
/// \param goto_model: Model where the modification takes place
void add_init_section(goto_modelt &goto_model) const;

const symbolt &fresh_symbol_copy(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tautschnig ok, but the above comments should probably be reflected on the prerequisite PR #4261.

\[main.assertion.3\] line [0-9]+ assertion p != q: SUCCESS
\[main.assertion.4\] line [0-9]+ assertion p == q: SUCCESS
\[main.assertion.3\] line [0-9]+ assertion p \!= q: SUCCESS
\[main.assertion.4\] line [0-9]+ assertion p == q-\>next: SUCCESS
Copy link
Contributor Author

@xbauch xbauch Mar 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unrelated but now all tests are green.

for(size_t i = 0; i < struct_type.components().size(); ++i)
{
const struct_typet::componentt &component = struct_type.components()[i];
const auto &bv_type = to_bitvector_type(component.type());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously won't work if there is a non-bitvector member. But at least it will fail gracefully.

@@ -145,6 +145,11 @@ class symbol_analyzert
const exprt &zero_expr,
const source_locationt &location);

exprt get_pointer_to_member_value(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs documentation, especially w.r.t. to the struct+offset thing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
}

exprt symbol_analyzert::get_pointer_to_member_value(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you actually make use of get_subexpression_at_offset to simplify this code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is actually useful for unions as well.

@xbauch xbauch force-pushed the feature/snapshot-pointers branch 2 times, most recently from 8f5bb64 to 945a02b Compare March 28, 2019 15:17
@xbauch xbauch marked this pull request as ready for review March 28, 2019 15:20
Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks OK for now but can we get #4261 merged first?

{
symbolt &tmp_symbol = get_fresh_aux_symbol(
snapshot_symbol.type,
"",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

???

xbauch pushed a commit to xbauch/cbmc that referenced this pull request Apr 4, 2019
@xbauch xbauch force-pushed the feature/snapshot-pointers branch 2 times, most recently from 9cdb632 to f92516a Compare April 20, 2019 06:47
@xbauch xbauch requested review from forejtv and thk123 as code owners April 20, 2019 06:47
@xbauch xbauch force-pushed the feature/snapshot-pointers branch 5 times, most recently from 2369a9b to 24587e6 Compare April 21, 2019 07:39
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: 2369a9b).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/109067689
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: 24587e6).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/109068469
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

xbauch pushed a commit to xbauch/cbmc that referenced this pull request Apr 23, 2019
const auto &value_symbol_expr = value_pair.second;
if(
struct_name ==
id2string(to_symbol_expr(value_symbol_expr).get_identifier()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need id2string here, but I could be wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to swap the operands, but yes. Done.

{
const typet target_type = expr.type().subtype();

symbol_exprt dummy(expr.type());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing name (this constructor is deprecated)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.


values[memory_location] = new_symbol;

const auto &struct_symbol = values.find(memory_location);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use insert or emplace in the preceding instruction and then pick up the iterator being returned in order to avoid the repeated log-time lookup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if(maybe_member_expr.has_value())
return *maybe_member_expr;

UNREACHABLE;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead use CHECK_RETURN above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

bool gdb_value_extractort::points_to_member(
const pointer_valuet &pointer_value) const
{
if(pointer_value.pointee.find("+") != std::string::npos)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the exact performance difference is, but intuitively I'd use find('+') instead of the string version to avoid any need for computing string lengths or the likes. Worth reviewing the entire PR for, this does occur multiple times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean std::find or something internal? I use it twice so I added a wrapper method into pointer_valuet with a nice name.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right here I was thinking about string::find, but I could be wrong about the other cases. With string::find I'd just (when possible) actually use single characters rather than one-character strings.

const symbolt *pointee_symbol = symbol_table.lookup(pointer_value.pointee);
if(pointee_symbol == nullptr)
return false;
const auto pointee_type = pointee_symbol->type;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const auto &

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

pointee_type.id() == ID_array || pointee_type.id() == ID_struct ||
pointee_type.id() == ID_union)
return true;
return false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just use return pointee_type.id() == ID_struct_tag || ...;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

const union_tag_typet &union_tag_type = to_union_tag_type(expr.type());
const union_typet union_type = ns.follow_tag(union_tag_type);

for(size_t i = 0; i < new_expr.operands().size(); ++i)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A union expression/union zero-initializer must always have exactly one operand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I check that there is one and skip the loop.

@@ -294,7 +294,9 @@ exprt gdb_value_extractort::get_non_char_pointer_value(
}
else
{
return it->second;
const symbol_exprt typed_symbol_value = symbol_exprt{
to_symbol_expr(it->second).get_identifier(), expr.type().subtype()};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly related comment on another commit or other PR (sorry, been reviewing too many PRs in too little time :-) ): can't we just store the symbol with the proper type so that it->second is fully typed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I guess that part of the problem is that we initialise the values map before we know the types of the symbol expressions we store there. I only introduced this change because these initial symbol expressions do not have to have the right type. I guess we could get the right type during initialisation from symbol table but that change should probably take place in #4261.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, not a big deal, was just wondering.

@xbauch xbauch force-pushed the feature/snapshot-pointers branch from 24587e6 to 0ed7b17 Compare May 6, 2019 18:35
@tautschnig tautschnig changed the title Memory snapshot support for pointers and structs [depends-on: #4261] Memory snapshot support for pointers and structs [depends-on: #4261, blocks: #4482] May 7, 2019
xbauch pushed a commit to xbauch/cbmc that referenced this pull request May 7, 2019
tautschnig added a commit that referenced this pull request May 16, 2019
Memory analyzer to take memory snapshots [blocks: #2649, #4438]
@tautschnig tautschnig changed the title Memory snapshot support for pointers and structs [depends-on: #4261, blocks: #4482] Memory snapshot support for pointers and structs [blocks: #4482] May 16, 2019
@xbauch xbauch force-pushed the feature/snapshot-pointers branch 2 times, most recently from 0a98b90 to a68b109 Compare May 16, 2019 14:42
This is needed for example when referring to malloc(ed) variables. These are
build as pointing to `tmp` variables.
Change the memory-analyze to produce correct member expressions. Iterate over
the offset from gdb, e.g. `st+4` means jump to the second member component
inside `st`.
Reuse get_struct_member for unions (copy-paste).
GDB does not know (the CBMC) types but the resulting symbol table requires them.
We only do this for non-nil ireps (nil ireps can result from getting pointer
values for dynamically allocated memory -- will be addressed in later PRs).
Also prevents catch-by-value error when compiled with gcc-8.
but I think is a bit more meaningful now. We do not actually see the value of
`p->next` but at least it is consistent (before `p->next` was different from
`tmp.next`).
These call memory-analyzer to produce the JSON snapshots and then build the
harness similarly to how goto-harness tests work. We currently tests structs,
unions, arrays and pointers to all of those.
@xbauch xbauch force-pushed the feature/snapshot-pointers branch from a68b109 to e961c89 Compare May 16, 2019 15:43
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: a68b109).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/112107536
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: e961c89).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/112111854

@tautschnig tautschnig merged commit d9a5113 into diffblue:develop May 16, 2019
@xbauch xbauch deleted the feature/snapshot-pointers branch June 10, 2019 12:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants