Skip to content

Dynamic memory snapshot #4578

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
May 30, 2019
Merged

Conversation

xbauch
Copy link
Contributor

@xbauch xbauch commented Apr 27, 2019

Enable tracking information about dynamically allocated memory from memory-analyzer across to goto-harness via memory snapshots. The knowledge about what addresses have been allocated dynamically (and the scope of these allocation in byte size) is collected by breaking the execution at malloc call-sites and inspecting the registers of these function calls.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok to me (with the disclaimer that I don't really understand this)

bool gdb_value_extractort::memory_scopet::contains(
const memory_addresst &point) const
{
size_t begin_int = std::strtoul(begin.address_string.c_str(), NULL, 0);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a unchecked conversion effectively (size_t is not necessarily the same size as unsigned long, even though it's often the case). I'd recommend using safe_string2size_t from string2int.h (same with all the other string to int conversions here).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if the safe-conversion would work on hex-numbers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should, they use stoll which basically works the same way as the str* functions except it's easier to detect failures.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done and refactored so that the conversion is computed once in case of distance. See: 3812c28.

@@ -91,6 +91,70 @@ class gdb_value_extractort
/// value of `symbol`.
std::map<memory_addresst, exprt> values;

struct memory_scopet
{
memory_addresst begin;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not private, if these are not arbitrary (as the deleted default constructor implies)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, also added getters when necessary.

optionalt<mp_integer> get_malloc_size(irep_idt name);

/// Build the pointee string for address \p point assuming it points to a
/// dynamic allocation of `n' elements each of size \p member_size. E.g.:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's n here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

T* point = (T*) malloc(sizeof(T) * n); .. that n.

@@ -230,8 +242,11 @@ code_blockt memory_snapshot_harness_generatort::add_assignments_to_globals(
ordered_snapshot_symbols.begin(),
ordered_snapshot_symbols.end(),
[this](const snapshot_pairt &left, const snapshot_pairt &right) {
return pointer_depth(left.second.symbol_expr().type()) <
pointer_depth(right.second.symbol_expr().type());
if(refers_to(right.second.value, left.first))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what this is for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, say int* p = malloc(8); int *q = p+1; is the source code for which we want to take the memory snapshot. Then the snapshot will contain: int tmp[2]; int* p = tmp[0]; int* q = tmp[1]; And both p and q refer_to tmp. I sort the symbols based on this notion of reference so that when building the harness entry function, tmp is assigned before the two pointers.

const std::string pointee;
const std::string character;
const optionalt<std::string> string;
memory_addresst address;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fields are all related, so I don't think they should be both be constant and public at the same time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well they are not const in the final version.

{
if(pointer_value.pointee.find("+") != std::string::npos)
return true;

if(pointer_value.pointee.empty())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this only done if the pointee is empty? What does that mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well pointer_value is the result of the gdb query get_memory. When the pointee is empty it means that gdb did not know what structure the pointer points to. This can happen if the pointer points to inside a dynamically allocated memory.

bool gdb_value_extractort::memory_scopet::contains(
const memory_addresst &point) const
{
size_t begin_int = std::strtoul(begin.address_string.c_str(), NULL, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

memory_scopet() = delete;
memory_scopet(
const memory_addresst &begin,
mp_integer byte_size,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const mp_integer &byte_size

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

memory_scopet(
const memory_addresst &begin,
mp_integer byte_size,
irep_idt name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const irep_idt &name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

CHECK_RETURN(contains(point));
size_t begin_int = std::strtoul(begin.address_string.c_str(), NULL, 0);
size_t point_int = std::strtoul(point.address_string.c_str(), NULL, 0);
return (point_int - begin_int) / member_size;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you first do CHECK_RETURN(point_int >= begin_int);?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was implied by contains. But I refactored this section a bit to avoid duplicate computation of the address-to-string conversion. See: 3812c28.

mp_integer byte_size;
irep_idt name;

memory_scopet() = delete;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that's necessary if you explicitly define some other constructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Removed.

}

const auto it = values.find(memory_location);
// if the structure we are pointing to does not exists we need to build a
// temporary object for it: get the type from symbol table, query gdb for
// value, allocate new object for it and then store into assignments
if(it == values.end())
if(struct_symbol->symbol_expr().type().id() == ID_pointer)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very weird?!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about struct_symbol->type.id() ? Changed.

@@ -303,6 +303,7 @@ exprt gdb_value_extractort::get_pointer_to_member_value(
}

if(!found)
if(struct_symbol->symbol_expr().type().id() == ID_array)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks weird!?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

@@ -291,6 +291,8 @@ exprt gdb_value_extractort::get_pointer_to_member_value(
bool found = false;
CHECK_RETURN(maybe_struct_size.has_value());
for(const auto &value_pair : values)

if(memory_map.count(struct_name) == 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks weird.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapped in a helper function has_known_memory_location.

@@ -0,0 +1,13 @@
FUTURE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain (after another -- at the end of the file) what isn't supported here so far.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

tmp = \{ \.next=\(\(struct S \*\)0\) \};
p = \&tmp;
st = \{ .next=\&st \};
p = \&st;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please squash this into the commit that made this possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was this one: 33d0a79. I will squash them together.

@xbauch xbauch force-pushed the feature/dynamic-memory-snapshot branch from d803707 to 3812c28 Compare May 8, 2019 09:22
@xbauch xbauch changed the title Dynamic memory snapshot [depends-on: #4482] Dynamic memory snapshot May 20, 2019
@xbauch xbauch force-pushed the feature/dynamic-memory-snapshot branch from 3812c28 to 71ee2f1 Compare May 20, 2019 13:35
@xbauch xbauch force-pushed the feature/dynamic-memory-snapshot branch 7 times, most recently from 785cf3a to 410fd84 Compare May 21, 2019 18:41
@tautschnig
Copy link
Collaborator

@xbauch My apologies for taking this long! You might want to squash some commits and then just merge. Thank you for all the work!!!

@tautschnig tautschnig assigned xbauch and unassigned tautschnig May 29, 2019
@xbauch xbauch force-pushed the feature/dynamic-memory-snapshot branch 3 times, most recently from 89f8805 to 5453ba7 Compare May 30, 2019 05:45
xbauch pushed a commit to xbauch/cbmc that referenced this pull request May 30, 2019
@tautschnig
Copy link
Collaborator

@xbauch Please rebase now that #4729 is in to make CI happy.

petr-bauch added 17 commits May 30, 2019 09:03
In memory-snapshot-harness: so that pointer to objects are initialised after
those objects.
no functional change
So that we can assume the implicit assignment operator.
One for value, one for register-value.
by breaking the execution at malloc entry and extracting the byte size and
memory address from registers.
by returning the exact value rather than an estimate collected during preceding
execution.
memory_scopet keeps the starting point and allocated size for a malloc(ed) site.
We also include helper methods to query about dynamic allocation.
To account for the possibility of the pointee to be dynamically allocated.
into a member method.
Unnecessary now that we enforce analysing those objects.
1: If pointee is known, return it's value instead.
2: malloc-size is precise and type-size has a wrapper
3: do not cast values from dynamically allocated pointers: we would loose the
   information about them being arrays.
1: Use memory map to get pointer-values (but double-check if gdb previously
   claimed that their pointees are themselves)
2: add more checks before casting.
instead of a vector to prevent duplications.
1: Casts are a part of the output
2: We can use some of the assertions previously left out (now that we get the
dynamic allocation right).
Floating point numbers cannot be extracted for now.
rather than as a part of regression testing and only under non-MSVC environment.
@xbauch xbauch force-pushed the feature/dynamic-memory-snapshot branch from 2ab16c4 to ba8beb5 Compare May 30, 2019 08:04
@xbauch xbauch merged commit 97f1619 into diffblue:develop May 30, 2019
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: ba8beb5).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/113721650

@xbauch xbauch deleted the feature/dynamic-memory-snapshot branch June 10, 2019 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants