Skip to content

Feature memory-analyzer [depends-on: #2680, #4261] #2649

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 17 commits into from

Conversation

mmuesly
Copy link
Contributor

@mmuesly mmuesly commented Jul 31, 2018

The idea of the memory-analyzer, is to create the global state required for some verification harnesses from a core dump file.

In order to do so, you have to feed in a binary with debug symbols and a core file, that matches the binary. And you also have to feed in the static symbols from global state, you care about.

The memory-analyzer will print out c code that is suitable to include it into a test harness in continuation.

This PR won't compile without #2599

It is a clean version of #2648, but it isn't possible to reopen pull request after forced push.

@mmuesly mmuesly force-pushed the feature_memory_analyzer branch 2 times, most recently from 9a1fe28 to 9f70b99 Compare August 1, 2018 15:03
@martin-cs
Copy link
Collaborator

Thanks; this is great stuff! Adding a feature like this might take a little while, just the reviewing, etc. so please bare with us. Are you working with @tautschnig ? I know he had an interest in interfacing with GDB.

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally seems promising and it is definitely a feature we want. I wonder if we can make this a lot more functional and portable with some small changes. Also, how does this work with --dump-c ?

@@ -55,14 +55,14 @@ bool c_implicit_typecast_arithmetic(
return !c_typecast.errors.empty();
}

bool is_void_pointer(const typet &type)
bool has_a_void_pointer(const typet &type)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what these front-end changes have to do with the description of the patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also introduced another function somewhere in the patch called is_void_pointer but with other semantics.

The new function checks, that type.id is pointer and subtype.id is void. The older front-end function checks that recursively on subtype. Therefore, I thought renaming the old function is the better choice to resolve the name conflict.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK...

memset(buffer, 0, MAX_READ_SIZE_GDB_BUFFER);
}

int gdb_apit::create_gdb_process()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the advantages of spawning a GDB process ourselves vs. connecting to a GDB API? That would allow us to do things like connect to remote machines, connect to other debuggers that implement the same API and to interface with tools like valgrind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done some research before writing this code, but haven't found a general GDB API. I am happy to reconsider this code and use other solutions.
It seems, like you have a particular API in mind. Might you provide some more details?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a GDB expert but:

https://sourceware.org/gdb/current/onlinedocs/gdb/Remote-Debugging.html

makes reference to the "standard GDB remote serial protocol", which I think is the one documented here:

https://sourceware.org/gdb/current/onlinedocs/gdb/Remote-Protocol.html

It seemed like this could be useful for things like this:

https://elinux.org/Debugging_The_Linux_Kernel_Using_Gdb

where you could build calling contexts from a running Linux kernel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running the GDB process ourselves keeps this a standalone tool on Linux for now. Connecting to a GDB API requires a network debugger running somewhere. So either we have to take care on setting up the debugger in the network or keep a consistent environment for running the memory analyzer.

I think, that might be on the feature request list, but I don't see, that this is the better solution in general.
Apart from that, I don't want to run a program in the debugger. I mainly want to analyze core dump files at the moment, so there is no benefit to do that in the network as I am not running software inside of gdb at the moment.

I see that this might become desirable for SoC systems. My hope is, that, by separating the gdb_api and the analyzer data structure, given another api later that provides the same functionality, you might exchange the debugger in the background using the same bookkeeping.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point but I think we're going to need someone who actually has merge rights for this to comment. @tautschnig ?

{
memset(buffer, 0, last_read_size);
const auto read_size =
read(pipe_output[0], &buffer, MAX_READ_SIZE_GDB_BUFFER);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not personally offended by this kind of thing but it isn't very C++ Would it be possible to use the standard I/O streams to avoid needing to do our own line buffering?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is right, but I haven't found a really nice way, to read and write to a child process from C++. Therefore the fork and pipe approach is mainly borrowed from c, but as far as I have seen, there is no standard way, to open a stream buffer on a file descriptor. At least, I haven't found a buffer in I/O stream that opens a file descriptor and searching for this on the internet also hasn't revealed something that works well. For sure there are some Boost libraries doing this for you, but I don't want this dependency.
Instead of reading into this fixed size buffer, I might build some class around the file descriptor and then implement >> and get on them. I would then only use get, as I care about whitespaces.

Sure, I might read only a single byte, so I don't maintain an internal buffer, but I see a performance issue here.

I am open for further suggestions and willed to learn, how to change this, but I don't have another idea at hand. Writing my own class with the methods required, doesn't seem to have a benefit from my point of view instead of using this buffer.

std::string get_memory(const std::string &variable);

private:
static const int MAX_READ_SIZE_GDB_BUFFER = 600;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does worry me a little...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Or what are your concerns?

It is just the chunk size, I read from a pipe. The pipe does the real buffering in the background regarding the interprocess communication.
It is certainly interlinked with your comment regarding the stream processing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed size buffers make me nervous in general but maybe I'm doing too much security analyses.

integer2string(const mp_integer &, unsigned base = DECIMAL_SYSTEM);
const mp_integer
string2integer(const std::string &, unsigned base = DECIMAL_SYSTEM);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; this commit should definitely get merged.

int main()
{
throw "only supported on Linux platforms currently\n";
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the challenges in making this cross platform?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GDB is in general not supported out of the box on OSX and Windows. But without GDB, this code doesn't do anything meaningful.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another argument ( IMHO ) for connecting to a remote GDB rather than spawning our own.

// size 0.
// It might be desierable to convert this into flags in the future.
config.ansi_c.mode = configt::ansi_ct::flavourt::GCC;
config.ansi_c.set_arch_spec_x86_64();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should hardcode the architecture. Is the usual option of having flags and defaulting to the platform architecture not viable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found code that was supposed to read these settings from the goto-binary, but couldn't make that work with my goto binaries.
Anyhow, these information, if passed as a flag, should effect, how gdb is going to interpret the core dump.
Modifying these gdb settings isn't supported right now by the gdb_apit.

So there is more work required than just changing these hard coded values into flags. To prevent users from passing unsupported flags not aligned with the gdb behavior, I hard coded it for now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that it makes sense for now but it would be good to avoid this if we can.

unit/Makefile Outdated
@@ -98,7 +98,8 @@ TESTS = unit_tests$(EXEEXT) \
miniBDD$(EXEEXT) \
# Empty last line

CLEANFILES = $(TESTS)\
CLEANFILES = $(TESTS) \
miniBDD$(OBJEXT) \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In future, this kind of unrelated change should probably be in it's own PR.

@mmuesly
Copy link
Contributor Author

mmuesly commented Aug 6, 2018

@martin-cs Might you provide some more context on the --dump-c comment?
Are you concerned, these changes might break the flag?

Most of the changes regarding c dumping required are in another PR: #2599

@martin-cs
Copy link
Collaborator

@mmuesly --dump-c : as I understand it, the use-case for this is building calling-contexts / verification harnesses. One first things I think people will want to do with these is make some parts non-det. Dumping to C and letting them edit them to their heart's content seems like a good way of doing this, so I was curious to know if the generated code would work with --dump-c.

@mmuesly
Copy link
Contributor Author

mmuesly commented Aug 9, 2018

I don't see, why this shouldn't work together with --dump-c.
The outputted c code compiles. Therefore, as long as the dump-c code is compilable on its own, you should be able to intermingle results by copy and pasting this into one file.

It is not automagically integrated in a c dump at the moment, if that was your request.

@mmuesly
Copy link
Contributor Author

mmuesly commented Aug 9, 2018

I aligned this PR with #2704. Once, this gets merged, it should be possible make CI happy for this PR as well.

@mmuesly mmuesly force-pushed the feature_memory_analyzer branch 6 times, most recently from 5392bb3 to da504b6 Compare August 20, 2018 21:19
@mmuesly
Copy link
Contributor Author

mmuesly commented Aug 21, 2018

I adapted the PR to #2713 and fixed some problems with the Make and Make build system. Might someone provide some insight on the Windows Build error? I am willed to fix them as well, once I have understood the problem.

@smowton
Copy link
Contributor

smowton commented Aug 24, 2018

@mmuesly please rebase rather than do merge-backs (this may be temporary pending a future rebase in which case ignore me :))

Also if you're not already aware #2816 brings a structured symbol table export format.

The CBMC code base normaly checks the id of an irept subtype
against some magic id. But for some tests, it is necessary to check against
more than one id. So this functions are use do the checks.

This should further support code readability as a combination of different ids
gets a name.
Applying CBMC on large code bases
requires sometimes to model a test environment.
Running a program until a certain point and let it
crash, allows to analyze the memory state at this point in time.
In continuation, the memory state might be reconstructed as base for
the test environment model.

By using gdb to analyze the core dump, I don't have to take
care of reading and interpreting the core dump myself.
On multiple locations in the code base,
magic numbers are used to mark the base of an intenger conversion.
This commit introduces constants and
replaces the magic numbers in some places.
This code takes a static symbol, zero initalizes the typet of the symbol
and then fills it up with values if possible.
Following the declared structure in the typet, the analyzer
queries the gdb api for values of the attributes of typet.
It the symbol is a primtive type, the analyzer directly
queries for the value of the symbol.
These test have been used as driving example for the memory analyzer.
They test basic functionality and handling of cycles in structs.
@mmuesly mmuesly force-pushed the feature_memory_analyzer branch from 7eda558 to 255c6f7 Compare August 30, 2018 17:49
To make private methos avilable to unit tests, it is a common practice to redefine the private keyword.
While GCC doesn't carea bout this, older clang versions don't support this behavior and issue a warning.
As all warnings are converted into errors, this waring breaks the build. To allow the intended behavior
during testing, the -Wno-key-macro flag is added.
This commit adds the required files for the CMakeBuild system to build and generate the memory-analyzer
There has been a concurrency issue for one of the Travis CI builds. The ansi-c lib was
not available, when travis tried to build the memory-analyzer binary.
@mmuesly mmuesly force-pushed the feature_memory_analyzer branch from 255c6f7 to 0765557 Compare August 30, 2018 21:25
Older gcc versions only compile c11 if the -std=c11 flag is set. Further this script assumed goto-gcc to be on the path.
This is no longer required.
@tautschnig tautschnig self-assigned this Nov 10, 2018
@danpoe danpoe mentioned this pull request Jan 28, 2019
5 tasks
@tautschnig tautschnig changed the title Feature memory-analyzer Feature memory-analyzer [depends-on: #4261] Feb 25, 2019
@tautschnig
Copy link
Collaborator

I have now marked this as depends-on #4261 just for book keeping purposes - my understanding of #4261 is that @danpoe has picked up all the commits from this PR (or at least so it seemed to me as most commits in #4261 are co-authored by @mmuesly). @danpoe can you confirm?

@tautschnig tautschnig assigned danpoe and unassigned tautschnig Feb 25, 2019
@tautschnig tautschnig changed the title Feature memory-analyzer [depends-on: #4261] Feature memory-analyzer [depends-on: #2680, #4261] Feb 25, 2019
tautschnig added a commit that referenced this pull request May 16, 2019
Memory analyzer to take memory snapshots [blocks: #2649, #4438]
@tautschnig
Copy link
Collaborator

Closing as the patches have been merged in #4261.

@tautschnig tautschnig closed this May 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants