Feature memory-analyzer [depends-on: #2680, #4261] #2649

mmuesly · 2018-07-31T22:03:50Z

The idea of the memory-analyzer, is to create the global state required for some verification harnesses from a core dump file.

In order to do so, you have to feed in a binary with debug symbols and a core file, that matches the binary. And you also have to feed in the static symbols from global state, you care about.

The memory-analyzer will print out c code that is suitable to include it into a test harness in continuation.

This PR won't compile without #2599

It is a clean version of #2648, but it isn't possible to reopen pull request after forced push.

martin-cs · 2018-08-02T19:56:58Z

Thanks; this is great stuff! Adding a feature like this might take a little while, just the reviewing, etc. so please bare with us. Are you working with @tautschnig ? I know he had an interest in interfacing with GDB.

martin-cs

Generally seems promising and it is definitely a feature we want. I wonder if we can make this a lot more functional and portable with some small changes. Also, how does this work with --dump-c ?

martin-cs · 2018-08-06T14:23:05Z

src/ansi-c/c_typecast.cpp

@@ -55,14 +55,14 @@ bool c_implicit_typecast_arithmetic(
  return !c_typecast.errors.empty();
 }

-bool is_void_pointer(const typet &type)
+bool has_a_void_pointer(const typet &type)


I'm not sure what these front-end changes have to do with the description of the patch.

I also introduced another function somewhere in the patch called is_void_pointer but with other semantics.

The new function checks, that type.id is pointer and subtype.id is void. The older front-end function checks that recursively on subtype. Therefore, I thought renaming the old function is the better choice to resolve the name conflict.

martin-cs · 2018-08-06T14:28:28Z

src/memory-analyzer/gdb_api.cpp

+  memset(buffer, 0, MAX_READ_SIZE_GDB_BUFFER);
+}
+
+int gdb_apit::create_gdb_process()


What are the advantages of spawning a GDB process ourselves vs. connecting to a GDB API? That would allow us to do things like connect to remote machines, connect to other debuggers that implement the same API and to interface with tools like valgrind.

I have done some research before writing this code, but haven't found a general GDB API. I am happy to reconsider this code and use other solutions.
It seems, like you have a particular API in mind. Might you provide some more details?

I'm not a GDB expert but:

https://sourceware.org/gdb/current/onlinedocs/gdb/Remote-Debugging.html

makes reference to the "standard GDB remote serial protocol", which I think is the one documented here:

https://sourceware.org/gdb/current/onlinedocs/gdb/Remote-Protocol.html

It seemed like this could be useful for things like this:

https://elinux.org/Debugging_The_Linux_Kernel_Using_Gdb

where you could build calling contexts from a running Linux kernel.

Running the GDB process ourselves keeps this a standalone tool on Linux for now. Connecting to a GDB API requires a network debugger running somewhere. So either we have to take care on setting up the debugger in the network or keep a consistent environment for running the memory analyzer.

I think, that might be on the feature request list, but I don't see, that this is the better solution in general.
Apart from that, I don't want to run a program in the debugger. I mainly want to analyze core dump files at the moment, so there is no benefit to do that in the network as I am not running software inside of gdb at the moment.

I see that this might become desirable for SoC systems. My hope is, that, by separating the gdb_api and the analyzer data structure, given another api later that provides the same functionality, you might exchange the debugger in the background using the same bookkeeping.

I see your point but I think we're going to need someone who actually has merge rights for this to comment. @tautschnig ?

martin-cs · 2018-08-06T14:31:26Z

src/memory-analyzer/gdb_api.cpp

+{
+  memset(buffer, 0, last_read_size);
+  const auto read_size =
+    read(pipe_output[0], &buffer, MAX_READ_SIZE_GDB_BUFFER);


I'm not personally offended by this kind of thing but it isn't very C++ Would it be possible to use the standard I/O streams to avoid needing to do our own line buffering?

Yes that is right, but I haven't found a really nice way, to read and write to a child process from C++. Therefore the fork and pipe approach is mainly borrowed from c, but as far as I have seen, there is no standard way, to open a stream buffer on a file descriptor. At least, I haven't found a buffer in I/O stream that opens a file descriptor and searching for this on the internet also hasn't revealed something that works well. For sure there are some Boost libraries doing this for you, but I don't want this dependency.
Instead of reading into this fixed size buffer, I might build some class around the file descriptor and then implement >> and get on them. I would then only use get, as I care about whitespaces.

Sure, I might read only a single byte, so I don't maintain an internal buffer, but I see a performance issue here.

I am open for further suggestions and willed to learn, how to change this, but I don't have another idea at hand. Writing my own class with the methods required, doesn't seem to have a benefit from my point of view instead of using this buffer.

martin-cs · 2018-08-06T14:32:46Z

src/memory-analyzer/gdb_api.h

+  std::string get_memory(const std::string &variable);
+
+private:
+  static const int MAX_READ_SIZE_GDB_BUFFER = 600;


This does worry me a little...

Why? Or what are your concerns?

It is just the chunk size, I read from a pipe. The pipe does the real buffering in the background regarding the interprocess communication.
It is certainly interlinked with your comment regarding the stream processing.

Fixed size buffers make me nervous in general but maybe I'm doing too much security analyses.

martin-cs · 2018-08-06T14:34:06Z

src/util/mp_arith.h

+integer2string(const mp_integer &, unsigned base = DECIMAL_SYSTEM);
+const mp_integer
+string2integer(const std::string &, unsigned base = DECIMAL_SYSTEM);
+


Yes; this commit should definitely get merged.

martin-cs · 2018-08-06T14:39:45Z

src/memory-analyzer/memory_analyzer_main.cpp

+int main()
+{
+  throw "only supported on Linux platforms currently\n";
+}


What are the challenges in making this cross platform?

GDB is in general not supported out of the box on OSX and Windows. But without GDB, this code doesn't do anything meaningful.

This is another argument ( IMHO ) for connecting to a remote GDB rather than spawning our own.

martin-cs · 2018-08-06T14:41:00Z

src/memory-analyzer/memory_analyzer_parse_options.cpp

+  // size 0.
+  // It might be desierable to convert this into flags in the future.
+  config.ansi_c.mode = configt::ansi_ct::flavourt::GCC;
+  config.ansi_c.set_arch_spec_x86_64();


I don't think we should hardcode the architecture. Is the usual option of having flags and defaulting to the platform architecture not viable?

I found code that was supposed to read these settings from the goto-binary, but couldn't make that work with my goto binaries.
Anyhow, these information, if passed as a flag, should effect, how gdb is going to interpret the core dump.
Modifying these gdb settings isn't supported right now by the gdb_apit.

So there is more work required than just changing these hard coded values into flags. To prevent users from passing unsupported flags not aligned with the gdb behavior, I hard coded it for now.

I can see that it makes sense for now but it would be good to avoid this if we can.

martin-cs · 2018-08-06T14:43:24Z

unit/Makefile

@@ -98,7 +98,8 @@ TESTS = unit_tests$(EXEEXT) \
        miniBDD$(EXEEXT) \
        # Empty last line

-CLEANFILES = $(TESTS)\
+CLEANFILES = $(TESTS) \
+        miniBDD$(OBJEXT) \


In future, this kind of unrelated change should probably be in it's own PR.

mmuesly · 2018-08-06T16:12:14Z

@martin-cs Might you provide some more context on the --dump-c comment?
Are you concerned, these changes might break the flag?

Most of the changes regarding c dumping required are in another PR: #2599

martin-cs · 2018-08-07T13:28:28Z

@mmuesly --dump-c : as I understand it, the use-case for this is building calling-contexts / verification harnesses. One first things I think people will want to do with these is make some parts non-det. Dumping to C and letting them edit them to their heart's content seems like a good way of doing this, so I was curious to know if the generated code would work with --dump-c.

mmuesly · 2018-08-09T20:31:52Z

I don't see, why this shouldn't work together with --dump-c.
The outputted c code compiles. Therefore, as long as the dump-c code is compilable on its own, you should be able to intermingle results by copy and pasting this into one file.

It is not automagically integrated in a c dump at the moment, if that was your request.

mmuesly · 2018-08-09T21:50:37Z

I aligned this PR with #2704. Once, this gets merged, it should be possible make CI happy for this PR as well.

mmuesly · 2018-08-21T03:50:31Z

I adapted the PR to #2713 and fixed some problems with the Make and Make build system. Might someone provide some insight on the Windows Build error? I am willed to fix them as well, once I have understood the problem.

smowton · 2018-08-24T10:01:07Z

@mmuesly please rebase rather than do merge-backs (this may be temporary pending a future rebase in which case ignore me :))

Also if you're not already aware #2816 brings a structured symbol table export format.

src/memory-analyzer/Makefile

The CBMC code base normaly checks the id of an irept subtype against some magic id. But for some tests, it is necessary to check against more than one id. So this functions are use do the checks. This should further support code readability as a combination of different ids gets a name.

Applying CBMC on large code bases requires sometimes to model a test environment. Running a program until a certain point and let it crash, allows to analyze the memory state at this point in time. In continuation, the memory state might be reconstructed as base for the test environment model. By using gdb to analyze the core dump, I don't have to take care of reading and interpreting the core dump myself.

On multiple locations in the code base, magic numbers are used to mark the base of an intenger conversion. This commit introduces constants and replaces the magic numbers in some places.

This code takes a static symbol, zero initalizes the typet of the symbol and then fills it up with values if possible. Following the declared structure in the typet, the analyzer queries the gdb api for values of the attributes of typet. It the symbol is a primtive type, the analyzer directly queries for the value of the symbol.

These test have been used as driving example for the memory analyzer. They test basic functionality and handling of cycles in structs.

To make private methos avilable to unit tests, it is a common practice to redefine the private keyword. While GCC doesn't carea bout this, older clang versions don't support this behavior and issue a warning. As all warnings are converted into errors, this waring breaks the build. To allow the intended behavior during testing, the -Wno-key-macro flag is added.

This commit adds the required files for the CMakeBuild system to build and generate the memory-analyzer

There has been a concurrency issue for one of the Travis CI builds. The ansi-c lib was not available, when travis tried to build the memory-analyzer binary.

Older gcc versions only compile c11 if the -std=c11 flag is set. Further this script assumed goto-gcc to be on the path. This is no longer required.

tautschnig · 2019-02-25T00:15:48Z

I have now marked this as depends-on #4261 just for book keeping purposes - my understanding of #4261 is that @danpoe has picked up all the commits from this PR (or at least so it seemed to me as most commits in #4261 are co-authored by @mmuesly). @danpoe can you confirm?

Memory analyzer to take memory snapshots [blocks: #2649, #4438]

tautschnig · 2019-05-16T13:12:10Z

Closing as the patches have been merged in #4261.

mmuesly requested review from chrisr-diffblue, kroening, peterschrammel, pkesseli, smowton and tautschnig as code owners July 31, 2018 22:03

mmuesly force-pushed the feature_memory_analyzer branch 2 times, most recently from 9a1fe28 to 9f70b99 Compare August 1, 2018 15:03

mmuesly mentioned this pull request Aug 3, 2018

Feature filter goto symbol table [blocks: #2649] #2680

Closed

martin-cs requested changes Aug 6, 2018

View reviewed changes

mmuesly force-pushed the feature_memory_analyzer branch 6 times, most recently from 5392bb3 to da504b6 Compare August 20, 2018 21:19

tautschnig reviewed Aug 30, 2018

View reviewed changes

src/memory-analyzer/Makefile Outdated Show resolved Hide resolved

mmuesly added 6 commits August 30, 2018 13:39

Add constants for the integer base system

7f23ea5

On multiple locations in the code base, magic numbers are used to mark the base of an intenger conversion. This commit introduces constants and replaces the magic numbers in some places.

Add regression tests for the memory analyzer

baf7a47

These test have been used as driving example for the memory analyzer. They test basic functionality and handling of cycles in structs.

Add Makefiles enabeling memory-analyzer and tests

cf37266

mmuesly added 6 commits August 30, 2018 13:39

Update unit/Makefile for removing miniBDD.o on clean

d1d76a7

Add meaningfull names instead of id_X

902266c

Add meaningfull names instead of id_X

2a6e041

Update formating of gdb_api.cpp to fit clang rules

ea85cc7

Update to work with diffblue#2713 for clean c code output

9c15079

Removing unecessary blank lines

0784c9e

mmuesly force-pushed the feature_memory_analyzer branch from 7eda558 to 255c6f7 Compare August 30, 2018 17:49

mmuesly added 4 commits August 30, 2018 17:25

Add CMakeBuild config for memory-analyzer

6605fe9

This commit adds the required files for the CMakeBuild system to build and generate the memory-analyzer

Update dependencies for memory-analyzer

718d890

There has been a concurrency issue for one of the Travis CI builds. The ansi-c lib was not available, when travis tried to build the memory-analyzer binary.

Update memory-analyzer/Makefile to use LIBEXT

0765557

mmuesly force-pushed the feature_memory_analyzer branch from 255c6f7 to 0765557 Compare August 30, 2018 21:25

Update compile_example.sh to use realitve path and c11 flag

059561c

Older gcc versions only compile c11 if the -std=c11 flag is set. Further this script assumed goto-gcc to be on the path. This is no longer required.

tautschnig self-assigned this Nov 10, 2018

danpoe mentioned this pull request Jan 28, 2019

GDB API [blocks: #4261] #3983

Merged

5 tasks

tautschnig changed the title ~~Feature memory-analyzer~~ Feature memory-analyzer [depends-on: #4261] Feb 25, 2019

tautschnig added the dependent - do not merge label Feb 25, 2019

tautschnig assigned danpoe and unassigned tautschnig Feb 25, 2019

tautschnig changed the title ~~Feature memory-analyzer [depends-on: #4261]~~ Feature memory-analyzer [depends-on: #2680, #4261] Feb 25, 2019

danpoe mentioned this pull request Apr 23, 2019

Memory analyzer to take memory snapshots [blocks: #2649, #4438] #4261

Merged

6 tasks

tautschnig added a commit that referenced this pull request May 16, 2019

Merge pull request #4261 from danpoe/feature/memory-analyzer

09875c3

Memory analyzer to take memory snapshots [blocks: #2649, #4438]

tautschnig closed this May 16, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature memory-analyzer [depends-on: #2680, #4261] #2649

Feature memory-analyzer [depends-on: #2680, #4261] #2649

mmuesly commented Jul 31, 2018 •

edited

Loading

martin-cs commented Aug 2, 2018

martin-cs left a comment

martin-cs Aug 6, 2018

mmuesly Aug 6, 2018

martin-cs Aug 7, 2018

martin-cs Aug 6, 2018

mmuesly Aug 6, 2018

martin-cs Aug 7, 2018

mmuesly Aug 7, 2018

martin-cs Aug 8, 2018

martin-cs Aug 6, 2018

mmuesly Aug 9, 2018

martin-cs Aug 6, 2018

mmuesly Aug 6, 2018

martin-cs Aug 7, 2018

martin-cs Aug 6, 2018

martin-cs Aug 6, 2018

mmuesly Aug 6, 2018

martin-cs Aug 7, 2018

martin-cs Aug 6, 2018

mmuesly Aug 6, 2018

martin-cs Aug 7, 2018

martin-cs Aug 6, 2018

mmuesly commented Aug 6, 2018

martin-cs commented Aug 7, 2018

mmuesly commented Aug 9, 2018

mmuesly commented Aug 9, 2018

mmuesly commented Aug 21, 2018 •

edited

Loading

smowton commented Aug 24, 2018

tautschnig commented Feb 25, 2019

tautschnig commented May 16, 2019

Feature memory-analyzer [depends-on: #2680, #4261] #2649

Feature memory-analyzer [depends-on: #2680, #4261] #2649

Conversation

mmuesly commented Jul 31, 2018 • edited Loading

martin-cs commented Aug 2, 2018

martin-cs left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mmuesly commented Aug 6, 2018

martin-cs commented Aug 7, 2018

mmuesly commented Aug 9, 2018

mmuesly commented Aug 9, 2018

mmuesly commented Aug 21, 2018 • edited Loading

smowton commented Aug 24, 2018

tautschnig commented Feb 25, 2019

tautschnig commented May 16, 2019

mmuesly commented Jul 31, 2018 •

edited

Loading

mmuesly commented Aug 21, 2018 •

edited

Loading