-
Notifications
You must be signed in to change notification settings - Fork 273
Feature memory-analyzer [depends-on: #2680, #4261] #2649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9a1fe28
to
9f70b99
Compare
Thanks; this is great stuff! Adding a feature like this might take a little while, just the reviewing, etc. so please bare with us. Are you working with @tautschnig ? I know he had an interest in interfacing with GDB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally seems promising and it is definitely a feature we want. I wonder if we can make this a lot more functional and portable with some small changes. Also, how does this work with --dump-c ?
@@ -55,14 +55,14 @@ bool c_implicit_typecast_arithmetic( | |||
return !c_typecast.errors.empty(); | |||
} | |||
|
|||
bool is_void_pointer(const typet &type) | |||
bool has_a_void_pointer(const typet &type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what these front-end changes have to do with the description of the patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also introduced another function somewhere in the patch called is_void_pointer
but with other semantics.
The new function checks, that type.id is pointer and subtype.id is void. The older front-end function checks that recursively on subtype. Therefore, I thought renaming the old function is the better choice to resolve the name conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK...
memset(buffer, 0, MAX_READ_SIZE_GDB_BUFFER); | ||
} | ||
|
||
int gdb_apit::create_gdb_process() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the advantages of spawning a GDB process ourselves vs. connecting to a GDB API? That would allow us to do things like connect to remote machines, connect to other debuggers that implement the same API and to interface with tools like valgrind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done some research before writing this code, but haven't found a general GDB API. I am happy to reconsider this code and use other solutions.
It seems, like you have a particular API in mind. Might you provide some more details?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a GDB expert but:
https://sourceware.org/gdb/current/onlinedocs/gdb/Remote-Debugging.html
makes reference to the "standard GDB remote serial protocol", which I think is the one documented here:
https://sourceware.org/gdb/current/onlinedocs/gdb/Remote-Protocol.html
It seemed like this could be useful for things like this:
https://elinux.org/Debugging_The_Linux_Kernel_Using_Gdb
where you could build calling contexts from a running Linux kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running the GDB process ourselves keeps this a standalone tool on Linux for now. Connecting to a GDB API requires a network debugger running somewhere. So either we have to take care on setting up the debugger in the network or keep a consistent environment for running the memory analyzer.
I think, that might be on the feature request list, but I don't see, that this is the better solution in general.
Apart from that, I don't want to run a program in the debugger. I mainly want to analyze core dump files at the moment, so there is no benefit to do that in the network as I am not running software inside of gdb at the moment.
I see that this might become desirable for SoC systems. My hope is, that, by separating the gdb_api and the analyzer data structure, given another api later that provides the same functionality, you might exchange the debugger in the background using the same bookkeeping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point but I think we're going to need someone who actually has merge rights for this to comment. @tautschnig ?
{ | ||
memset(buffer, 0, last_read_size); | ||
const auto read_size = | ||
read(pipe_output[0], &buffer, MAX_READ_SIZE_GDB_BUFFER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not personally offended by this kind of thing but it isn't very C++ Would it be possible to use the standard I/O streams to avoid needing to do our own line buffering?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is right, but I haven't found a really nice way, to read and write to a child process from C++. Therefore the fork and pipe approach is mainly borrowed from c, but as far as I have seen, there is no standard way, to open a stream buffer on a file descriptor. At least, I haven't found a buffer in I/O stream that opens a file descriptor and searching for this on the internet also hasn't revealed something that works well. For sure there are some Boost libraries doing this for you, but I don't want this dependency.
Instead of reading into this fixed size buffer, I might build some class around the file descriptor and then implement >>
and get
on them. I would then only use get
, as I care about whitespaces.
Sure, I might read only a single byte, so I don't maintain an internal buffer, but I see a performance issue here.
I am open for further suggestions and willed to learn, how to change this, but I don't have another idea at hand. Writing my own class with the methods required, doesn't seem to have a benefit from my point of view instead of using this buffer.
std::string get_memory(const std::string &variable); | ||
|
||
private: | ||
static const int MAX_READ_SIZE_GDB_BUFFER = 600; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does worry me a little...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Or what are your concerns?
It is just the chunk size, I read from a pipe. The pipe does the real buffering in the background regarding the interprocess communication.
It is certainly interlinked with your comment regarding the stream processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed size buffers make me nervous in general but maybe I'm doing too much security analyses.
integer2string(const mp_integer &, unsigned base = DECIMAL_SYSTEM); | ||
const mp_integer | ||
string2integer(const std::string &, unsigned base = DECIMAL_SYSTEM); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes; this commit should definitely get merged.
int main() | ||
{ | ||
throw "only supported on Linux platforms currently\n"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the challenges in making this cross platform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GDB is in general not supported out of the box on OSX and Windows. But without GDB, this code doesn't do anything meaningful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another argument ( IMHO ) for connecting to a remote GDB rather than spawning our own.
// size 0. | ||
// It might be desierable to convert this into flags in the future. | ||
config.ansi_c.mode = configt::ansi_ct::flavourt::GCC; | ||
config.ansi_c.set_arch_spec_x86_64(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should hardcode the architecture. Is the usual option of having flags and defaulting to the platform architecture not viable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found code that was supposed to read these settings from the goto-binary, but couldn't make that work with my goto binaries.
Anyhow, these information, if passed as a flag, should effect, how gdb is going to interpret the core dump.
Modifying these gdb settings isn't supported right now by the gdb_apit
.
So there is more work required than just changing these hard coded values into flags. To prevent users from passing unsupported flags not aligned with the gdb behavior, I hard coded it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that it makes sense for now but it would be good to avoid this if we can.
unit/Makefile
Outdated
@@ -98,7 +98,8 @@ TESTS = unit_tests$(EXEEXT) \ | |||
miniBDD$(EXEEXT) \ | |||
# Empty last line | |||
|
|||
CLEANFILES = $(TESTS)\ | |||
CLEANFILES = $(TESTS) \ | |||
miniBDD$(OBJEXT) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In future, this kind of unrelated change should probably be in it's own PR.
@martin-cs Might you provide some more context on the Most of the changes regarding c dumping required are in another PR: #2599 |
@mmuesly |
I don't see, why this shouldn't work together with --dump-c. It is not automagically integrated in a c dump at the moment, if that was your request. |
I aligned this PR with #2704. Once, this gets merged, it should be possible make CI happy for this PR as well. |
5392bb3
to
da504b6
Compare
I adapted the PR to #2713 and fixed some problems with the Make and Make build system. Might someone provide some insight on the Windows Build error? I am willed to fix them as well, once I have understood the problem. |
The CBMC code base normaly checks the id of an irept subtype against some magic id. But for some tests, it is necessary to check against more than one id. So this functions are use do the checks. This should further support code readability as a combination of different ids gets a name.
Applying CBMC on large code bases requires sometimes to model a test environment. Running a program until a certain point and let it crash, allows to analyze the memory state at this point in time. In continuation, the memory state might be reconstructed as base for the test environment model. By using gdb to analyze the core dump, I don't have to take care of reading and interpreting the core dump myself.
On multiple locations in the code base, magic numbers are used to mark the base of an intenger conversion. This commit introduces constants and replaces the magic numbers in some places.
This code takes a static symbol, zero initalizes the typet of the symbol and then fills it up with values if possible. Following the declared structure in the typet, the analyzer queries the gdb api for values of the attributes of typet. It the symbol is a primtive type, the analyzer directly queries for the value of the symbol.
These test have been used as driving example for the memory analyzer. They test basic functionality and handling of cycles in structs.
7eda558
to
255c6f7
Compare
To make private methos avilable to unit tests, it is a common practice to redefine the private keyword. While GCC doesn't carea bout this, older clang versions don't support this behavior and issue a warning. As all warnings are converted into errors, this waring breaks the build. To allow the intended behavior during testing, the -Wno-key-macro flag is added.
This commit adds the required files for the CMakeBuild system to build and generate the memory-analyzer
There has been a concurrency issue for one of the Travis CI builds. The ansi-c lib was not available, when travis tried to build the memory-analyzer binary.
255c6f7
to
0765557
Compare
Older gcc versions only compile c11 if the -std=c11 flag is set. Further this script assumed goto-gcc to be on the path. This is no longer required.
Closing as the patches have been merged in #4261. |
The idea of the memory-analyzer, is to create the global state required for some verification harnesses from a core dump file.
In order to do so, you have to feed in a binary with debug symbols and a core file, that matches the binary. And you also have to feed in the static symbols from global state, you care about.
The memory-analyzer will print out c code that is suitable to include it into a test harness in continuation.
This PR won't compile without #2599
It is a clean version of #2648, but it isn't possible to reopen pull request after forced push.