Skip to content

Add symtab2gb to enable linking of .json_symtab files #4645

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

hannes-steffenhagen-diffblue
Copy link
Contributor

Right now we have the goto-cc tool to create goto binaries from C
files and link goto binaries together. This adds a similar type of
"linker" tool for the symtab language.

I had considered extending 'goto-cc' to handle symtab files as well,
however goto-cc (quite reasonably) makes some C-specific assumptions
about its input files, and I'd figured rather than working around that
it'd be easier to just have a simple command line tool to invoke
goto_convert and link_goto_model (which, in the end, is all we want
to do for json_symtabs).

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@hannes-steffenhagen-diffblue
Copy link
Contributor Author

There's no real need to look at the .json_symtab files in the tests, they're just big generated files containing symbol tables.

Copy link
Contributor

@xbauch xbauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@martin-cs
Copy link
Collaborator

@hannes-steffenhagen-diffblue : can you elaborate a bit on why this can't be part of goto-cc? It seems that a general for / against is something like:

  • Where possible we want to stop the proliferation of different tools and we already have a tool which is front-end producing goto-binaries.
  • We may want to be able to do Ada specific linking things, particularly around the questions of elaboration order and set-up.
  • Other languages that use .json_symtab may also need to do custom linking things.

I would particularly like to hear from @tautschnig as the author / warden of the linker.

@hannes-steffenhagen-diffblue
Copy link
Contributor Author

@martin-cs regarding Ada, I think we can get away without any special elaboration support during linking, but that's not what I mean; The problem with goto-cc is that it assumes main is the entry point to the program (which isn't true in Ada, main is just a normal identifier there), and it also replaces any __CPROVER_start which is already present with the one from ansi_c_entry_point.

I thought it'd be useful to have something that just does a straight forward combining of symbol tables, and I'm not sure if that's really compatible with goto-cc which is supposed to act like a C compiler.

@martin-cs
Copy link
Collaborator

@hannes-steffenhagen-diffblue I am not sure if it is compatible either but I think it is a discussion worth having. If I understand the points you raise correctly there are:

A. where should we do language-specific linking things?
A.1. how and by whom is the entry point specified?
A.2. what do we do about set-up code and how does that get through linking?

B. where should linking be a stand-alone tool? In goto-cc? In goto-instrument?

I think in the next few weeks and months there are likely to be Ada-specific linking things to be done (we may need to support control over elaboration orders for example), so I am not exactly clear on what the best route forward is.

I would very much appreciate thoughts from others (including, ofcourse, @hannes-steffenhagen-diffblue )

Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd lean towards keeping such a simple linking tool as a separate binary, because goto-cc primarily deals with the various command-line options of existing compilers, none of which is of interest here. But then I'm wondering whether the binary could just go in the existing json-symtab-language folder?

@@ -0,0 +1,10 @@
CORE
entry_point.json_symtab
user.json_symtab library.json_symtab
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a readme file or some other explanation describing how the symtab files were generated?

@@ -104,3 +104,5 @@ add_subdirectory(goto-instrument)
add_subdirectory(goto-analyzer)
add_subdirectory(goto-diff)
add_subdirectory(goto-harness)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary blank line?

src/Makefile Outdated
@@ -21,6 +21,7 @@ DIRS = analyses \
solvers \
util \
xmllang \
symtab2gb \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort lexicographically.

@@ -0,0 +1,113 @@
#include <fstream>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing copyright header

throw invalid_source_file_exceptiont{
"failed to typecheck symbol table from file `" + symtab_filename + "'"};
}
auto goto_model = goto_modelt{};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer goto_modelt goto_model;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the auto form because with that I can make all declarations look the same, but I don't mind changing it if it bothers you.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does bother me, but I'm just one voice here. If others like this better then there's no need to fight about it. (I actually don't think that all declarations looking the same is desirable - quite often I don't even care much about the name, just about the type.)

throw invalid_source_file_exceptiont{
"failed to parse symbol table from file `" + symtab_filename + "'"};
}
auto symtab = symbol_tablet{};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

symbol_tablet symtab;

if(i + 1 >= argc)
{
throw invalid_command_line_argument_exceptiont{
"expected output file name", "--out"};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, this is a bit too simplistic to my taste: I think this should have support for --help and --version at bare minimum. Therefore I'd recommend inheriting from parse_options_baset.

@hannes-steffenhagen-diffblue
Copy link
Contributor Author

@tautschnig I believe I've addressed your review comments

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a discussion with @chrisr-diffblue I think this is probably the best architecture for now.


int main(int argc, const char *argv[])
{
symtab2gb_parse_optionst parse_options{argc, argv};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for { } rather than ( ) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brace initialisation avoids parsing ambiguities and prevents implicit conversions, and is recommended "by default" by the CPP Core Guidelines. Unless we have a good reason my preference would be to stick to those as the 'default'.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-cs Preferred since #3546

std::ofstream out_file{gb_filename};
if(!out_file.is_open())
{
throw system_exceptiont{"couldn't open output file `" + gb_filename + "'"};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

std::ifstream symtab_file{symtab_filename};
if(!symtab_file.is_open())
{
throw system_exceptiont{"couldn't open input file `" + symtab_filename +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: dc8b83c).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/111644348

@hannes-steffenhagen-diffblue
Copy link
Contributor Author

Windows tests seem to fail the goto binaries that were written aren't readable? Honestly no idea what's going on there

@tautschnig
Copy link
Collaborator

Windows tests seem to fail the goto binaries that were written aren't readable? Honestly no idea what's going on there

I looked at the code again and nothing obvious springs to my mind. I can try to play with this in a Windows environment later this week.

show:
@for dir in *; do \
if [ -d "$$dir" ]; then \
vim -o "$$dir/*.c" "$$dir/*.out"; \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$VISUAL or $EDITOR, rather than "vim" ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -o option is specific to vim, and it's the same bit of code we have everywhere else. But I don't know whether anyone ever users this target?!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this was the first time I've actually noticed this target - never used it myself!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto on never using it. It's honestly a bit of cargo culting, I was kind of assuming someone around here liked to work that way. But I certainly don't mind removing it, I have no attachment to it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd vote for having a separate PR that removes all of those. If someone is still using them, we can then have that discussion on the PR.

{
// try opening all the files first to make sure we can
// even read/write what we want
std::ofstream out_file{gb_filename};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding your windows CI issue, does windows still care about the difference between binary and ascii file modes, and if so, what does std::ofstream do by default? I'm wondering if you need to put the stream into a binary mode?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/goto-cc/compiler.cpp does this: std::ofstream outfile(file_name, std::ios::binary); so I suspect you might need the binary mode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do you need to explicitly close() the output file as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binary yes, I had not considered that at all but now that you mention it that does seem to be the kind of thing that'd cause this problem.

close I don't think so, the std streams are all RAII controlled, so they should close even on exception.

I can put the input files in a smaller scope so they get closed before the actual writing starts, although I'm not sure how much of a benefit that's going to be (I'm opening all files at the start to be as-sure-as-reasonably-possible-in-advance that our IO operations are going to succeed, so we don't go half way through the linking bit only to find one of the files is misspelled or we don't have r/w access or something like that).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you actually add a precondition to write_goto_binary that tests that the stream is open in binary mode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tautschnig AFAIK iostreams don't expose a way to query or change the openmode

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 10d7a51).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/111720442

Right now we have the `goto-cc` tool to create goto binaries from C
files and link goto binaries together. This adds a similar type of
"linker" tool for the symtab language.

I had considered extending 'goto-cc' to handle symtab files as well,
however goto-cc (quite reasonably) makes some C-specific assumptions
about its input files, and I'd figured rather than working around that
it'd be easier to just have a simple command line tool to invoke
`goto_convert` and `link_goto_model` (which, in the end, is all we want
to do for json_symtabs).
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: da5fd45).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/111772089

@tautschnig tautschnig merged commit 81c3e02 into diffblue:develop May 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants