-
Notifications
You must be signed in to change notification settings - Fork 273
GOTO language documentation #7471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
We already have a documentation PR we just raised as part of our internal efforts to document more of CBMC. It's at #7470 and covers the GOTO IR data structures with some light documentation. |
Awesome! Is it possible to also document the goto file format? |
On Tue, 2023-01-10 at 10:04 -0800, Celina G. Val wrote:
Awesome! Is it possible to also document the goto file format?
I don't want to be negative but I would not recommend it.
GOTO files are a direct serialisation of a bunch of C++ data
structures. This is intentional. They are intended to be "internal"
to the tool, even more so that `.o` files are for compilers. They are
not intended to be an interface point for external tools. If you want
to read / write / manipulate them the intent is that you link the
relevant bits of CPROVER. See also the discussions over cross language
APIs and why JSON Symtab exists.
Why? If we declare that an interface then we fix:
1. The syntax / fields / representation of the data structures.
2. How they are used to represent things.
1. is annoying. They do change from time to time and I don't think
anyone is claiming they are so good that we should freeze them. There
is a good-faith effort to change the version number of GOTO files when
this happens but I don't think anyone tests it so...
2. is more insideous and troubling and why this documentation is so
needed but also hard. How constructs are expressed changes as and when
things need to be fixed. They are still structurally the same but they
use the ireps differently. This is why JSON Symtab based tools tend to
break with an inexplicable error deep inside an entirely unrelated part
of the code.
I would recommend getting the documentation and, more importantly, the
testing and checking of the key data structures really good before you
attempt this.
|
@remi-delmas-3000 I really want to support this effort because it is so very needed. So please read this as "lessons learnt for the language documentation projects 2014, 2015, etc.":
|
@martin-cs thanks for jumping in ! There will be two parallel documentation efforts :
For 2. we envision a document in natural language.
Agreed. We need to understand what is there before attempting to describe it and I'm expecting this to be a bottom-up effort, starting from the implementation.
We mostly want to describe the abstract syntax of GOTO models (what's a goto model, a goto function, a goto program, a goto instruction, what operators are available in the expression language, and their semantics, i.e. small-step transition relation over concrete states). This could be similar to GOTO semantics by translation to FOL for instance. Selecting a concrete syntax could come in a second step.
I thought that defining how things are modelled is mostly the role source-to-GOTO conversion. Do you know how often the meaning basic GOTO operators or instruction types are changed by changing their interpretation in Symex or when translating the SSA format to SAT or SMT ? |
On Wed, 2023-01-11 at 09:08 -0800, Rémi Delmas wrote:
There will be two parallel documentation efforts :
1. (led by Diffblue) document GOTO as seen from inside CBMC for
developers
2. (led by AWS) document GOTO as abstract language for users who
need to :
* produce GOTO from their own source language (they need to
understand the abstract syntax/semantics of GOTO),
How do you envisage them creating these? Which interface are you going
to use?
* would like to write a GOTO interpreter that would be capable of
validating a counter example trace produced by a GOTO verification
tool.
I trust you know about `goto-instrument --interpreter` right? Also, if
you made use of `memory-analyzer/gdb_api.h` and @tautschnig's work on
GOTO programs in ELF format, you could also probably run the actual
binary back-to-back to check the modelling.
For 2. we envision a document in natural language.
I would encourage you to make examples into unit tests and the like.
In the past natural language documentation has not aged well or stayed
tidy.
> 1. is more insideous and troubling and why this documentation is
> so needed but also hard. How constructs are expressed changes as
> and when things need to be fixed. They are still structurally the
> same but they use the ireps differently. This is why JSON Symtab
> based tools tend to break with an inexplicable error deep inside an
> entirely unrelated part of the code.
I thought that defining how things are modelled is mostly the role
source-to-GOTO conversion.
Yes... but... it's not just there. For example, the semantics of
exceptions is really in a transformation pass. The semantics of the
heap is ... kind of scattered, some in the front-end, some in `ansi-
c/library/` some in the symex / interpreter, some in the
instrumentation.
Do you know how often the meaning basic GOTO operators or
instruction types are changed by changing their interpretation in
Symex or when translating the SSA format to SAT or SMT ?
Ummm... that's a bit difficult to answer. It's more how they are
interpreted. For example, in the parse tree an expression can include
arbitrary side-effects, function calls, etc. By the time they are in a
GOTO program, they are side-effect free, have some types removed and
are used in a semantics where variables are mutable. Once they become
VCs they are interpreted with an immutable, logic-based semantics.
Is that what you mean?
Cheers,
- Martin
…--
UCU staff are currently in dispute with employers over pay, pensions
and working conditions. Part of the industrial action is 'action short
of a strike'. This may affect the speed at which you get responses to
e-mails and requests from some staff.
Details here: https://www.ucu.org.uk/article/12469/FAQs
|
Thanks @martin-cs for jumping in. If not goto binaries, which language do you suggest that we use to interface with CBMC? |
@celinval I think goto binaries are a really good way of passing programs between tools. The What I had meant to communicate in my previous response was that I strongly suggest using the CPROVER code for creating / reading / writing these because it insulates you from a number of drift and updating issues. In cases where this is not possible (for example, you are not writing a tool in C++) then it's a bit more awkward. HTH |
Hi @martin-cs, our tool is written in Rust. We do currently use the JSON format and |
@celinval Some thoughts:
I wish I had better answers / clear guidance but I think this is pretty much the problem. HTH. |
What about using something like Protocol Buffers to make B more performant and less platform / language dependent? If we are going to make GOTO programs the proper interface to model programs between tools as identified in #7042, I would hope that we would also create a well defined serialization / deserialization format for it. |
Protocol buffers could work. TBH the big thing that json_symtab needs (IMHO) is a way of making reference to ireps. A lot of the verbosity is many copies of the same strings and expressions. I agree that GOTO programs need either a (widely linkable, cross language) API or (more ideal) a documented format so people can write them. I guess my first point was that I think the current format is best accessed by the API. It may be less pain to produce a new serialisation format (maybe with protocol buffers?) than to document the current one and expect other API implementations to stay in sync. ( Obviously, all of this is easier if it is documented, particular if that is in an executable / testable form. ) One aspect of this that might be a good place to start and try out ideas are the various |
This thread is to collect needs for the GOTO language documentation effort we're starting in 2023.
The goal is to use this thread to scope the documentation and collect questions the documentation is supposed to answer.
The text was updated successfully, but these errors were encountered: