-
Notifications
You must be signed in to change notification settings - Fork 273
Added concept of core goto language form #7505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added concept of core goto language form #7505
Conversation
Codecov ReportBase: 78.48% // Head: 78.48% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## develop #7505 +/- ##
========================================
Coverage 78.48% 78.48%
========================================
Files 1663 1663
Lines 191249 191258 +9
========================================
+ Hits 150103 150112 +9
Misses 41146 41146
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, please, please can we have the checks of canonical form to be in code not in documentation? As part of --validate-goto-model
would be 🌟magical🌟. See
#7471 (comment) for rationale on the lifespan of this documentation.
Also note there is not one canonical form, see #6495 for a suggested way of actually checking an enforcing the various properties that a program might have.
doc/ADR/canonical_goto.md
Outdated
by any solver deriving from `goto_verifiert` by using the `goto_verifiert::operator()` function | ||
without requiring any lowering step. | ||
|
||
At the time of writing verifiers extending `goto_verifiert`, so defining what a **canonical goto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't parse this sentence. Also, this definition of canonical form doesn't really help me much. I'd rather have
a description than defining it implicitly as whatever these verifiers can process. If I wanted to write something of my own that took the canonical form this wouldn't help me much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a high-level, declarative specification of what GOTO is and of the rules checked by the verifiers would solve the underlying problem of helping people build a mental model of canonical GOTO independently of the C++ that implements the concept.
Having a high-level, declarative specification of what GOTO is and of
the rules checked by the verifiers would solve the underlying problem
of helping people build a mental model of canonical GOTO
independently of the C++ that implements the concept.
Agreed. It would be great to have a formal semantics. What I am
saying is that in my experience of this project, text descriptions that
have no computational checking / testing / aspect / whatever tend to be
incomplete and diverge from the code. A literate programming
implementation of `goto_programt::instructiont::validate` and
`validate_goto_model` would be a great way to solve this, give example
code of how to access everything and would be hard to not update.
If someone can get me the resource / time to do this I will even do it
myself!
|
Here is another example of why "normal forms" are tricky. Is a "back edge" a literal, code order back edge or a topological back edge : #7506 |
Hi, I agree with almost all that you raised. However this PR is not trying to give a formal structural definition of what canonical goto form is. The aim of this PR is to define what "level" (entry-point) will be considered accepting a canonical goto form program, so that we can start writing the formal structural definition of it. We set this level to be the entry point for symex ( In the following sprints (and PRs) we will outline what structural and semantics constraints define a canonical goto form. When the formalization will be completed it will also be possible to write a function for validating the compliance of a goto program. |
@esteffin I see you point and sorry if I am being unhelpful but...
Perhaps, as a compromise, we could document the life-cycle / pipeline of the goto-program? I think I have captured some of the main milestones above. The whole question of what is and isn't valid is a thorny one that I have hurt myself on many times. It really needs to be solved but it is going to take some work unpicking what the code does at the moment. |
Now that I see where you are going with this ... step 0 of an n step process to get a checkable definition of what is a canonical GOTO program (or, perhaps one of a few defined canonical forms) then I'd be happy to approve this PR after fixing that one sentence I can't parse (see comment earlier). |
Because GOTO is not necessarily C. The source language could have been Java, C++, ASM or Rust for example. Would not a requirement that all other languages must be translated into valid C source code before converting them to goto be something of an architectural straight jacket? Some language features such as bit rotation operations make sense to have as a language feature of GOTO, because they are straight forward to analyse as rotation operations but clumsy to express in standardised C as that requires multiple operators. Other features such as virtual function calls are useful to have as a temporary extension to GOTO before they are lowered/removed for further analysis. Which brings us to -
My view is that we should initially define the canonical form more as a "core" GOTO language. This would define the minimum feature set which would need to be implemented as part of implementing a replacement analysis as a substitute to symex. I certainly acknowledge the existence of alternative forms/languages and the futility of wishing them out of existence. The alternative forms can subsequently be explained on the basis of being the "core" GOTO language with the addition of varying language extensions. Each of which are essentially defined based on a currently implemented lowering/removal pass. I think it would be useful to document the "core" language first before going into the details of the additional extended languages, because this provides value in a step-wise fashion. |
f5710cc
to
ebb2836
Compare
> Why not at the last point that we can transform it back to
> equivalent C (which was one of the design aims of GOTO)?
Because GOTO is not necessarily C. The source language could have
been Java, C++, ASM or Rust for example. Would not a requirement that
all other languages must be translated into valid C source code
before converting them to goto be something of an architectural
straight jacket?
"We can exactly reproduce C source code (minus preprocessor and
comments)" was an original design aim of GOTO and AFAIK still is and it
should still work. Basically `--dump-c` should be essentially a
formatting pass. One of the dialects / "normal forms" of GOTO should
be "equivalent to C".
|
Perhaps "core" GOTO might be a better term than "canonical", and I can see that at some point in the compiler flow there will be a last point at which (if you started your journey in C) you can translate back to C and get something very close to what you started from, and this is an interesting point that one might want documented. But, I don't think that is what people are shooting for here. Rather, I think we're looking for a core/minimal language fairly late in the compiler flow, that is not defined with reference to a source language, and can subsequently be given a semantics separate from that implicit in the C++ code that implements CBMC. |
ebb2836
to
7cb0981
Compare
@jimgrundy I like the concept of core instead of canonical. I re-wrote the documentation accordingly. |
I do wonder whether it may be better to keep this in "draft" status and only consider merging when some additional work (as is planned in the upcoming weeks) is complete? I am finding it quite hard to judge the suitability of the current piece of text. |
@tautschnig I would recommend to merge this as it creates a baseline for the future core goto documentation efforts. In case the future changes in core goto documentation strategy we will amend this documentation too. |
That "baseline" is a document with a number of undefined terms. Given that there will be future work I'm not too worried and maybe the mere existence of the file in the repository helps, but I also cannot clearly see the value just yet. So I'm a bit "don't care" and neither inclined to block nor to endorse merging this as-is. |
@jimgrundy Hi Jim, is the new rewritten documentation ok for you? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a fine opening move in the process of documenting what the core/irreducible features of GOTO are.
In the future I would like to see a predicate (boolean function) defined that returns true iff and only if a GOTO program is in the "core" subset. Stating that it is the set of GOTO programs accepted by a list of functions kind of works, but reading the source of those functions to understand what "core" is isn't going to be effective. Rather we should define an "is_core" function and if any of those other functions fails to accept something for which is_core is true then either we need to fix is_core or fix that function.
The goal of having a validation for |
This PR defines the requirements for a program to be in canonical goto form.
A canonical goto program is defined by means of a functional requirement: being able to be analyzed by any solver extending
goto_verifiert
without any prior lowering step.The aim, in the future, is to replace such functional requirement with structural ones.
The definition of canonical goto program has been added for reference in the
doc/ADR/canonical_goto.md
file.