Added concept of core goto language form #7505

esteffin · 2023-01-25T17:16:11Z

This PR defines the requirements for a program to be in canonical goto form.

A canonical goto program is defined by means of a functional requirement: being able to be analyzed by any solver extending goto_verifiert without any prior lowering step.

The aim, in the future, is to replace such functional requirement with structural ones.

The definition of canonical goto program has been added for reference in the doc/ADR/canonical_goto.md file.

codecov · 2023-01-25T17:58:52Z

Codecov Report

Base: 78.48% // Head: 78.48% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (7cb0981) compared to base (cbef34e).
Patch has no changes to coverable lines.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #7505   +/-   ##
========================================
  Coverage    78.48%   78.48%           
========================================
  Files         1663     1663           
  Lines       191249   191258    +9     
========================================
+ Hits        150103   150112    +9     
  Misses       41146    41146

Impacted Files	Coverage Δ
src/goto-instrument/contracts/utils.h	`100.00% <0.00%> (ø)`
src/goto-instrument/contracts/contracts.cpp	`95.54% <0.00%> (+0.01%)`	⬆️
src/goto-instrument/contracts/utils.cpp	`91.70% <0.00%> (+0.25%)`	⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

martin-cs

Please, please, please can we have the checks of canonical form to be in code not in documentation? As part of --validate-goto-model would be 🌟magical🌟. See
#7471 (comment) for rationale on the lifespan of this documentation.

Also note there is not one canonical form, see #6495 for a suggested way of actually checking an enforcing the various properties that a program might have.

jimgrundy · 2023-01-25T22:48:14Z

doc/ADR/canonical_goto.md

+by any solver deriving from `goto_verifiert` by using the `goto_verifiert::operator()` function
+without requiring any lowering step.
+
+At the time of writing verifiers extending `goto_verifiert`, so defining what a **canonical goto


I can't parse this sentence. Also, this definition of canonical form doesn't really help me much. I'd rather have
a description than defining it implicitly as whatever these verifiers can process. If I wanted to write something of my own that took the canonical form this wouldn't help me much.

Having a high-level, declarative specification of what GOTO is and of the rules checked by the verifiers would solve the underlying problem of helping people build a mental model of canonical GOTO independently of the C++ that implements the concept.

martin-cs · 2023-01-26T11:10:40Z

Having a high-level, declarative specification of what GOTO is and of the rules checked by the verifiers would solve the underlying problem of helping people build a mental model of canonical GOTO independently of the C++ that implements the concept.

Agreed. It would be great to have a formal semantics. What I am saying is that in my experience of this project, text descriptions that have no computational checking / testing / aspect / whatever tend to be incomplete and diverge from the code. A literate programming implementation of `goto_programt::instructiont::validate` and `validate_goto_model` would be a great way to solve this, give example code of how to access everything and would be hard to not update. If someone can get me the resource / time to do this I will even do it myself!

martin-cs · 2023-01-26T11:42:56Z

Here is another example of why "normal forms" are tricky. Is a "back edge" a literal, code order back edge or a topological back edge : #7506

esteffin · 2023-01-26T15:34:27Z

Hi, I agree with almost all that you raised.

However this PR is not trying to give a formal structural definition of what canonical goto form is.

The aim of this PR is to define what "level" (entry-point) will be considered accepting a canonical goto form program, so that we can start writing the formal structural definition of it.

We set this level to be the entry point for symex (veryfier_typet).

In the following sprints (and PRs) we will outline what structural and semantics constraints define a canonical goto form.

When the formalization will be completed it will also be possible to write a function for validating the compliance of a goto program.

martin-cs · 2023-01-26T20:50:45Z

@esteffin I see you point and sorry if I am being unhelpful but...

Why there? Why not here

cbmc/src/cbmc/cbmc_parse_options.cpp

Line 685 in cbef34e

after the cbmc pre-processing is done? Why not here

cbmc/src/cbmc/cbmc_parse_options.cpp

Line 766 in cbef34e

after process_goto_program ? (Which should now be common to all of the analysis tools :fingercrossed: Refactor process_goto_program so that all tools use common processing code #5807 ) Why not here

cbmc/src/goto-programs/initialize_goto_model.cpp

Line 245 in cbef34e

so it is the goto program you get from the call to load? Or here

cbmc/src/goto-programs/read_goto_binary.cpp

Line 148 in cbef34e

return read_err;

so that it corresponds with what is on disk? Why not at the last point that we can transform it back to equivalent C (which was one of the design aims of GOTO)?
I am concerned that documenting might lead people to believe in a single canonical form that they can't easily generate, isn't checked, isn't enforced, isn't specified and isn't really present in the current design.

Perhaps, as a compromise, we could document the life-cycle / pipeline of the goto-program? I think I have captured some of the main milestones above.

The whole question of what is and isn't valid is a thorny one that I have hurt myself on many times. It really needs to be solved but it is going to take some work unpicking what the code does at the moment.

jimgrundy · 2023-01-26T21:55:06Z

Now that I see where you are going with this ... step 0 of an n step process to get a checkable definition of what is a canonical GOTO program (or, perhaps one of a few defined canonical forms) then I'd be happy to approve this PR after fixing that one sentence I can't parse (see comment earlier).

thomasspriggs · 2023-01-27T12:05:08Z

Why not at the last point that we can transform it back to equivalent C (which was one of the design aims of GOTO)?

Because GOTO is not necessarily C. The source language could have been Java, C++, ASM or Rust for example. Would not a requirement that all other languages must be translated into valid C source code before converting them to goto be something of an architectural straight jacket?

Some language features such as bit rotation operations make sense to have as a language feature of GOTO, because they are straight forward to analyse as rotation operations but clumsy to express in standardised C as that requires multiple operators.

Other features such as virtual function calls are useful to have as a temporary extension to GOTO before they are lowered/removed for further analysis. Which brings us to -

I am concerned that documenting might lead people to believe in a single canonical form

My view is that we should initially define the canonical form more as a "core" GOTO language. This would define the minimum feature set which would need to be implemented as part of implementing a replacement analysis as a substitute to symex. I certainly acknowledge the existence of alternative forms/languages and the futility of wishing them out of existence. The alternative forms can subsequently be explained on the basis of being the "core" GOTO language with the addition of varying language extensions. Each of which are essentially defined based on a currently implemented lowering/removal pass. I think it would be useful to document the "core" language first before going into the details of the additional extended languages, because this provides value in a step-wise fashion.

martin-cs · 2023-01-27T20:18:58Z

> Why not at the last point that we can transform it back to > equivalent C (which was one of the design aims of GOTO)? Because GOTO is not necessarily C. The source language could have been Java, C++, ASM or Rust for example. Would not a requirement that all other languages must be translated into valid C source code before converting them to goto be something of an architectural straight jacket?

"We can exactly reproduce C source code (minus preprocessor and comments)" was an original design aim of GOTO and AFAIK still is and it should still work. Basically `--dump-c` should be essentially a formatting pass. One of the dialects / "normal forms" of GOTO should be "equivalent to C".

jimgrundy · 2023-01-28T23:18:18Z

Perhaps "core" GOTO might be a better term than "canonical", and I can see that at some point in the compiler flow there will be a last point at which (if you started your journey in C) you can translate back to C and get something very close to what you started from, and this is an interesting point that one might want documented. But, I don't think that is what people are shooting for here.

Rather, I think we're looking for a core/minimal language fairly late in the compiler flow, that is not defined with reference to a source language, and can subsequently be given a semantics separate from that implicit in the C++ code that implements CBMC.

esteffin · 2023-01-30T12:22:38Z

@jimgrundy I like the concept of core instead of canonical.

I re-wrote the documentation accordingly.

tautschnig · 2023-01-30T13:25:36Z

I do wonder whether it may be better to keep this in "draft" status and only consider merging when some additional work (as is planned in the upcoming weeks) is complete? I am finding it quite hard to judge the suitability of the current piece of text.

esteffin · 2023-01-31T18:04:30Z

@tautschnig I would recommend to merge this as it creates a baseline for the future core goto documentation efforts.

In case the future changes in core goto documentation strategy we will amend this documentation too.

tautschnig · 2023-02-01T09:24:18Z

@tautschnig I would recommend to merge this as it creates a baseline for the future core goto documentation efforts.

In case the future changes in core goto documentation strategy we will amend this documentation too.

That "baseline" is a document with a number of undefined terms. Given that there will be future work I'm not too worried and maybe the mere existence of the file in the repository helps, but I also cannot clearly see the value just yet. So I'm a bit "don't care" and neither inclined to block nor to endorse merging this as-is.

esteffin · 2023-02-02T10:56:35Z

@jimgrundy Hi Jim, is the new rewritten documentation ok for you?
If so, when you approve this PR I will merge it.

jimgrundy

This seems like a fine opening move in the process of documenting what the core/irreducible features of GOTO are.

In the future I would like to see a predicate (boolean function) defined that returns true iff and only if a GOTO program is in the "core" subset. Stating that it is the set of GOTO programs accepted by a list of functions kind of works, but reading the source of those functions to understand what "core" is isn't going to be effective. Rather we should define an "is_core" function and if any of those other functions fails to accept something for which is_core is true then either we need to fix is_core or fix that function.

TGWDB · 2023-02-03T13:53:33Z

This seems like a fine opening move in the process of documenting what the core/irreducible features of GOTO are.

In the future I would like to see a predicate (boolean function) defined that returns true iff and only if a GOTO program is in the "core" subset. Stating that it is the set of GOTO programs accepted by a list of functions kind of works, but reading the source of those functions to understand what "core" is isn't going to be effective. Rather we should define an "is_core" function and if any of those other functions fails to accept something for which is_core is true then either we need to fix is_core or fix that function.

The goal of having a validation for is_core for a GOTO is shared, the challenge is the complexity of writing and verifying this. For now this PR was merged with the understanding that steps along this path are the goal, having this fully implemented is non-trivial (to say the least). We expect such a function to be developer incrementally.

esteffin requested review from jimgrundy, TGWDB, kroening, tautschnig, peterschrammel and NlightNFotis as code owners January 25, 2023 17:16

thomasspriggs approved these changes Jan 25, 2023

View reviewed changes

martin-cs reviewed Jan 25, 2023

View reviewed changes

jimgrundy requested changes Jan 25, 2023

View reviewed changes

NlightNFotis assigned esteffin Jan 26, 2023

esteffin force-pushed the esteffin/add-requirement-for-canonical-goto branch from f5710cc to ebb2836 Compare January 27, 2023 17:06

Added concept of canonical goto program

7cb0981

esteffin force-pushed the esteffin/add-requirement-for-canonical-goto branch from ebb2836 to 7cb0981 Compare January 30, 2023 12:21

esteffin changed the title ~~Added concept of canonical form for a goto program~~ Added concept of core goto language form Jan 30, 2023

esteffin requested a review from jimgrundy January 30, 2023 12:23

jimgrundy approved these changes Feb 2, 2023

View reviewed changes

esteffin merged commit 3f578e1 into diffblue:develop Feb 3, 2023

remi-delmas-3000 mentioned this pull request Feb 13, 2023

loop normal form properties #7518

Open

NlightNFotis mentioned this pull request May 12, 2023

Add capability to export goto-program in symex-ready-goto form to CBMC. #7697

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added concept of core goto language form #7505

Added concept of core goto language form #7505

esteffin commented Jan 25, 2023

codecov bot commented Jan 25, 2023 •

edited

Loading

martin-cs left a comment

jimgrundy Jan 25, 2023

remi-delmas-3000 Jan 26, 2023

martin-cs commented Jan 26, 2023 via email

martin-cs commented Jan 26, 2023

esteffin commented Jan 26, 2023

martin-cs commented Jan 26, 2023 •

edited

Loading

jimgrundy commented Jan 26, 2023

thomasspriggs commented Jan 27, 2023

martin-cs commented Jan 27, 2023 via email

jimgrundy commented Jan 28, 2023

esteffin commented Jan 30, 2023

tautschnig commented Jan 30, 2023

esteffin commented Jan 31, 2023

tautschnig commented Feb 1, 2023

esteffin commented Feb 2, 2023

jimgrundy left a comment

TGWDB commented Feb 3, 2023

Added concept of core goto language form #7505

Added concept of core goto language form #7505

Conversation

esteffin commented Jan 25, 2023

codecov bot commented Jan 25, 2023 • edited Loading

Codecov Report

martin-cs left a comment

Choose a reason for hiding this comment

jimgrundy Jan 25, 2023

Choose a reason for hiding this comment

remi-delmas-3000 Jan 26, 2023

Choose a reason for hiding this comment

martin-cs commented Jan 26, 2023 via email

martin-cs commented Jan 26, 2023

esteffin commented Jan 26, 2023

martin-cs commented Jan 26, 2023 • edited Loading

jimgrundy commented Jan 26, 2023

thomasspriggs commented Jan 27, 2023

martin-cs commented Jan 27, 2023 via email

jimgrundy commented Jan 28, 2023

esteffin commented Jan 30, 2023

tautschnig commented Jan 30, 2023

esteffin commented Jan 31, 2023

tautschnig commented Feb 1, 2023

esteffin commented Feb 2, 2023

jimgrundy left a comment

Choose a reason for hiding this comment

TGWDB commented Feb 3, 2023

codecov bot commented Jan 25, 2023 •

edited

Loading

martin-cs commented Jan 26, 2023 •

edited

Loading