Skip to content

Added concept of core goto language form #7505

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

esteffin
Copy link
Contributor

This PR defines the requirements for a program to be in canonical goto form.

A canonical goto program is defined by means of a functional requirement: being able to be analyzed by any solver extending goto_verifiert without any prior lowering step.

The aim, in the future, is to replace such functional requirement with structural ones.

The definition of canonical goto program has been added for reference in the doc/ADR/canonical_goto.md file.

@codecov
Copy link

codecov bot commented Jan 25, 2023

Codecov Report

Base: 78.48% // Head: 78.48% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (7cb0981) compared to base (cbef34e).
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #7505   +/-   ##
========================================
  Coverage    78.48%   78.48%           
========================================
  Files         1663     1663           
  Lines       191249   191258    +9     
========================================
+ Hits        150103   150112    +9     
  Misses       41146    41146           
Impacted Files Coverage Δ
src/goto-instrument/contracts/utils.h 100.00% <0.00%> (ø)
src/goto-instrument/contracts/contracts.cpp 95.54% <0.00%> (+0.01%) ⬆️
src/goto-instrument/contracts/utils.cpp 91.70% <0.00%> (+0.25%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, please, please can we have the checks of canonical form to be in code not in documentation? As part of --validate-goto-model would be 🌟magical🌟. See
#7471 (comment) for rationale on the lifespan of this documentation.

Also note there is not one canonical form, see #6495 for a suggested way of actually checking an enforcing the various properties that a program might have.

by any solver deriving from `goto_verifiert` by using the `goto_verifiert::operator()` function
without requiring any lowering step.

At the time of writing verifiers extending `goto_verifiert`, so defining what a **canonical goto
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't parse this sentence. Also, this definition of canonical form doesn't really help me much. I'd rather have
a description than defining it implicitly as whatever these verifiers can process. If I wanted to write something of my own that took the canonical form this wouldn't help me much.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a high-level, declarative specification of what GOTO is and of the rules checked by the verifiers would solve the underlying problem of helping people build a mental model of canonical GOTO independently of the C++ that implements the concept.

@martin-cs
Copy link
Collaborator

martin-cs commented Jan 26, 2023 via email

@martin-cs
Copy link
Collaborator

Here is another example of why "normal forms" are tricky. Is a "back edge" a literal, code order back edge or a topological back edge : #7506

@esteffin
Copy link
Contributor Author

Hi, I agree with almost all that you raised.

However this PR is not trying to give a formal structural definition of what canonical goto form is.

The aim of this PR is to define what "level" (entry-point) will be considered accepting a canonical goto form program, so that we can start writing the formal structural definition of it.

We set this level to be the entry point for symex (veryfier_typet).

In the following sprints (and PRs) we will outline what structural and semantics constraints define a canonical goto form.

When the formalization will be completed it will also be possible to write a function for validating the compliance of a goto program.

@martin-cs
Copy link
Collaborator

martin-cs commented Jan 26, 2023

@esteffin I see you point and sorry if I am being unhelpful but...

  1. Why there? Why not here

    after the cbmc pre-processing is done? Why not here after process_goto_program ? (Which should now be common to all of the analysis tools :fingercrossed: Refactor process_goto_program so that all tools use common processing code #5807 ) Why not here so it is the goto program you get from the call to load? Or here so that it corresponds with what is on disk? Why not at the last point that we can transform it back to equivalent C (which was one of the design aims of GOTO)?

  2. I am concerned that documenting might lead people to believe in a single canonical form that they can't easily generate, isn't checked, isn't enforced, isn't specified and isn't really present in the current design.

Perhaps, as a compromise, we could document the life-cycle / pipeline of the goto-program? I think I have captured some of the main milestones above.

The whole question of what is and isn't valid is a thorny one that I have hurt myself on many times. It really needs to be solved but it is going to take some work unpicking what the code does at the moment.

@jimgrundy
Copy link
Collaborator

Now that I see where you are going with this ... step 0 of an n step process to get a checkable definition of what is a canonical GOTO program (or, perhaps one of a few defined canonical forms) then I'd be happy to approve this PR after fixing that one sentence I can't parse (see comment earlier).

@thomasspriggs
Copy link
Contributor

Why not at the last point that we can transform it back to equivalent C (which was one of the design aims of GOTO)?

Because GOTO is not necessarily C. The source language could have been Java, C++, ASM or Rust for example. Would not a requirement that all other languages must be translated into valid C source code before converting them to goto be something of an architectural straight jacket?

Some language features such as bit rotation operations make sense to have as a language feature of GOTO, because they are straight forward to analyse as rotation operations but clumsy to express in standardised C as that requires multiple operators.

Other features such as virtual function calls are useful to have as a temporary extension to GOTO before they are lowered/removed for further analysis. Which brings us to -

I am concerned that documenting might lead people to believe in a single canonical form

My view is that we should initially define the canonical form more as a "core" GOTO language. This would define the minimum feature set which would need to be implemented as part of implementing a replacement analysis as a substitute to symex. I certainly acknowledge the existence of alternative forms/languages and the futility of wishing them out of existence. The alternative forms can subsequently be explained on the basis of being the "core" GOTO language with the addition of varying language extensions. Each of which are essentially defined based on a currently implemented lowering/removal pass. I think it would be useful to document the "core" language first before going into the details of the additional extended languages, because this provides value in a step-wise fashion.

@esteffin esteffin force-pushed the esteffin/add-requirement-for-canonical-goto branch from f5710cc to ebb2836 Compare January 27, 2023 17:06
@martin-cs
Copy link
Collaborator

martin-cs commented Jan 27, 2023 via email

@jimgrundy
Copy link
Collaborator

Perhaps "core" GOTO might be a better term than "canonical", and I can see that at some point in the compiler flow there will be a last point at which (if you started your journey in C) you can translate back to C and get something very close to what you started from, and this is an interesting point that one might want documented. But, I don't think that is what people are shooting for here.

Rather, I think we're looking for a core/minimal language fairly late in the compiler flow, that is not defined with reference to a source language, and can subsequently be given a semantics separate from that implicit in the C++ code that implements CBMC.

@esteffin esteffin force-pushed the esteffin/add-requirement-for-canonical-goto branch from ebb2836 to 7cb0981 Compare January 30, 2023 12:21
@esteffin
Copy link
Contributor Author

@jimgrundy I like the concept of core instead of canonical.

I re-wrote the documentation accordingly.

@esteffin esteffin changed the title Added concept of canonical form for a goto program Added concept of core goto language form Jan 30, 2023
@esteffin esteffin requested a review from jimgrundy January 30, 2023 12:23
@tautschnig
Copy link
Collaborator

I do wonder whether it may be better to keep this in "draft" status and only consider merging when some additional work (as is planned in the upcoming weeks) is complete? I am finding it quite hard to judge the suitability of the current piece of text.

@esteffin
Copy link
Contributor Author

@tautschnig I would recommend to merge this as it creates a baseline for the future core goto documentation efforts.

In case the future changes in core goto documentation strategy we will amend this documentation too.

@tautschnig
Copy link
Collaborator

@tautschnig I would recommend to merge this as it creates a baseline for the future core goto documentation efforts.

In case the future changes in core goto documentation strategy we will amend this documentation too.

That "baseline" is a document with a number of undefined terms. Given that there will be future work I'm not too worried and maybe the mere existence of the file in the repository helps, but I also cannot clearly see the value just yet. So I'm a bit "don't care" and neither inclined to block nor to endorse merging this as-is.

@esteffin
Copy link
Contributor Author

esteffin commented Feb 2, 2023

@jimgrundy Hi Jim, is the new rewritten documentation ok for you?
If so, when you approve this PR I will merge it.

Copy link
Collaborator

@jimgrundy jimgrundy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a fine opening move in the process of documenting what the core/irreducible features of GOTO are.

In the future I would like to see a predicate (boolean function) defined that returns true iff and only if a GOTO program is in the "core" subset. Stating that it is the set of GOTO programs accepted by a list of functions kind of works, but reading the source of those functions to understand what "core" is isn't going to be effective. Rather we should define an "is_core" function and if any of those other functions fails to accept something for which is_core is true then either we need to fix is_core or fix that function.

@esteffin esteffin merged commit 3f578e1 into diffblue:develop Feb 3, 2023
@TGWDB
Copy link
Contributor

TGWDB commented Feb 3, 2023

This seems like a fine opening move in the process of documenting what the core/irreducible features of GOTO are.

In the future I would like to see a predicate (boolean function) defined that returns true iff and only if a GOTO program is in the "core" subset. Stating that it is the set of GOTO programs accepted by a list of functions kind of works, but reading the source of those functions to understand what "core" is isn't going to be effective. Rather we should define an "is_core" function and if any of those other functions fails to accept something for which is_core is true then either we need to fix is_core or fix that function.

The goal of having a validation for is_core for a GOTO is shared, the challenge is the complexity of writing and verifying this. For now this PR was merged with the understanding that steps along this path are the goal, having this fully implemented is non-trivial (to say the least). We expect such a function to be developer incrementally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

7 participants