Add more projects to MINI-PROJECTS.md

NlightNFotis · NlightNFotis · commit 74a88d7d88b3 · 2021-06-01T15:53:44.000+01:00
Port the list of projects available at https://web.archive.org/web/20150530010030/http://www.cprover.org/wiki/doku.php?id=starter_projects to the new markdown file inside the repo.
diff --git a/MINI-PROJECTS.md b/MINI-PROJECTS.md
@@ -104,6 +104,314 @@ share a bound.
   those bounds is reached. Good, thoughtful test cases are essential
   here!
 
+### Instrumentation to Check Pointer Alignment
+
+Some architectures only allow load instructions / pointer dereferencing when the
+address is a multiple of the word length (i.e. 4 bytes or 8 bytes). This project
+would add a new instrumentation option for `cbmc` and `goto-instrument` that checks
+each pointer before dereference and makes sure it's correctly aligned.
+
+### Command line options to SAT solvers
+
+There are options for which SMT solver to run and there are ways of building with
+multiple SAT solvers but no command line options to pick a SAT solver. It would be
+nice if there were.
+
+César is working on this one. (?)
+
+### Combine Uninitialize Variables Analysis with Alias Analysis
+
+`goto-instrument` features an instrumentation to track cases of use-before-define, i.e.
+reading of uninitialised variables. At present, the analysis does not take into account
+aliasing information. As such, the following code will yield a spurious counterexample:
+
+``` C
+int main()
+{
+  int x;
+  int *ptr;
+  ptr=&x;
+  *ptr=42;
+  if(x < 42)
+    return 1;
+  return 0;
+}
+```
+
+### Constant Folding
+
+With the recently added reaching definitions analysis, it is straightforward to implement
+a program transformation that performs constant folding. ~To ensure soundness in concurrent
+settings, it should be combined with the existing is_threaded analysis.~ (A concurrency-aware
+version of reaching definitions now exists and is awaiting a merge.) Along the way, assertions
+that become trivially true should be eliminated.
+
+### Make SAMATE Tests Work
+
+Read [this paper](https://www.sciencedirect.com/science/article/abs/pii/S0950584913000384) and
+make the tests mentioned in there work.
+
+### Error Explanation
+
+Re-implement the algorithm described in [this paper](http://www.kroening.com/papers/STTT-error-explanation-2005.pdf)
+
+### LTL
+
+Build an instrumenter in `goto-instrument` for checking LTL (or even CTL) properties on C code.
+E.g look at [this paper](https://www.umsec.umn.edu/publications/Partial-Translation-Verification-Untrusted-Code)
+(And please get in touch with Michael Tautschnig, who has done very similar work in FShell).
+
+Rajdeep is now looking into this one. (?)
+
+### Test SMT Back Ends
+
+CBMC allows to choose among a selection of SMT solvers to use as solving back end. As these
+tools evolve, CBMC's code for running them may need a fresh look from time to time. After
+a first round of manual experiments, these tests should become part of the routine build
+process on `dkr-build`. (Michael Tautschnig knows how to do this.)
+
+### Simple Test Harness Generation
+
+CBMC currently outputs a trace of execution when it find a counter-example. Instead it
+should be possible to produce a C file that when compiled and run will cause the program
+to execute this trace. This will allow a faster verify/debug cycle. (To a certain
+extent this has been accomplished in FShell. Speak to Michael.)
+
+Pascal is now looking at this one. (?)
+
+### Improved Interpreter
+
+CProver has an interpreter for goto programs but it could be improved, for example adding
+support for dynamic analysis, calls to external functions and possibly even mixed concrete
+and symbolic execution. It would also be good to be able to interpret traces found by BMC
+to catch mismatches / bugs between the logical model and the bit blasting models. This
+would also be useful for more advanced CEGAR like techniques (also see below).
+
+### "KLEE-PROVER"
+
+Alter the CBMC symbolic execution engine to create a dynamic symbolic execution tool similar
+to [KLEE](https://klee.github.io/). Possible features include:
+
+* Distributed and incremental support using the caching scheme here
+* A feature to run BMC as you go to explore the traces 'around' the execution traces
+* Use path exploration as the witness generation part of an ACDL process.
+* Perform precondition inference, possibly using ACDL.
+
+See the work Björn and Alex have done with path-symex. (?)
+
+## Interact with GDB
+
+GDB implements a client-server protocol with a GDB instance acting as a server and the
+GUI as a client. (The documentation for that protocol may be found [here](http://www.sourceware.org/gdb/current/onlinedocs/gdb/GDB_002fMI.html).)
+By implementing this protocol in various place within CProver, there are various
+things that could be achieved.
+
+To do some of these will require working out which functions are non-deterministic models
+of real functions and accounting for this. This would also be necessary for some of the
+preceeding ideas.
+
+### Trace as a GDB server
+
+Implement a GDB server that uses a trace to 'execute' the program. This allows any GUI
+debugger that can talk to GDB to be used to graphically explore the trace. Particularly
+valuable for concurrent programs in which just reading the trace could be very hard.
+
+Pascal has something like this implemented for Eclipse. (?)
+
+Handling the SV-COMP counter-example format (ask Michael) would probably make this tool
+a lot more widely usable.
+
+### Trace as a GDB client
+
+Conversely, implement a GDB client that uses the trace to control a debugger. If connected
+to a 'real' GDB then this would allow the debugger to be set to the position that shows the
+error. In the case of parallel programs this avoids the need to create custom schedulers, etc.
+and over-all it is a much better interface – rather than generate a trace, CProver simply
+gives you the debugger in the state that triggers the problem.
+
+One option would be to convert the trace to a GDB script but this misses some of the value
+as it would be possible to query they GDB server as the program runs to check that values
+really are assigned correctly. This will catch the first point at which the model and the
+real system diverge and allows checked 'resimulation' of the traces on a 'real' platform.
+If the goto-binary contains the ELF binary then this could be used directly.
+
+### Interpreter as a GDB server
+
+An enabling tech, this would allow us to run the interpretter as a debugger back end. Combine
+with the trace as a GDB client, this would allow resimulation of traces, which would be useful
+for catching encoding bugs in CProver and ideally should be run after the decode from the solver
+all of the time.
+
+Michael has a partial implementation of this. (?)
+
+### Recording GDB server
+
+Acts as a GDB server and passes commands back to another GDB server to actually execute.
+Records the state changes and locations visited. This allows a test case to be 'run' and
+the trace / conditions on the trace to be collected and (after generalisation) be used for
+test case driven analysis / test case generalisation / something “concolic” like.
+
+### Differential GDB server
+
+Acts as a GDB server and then connects (as a client) to one or more GDB servers. Commands
+are executed on all of these and then the state / position compared (much of the infrastructure
+of the previous idea could be used). Divergences of behaviour can then be found, particularly
+root causes can be identified by the first divergence. This may not be CProver specific but
+can likely be used for various projects and ideas above.
+
+### "Concolic" GDB client
+
+An interactive component that runs a program via a GDB server and query / interact with it.
+This would be useful for building a “concolic” tool, especially for running library functions
+that are not defined.
+
+### Other ideas
+
+Valgrind and the Linux kernel both have GDB servers - what could we do with these?
+
+## Static Detection of Data Races
+
+It would be good to have a tool that checks for data races in a similar fashion to
+DRD / Helgrind. The current pointer analysis can over approximate shared variables,
+so possibly some combination of this, trace generation, DRD / Helgrind and then
+refinement of which traces to pick? Perhaps this is even something ACDL like?
+
+### CORAL / Q
+
+Build something similar to the Q tool in SLAM. (?)
+
+### CryptoMinisat
+
+Try [CryptoMinisat](https://github.com/msoos/cryptominisat) vs Minisat.
+
+### k-liveness
+
+Implement [k-liveness](http://www.cs.utexas.edu/~hunt/FMCAD/FMCAD12/013.pdf)
+for software-liveness checking.
+
+Hongyi is now looking at this (?)
+
+### Known Bugs on the Web
+
+Turn the list of `KNOWNBUG`S from the CBMC regressions into a webpage of known bugs
+of the CBMC `develop` branch and the last CBMC release.
+
+### Coding Standards Checker
+
+Build instrumenters for various coding standards, e.g. AutoSar, [MISRA-C](https://en.wikipedia.org/wiki/MISRA_C),
+CERT-C, [HIC++](https://www.perforce.com/resources/qac/high-integrity-cpp-coding-standard), etc.
+
+### Front-End Testing
+
+Run the regressions mentioned [here](https://github.com/kframework/c-semantics/tree/master/tests) and
+the [GCC C Torture tests](https://gcc.gnu.org/onlinedocs/gccint/C-Tests.html) with `goto-cc` and a
+`goto-program` interpreter (mild dependency on the interpreter-related task above).
+
+### Unspecified C Behaviours
+
+Build instrumentation to check for unspecified behaviours in C, as per Section J.2
+(and maybe also J.3) of the C standard. Some of them may be possible to handle
+entirely in the front-end (for example, catching `x = x++` as modifying a variable
+twice between sequence points could be done in side effect removal in ansi-c), while
+others may be best to be instrumentation (for example handling the unspecified nature
+of the order of evaluation of function arguments).
+
+Talk to Michael Tautschnig or Martin Brain for this item.
+
+### Reporting Software Errors with CWE Information
+
+The [Common Weakness Enumeration project](http://cwe.mitre.org/) seeks to classify software
+weaknesses and errors in a common dictionary. Reporting CWE information together with failed
+assertions, such as [Free of Pointer not at Start of Buffer](http://cwe.mitre.org/data/definitions/761.html),
+would help both users of the tools as well as industrial partners. In the first instance, errors
+already caught by present instrumentation should be augmented with additional information. In a
+second step, instrumentation for other problems listed in the CWE database should be developed.
+
+### Machine Learn Refinement settings
+
+The refinement option enables a tactic of under and over approximation to try to solve the
+generated formulae more quickly. There are a large number of possible options – which operations
+to approximate, how much to under or over approximate them, how much to add back if this doesn't
+work, etc. Rather than trying to manually pick the best strategy from these, it would be good
+to machine learn them.
+
+It would probably be worth asking Nando de Freitas' advice on the machine learning side of things.
+
+Ruben is now looking at this one. (?)
+
+### Cluster CBMC
+
+Most users of CBMC have access to multi-core machines; many have clusters available. There
+are a number of different approaches and a number of different levels at which the task
+could be parallelised; per assertion, portfolio approaches of different back-ends,
+parallel solvers, etc. It would be good to be able to exploit these. The caching approach
+suggested under “KLEE-PROVER” might be an easy route to implementing this.
+
+### x86 to Goto
+
+Look at various efforts to model x86 assembler, e.g., Andrew Kennedy,
+“Formalizing .EXEs, .DLLs, and all that”, Gang Tan, “Reusable tools for formal modeling of machine code”,
+and check whether this can turn into a translator from x86 to goto binary.
+
+Also look at Google's Native Client, and talk to Matt, who has tried Valgrind.
+
+Furthermore, talk to Michael, who has got a student who made some progress on this.
+
+And maybe have a read of Automated synthesis of symbolic instruction encodings from I/O samples.
+
+### Branch Prediction
+
+Implement branch prediction algorithms in path-symeex.
+
+### An abstract interpreter for C
+
+There's a lack for a robust, open source, abstract interpreter for C/C++. CPROVER contains most of the
+components for it, the key thing that would be needed is a front-end and some testing, plus possibly
+some more domains.
+
+### GCC Torture Tests
+
+GCC has a set of 'torture tests' which are intended to stress test and find bugs in their C parser
+and front-end. It would be good to run these through goto-cc / add them to the nightly testing and
+fix any bugs that arise. (Possible overlap with “Front-End Testing” above.). The same is probably
+also true for CSmith and other compiler fuzzing tools.
+
+### Compiler loop optimisations for BMC
+
+Try out some of the compiler loop optimisations (hoisting invariant code,
+fusing sequential independent loops, etc.) and see if they improve BMC,
+K-Induction or loop acceleration. These should be possible to implement in goto-instrument.
+
+Michael has an undergraduate student looking at this… (?)
+
+### Handling restrict in a smart way
+
+C99 adds the `restrict` keyword to mark unaliased pointers. This can be discarded by the compiler
+and thus has no semantics, it is only an annotation. However, there are a number of thing we could
+do with:
+
+1. Insert checks in calling functions to make sure that the pointers really are unaliased.
+2. Add an instrumentation pass that adds assumptions that restricted pointers are not aliased.
+This may imporve the quality of counter-examples (although at the cost of potentially missing
+some bugs, although not if combined with the previous technique).
+
+### C++ Exceptions
+
+C++ has annotations for what exceptions a function can throw. It would be good to be
+able to check / generate / minimise these and it should be possible to do this with
+a reasonably simple abstract domain. The CVC4 project would be interested in using this if it exists.
+
+Peter may have some thoughts on this. (?)
+
+### Java Memory Model
+
+CBMC's Java front-end is currently in development and will support the full Java bytecode
+instruction set soon. Beyond that, it would be great to support some more advanced aspects
+of the Java programming language semantics, such as the Java Memory Model. This will include
+adding models for a subset of the Java concurrency packages in the Java Runtime Library.
+
+Jade Alglave may have some additional input on this. (?)
+
 Completed
 ---------