Skip to content

Commit 74a88d7

Browse files
committed
Add more projects to MINI-PROJECTS.md
Port the list of projects available at https://web.archive.org/web/20150530010030/http://www.cprover.org/wiki/doku.php?id=starter_projects to the new markdown file inside the repo.
1 parent d7df1a2 commit 74a88d7

File tree

1 file changed

+308
-0
lines changed

1 file changed

+308
-0
lines changed

MINI-PROJECTS.md

Lines changed: 308 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -104,6 +104,314 @@ share a bound.
104104
those bounds is reached. Good, thoughtful test cases are essential
105105
here!
106106
107+
### Instrumentation to Check Pointer Alignment
108+
109+
Some architectures only allow load instructions / pointer dereferencing when the
110+
address is a multiple of the word length (i.e. 4 bytes or 8 bytes). This project
111+
would add a new instrumentation option for `cbmc` and `goto-instrument` that checks
112+
each pointer before dereference and makes sure it's correctly aligned.
113+
114+
### Command line options to SAT solvers
115+
116+
There are options for which SMT solver to run and there are ways of building with
117+
multiple SAT solvers but no command line options to pick a SAT solver. It would be
118+
nice if there were.
119+
120+
César is working on this one. (?)
121+
122+
### Combine Uninitialize Variables Analysis with Alias Analysis
123+
124+
`goto-instrument` features an instrumentation to track cases of use-before-define, i.e.
125+
reading of uninitialised variables. At present, the analysis does not take into account
126+
aliasing information. As such, the following code will yield a spurious counterexample:
127+
128+
``` C
129+
int main()
130+
{
131+
int x;
132+
int *ptr;
133+
ptr=&x;
134+
*ptr=42;
135+
if(x < 42)
136+
return 1;
137+
return 0;
138+
}
139+
```
140+
141+
### Constant Folding
142+
143+
With the recently added reaching definitions analysis, it is straightforward to implement
144+
a program transformation that performs constant folding. ~To ensure soundness in concurrent
145+
settings, it should be combined with the existing is_threaded analysis.~ (A concurrency-aware
146+
version of reaching definitions now exists and is awaiting a merge.) Along the way, assertions
147+
that become trivially true should be eliminated.
148+
149+
### Make SAMATE Tests Work
150+
151+
Read [this paper](https://www.sciencedirect.com/science/article/abs/pii/S0950584913000384) and
152+
make the tests mentioned in there work.
153+
154+
### Error Explanation
155+
156+
Re-implement the algorithm described in [this paper](http://www.kroening.com/papers/STTT-error-explanation-2005.pdf)
157+
158+
### LTL
159+
160+
Build an instrumenter in `goto-instrument` for checking LTL (or even CTL) properties on C code.
161+
E.g look at [this paper](https://www.umsec.umn.edu/publications/Partial-Translation-Verification-Untrusted-Code)
162+
(And please get in touch with Michael Tautschnig, who has done very similar work in FShell).
163+
164+
Rajdeep is now looking into this one. (?)
165+
166+
### Test SMT Back Ends
167+
168+
CBMC allows to choose among a selection of SMT solvers to use as solving back end. As these
169+
tools evolve, CBMC's code for running them may need a fresh look from time to time. After
170+
a first round of manual experiments, these tests should become part of the routine build
171+
process on `dkr-build`. (Michael Tautschnig knows how to do this.)
172+
173+
### Simple Test Harness Generation
174+
175+
CBMC currently outputs a trace of execution when it find a counter-example. Instead it
176+
should be possible to produce a C file that when compiled and run will cause the program
177+
to execute this trace. This will allow a faster verify/debug cycle. (To a certain
178+
extent this has been accomplished in FShell. Speak to Michael.)
179+
180+
Pascal is now looking at this one. (?)
181+
182+
### Improved Interpreter
183+
184+
CProver has an interpreter for goto programs but it could be improved, for example adding
185+
support for dynamic analysis, calls to external functions and possibly even mixed concrete
186+
and symbolic execution. It would also be good to be able to interpret traces found by BMC
187+
to catch mismatches / bugs between the logical model and the bit blasting models. This
188+
would also be useful for more advanced CEGAR like techniques (also see below).
189+
190+
### "KLEE-PROVER"
191+
192+
Alter the CBMC symbolic execution engine to create a dynamic symbolic execution tool similar
193+
to [KLEE](https://klee.github.io/). Possible features include:
194+
195+
* Distributed and incremental support using the caching scheme here
196+
* A feature to run BMC as you go to explore the traces 'around' the execution traces
197+
* Use path exploration as the witness generation part of an ACDL process.
198+
* Perform precondition inference, possibly using ACDL.
199+
200+
See the work Björn and Alex have done with path-symex. (?)
201+
202+
## Interact with GDB
203+
204+
GDB implements a client-server protocol with a GDB instance acting as a server and the
205+
GUI as a client. (The documentation for that protocol may be found [here](http://www.sourceware.org/gdb/current/onlinedocs/gdb/GDB_002fMI.html).)
206+
By implementing this protocol in various place within CProver, there are various
207+
things that could be achieved.
208+
209+
To do some of these will require working out which functions are non-deterministic models
210+
of real functions and accounting for this. This would also be necessary for some of the
211+
preceeding ideas.
212+
213+
### Trace as a GDB server
214+
215+
Implement a GDB server that uses a trace to 'execute' the program. This allows any GUI
216+
debugger that can talk to GDB to be used to graphically explore the trace. Particularly
217+
valuable for concurrent programs in which just reading the trace could be very hard.
218+
219+
Pascal has something like this implemented for Eclipse. (?)
220+
221+
Handling the SV-COMP counter-example format (ask Michael) would probably make this tool
222+
a lot more widely usable.
223+
224+
### Trace as a GDB client
225+
226+
Conversely, implement a GDB client that uses the trace to control a debugger. If connected
227+
to a 'real' GDB then this would allow the debugger to be set to the position that shows the
228+
error. In the case of parallel programs this avoids the need to create custom schedulers, etc.
229+
and over-all it is a much better interface – rather than generate a trace, CProver simply
230+
gives you the debugger in the state that triggers the problem.
231+
232+
One option would be to convert the trace to a GDB script but this misses some of the value
233+
as it would be possible to query they GDB server as the program runs to check that values
234+
really are assigned correctly. This will catch the first point at which the model and the
235+
real system diverge and allows checked 'resimulation' of the traces on a 'real' platform.
236+
If the goto-binary contains the ELF binary then this could be used directly.
237+
238+
### Interpreter as a GDB server
239+
240+
An enabling tech, this would allow us to run the interpretter as a debugger back end. Combine
241+
with the trace as a GDB client, this would allow resimulation of traces, which would be useful
242+
for catching encoding bugs in CProver and ideally should be run after the decode from the solver
243+
all of the time.
244+
245+
Michael has a partial implementation of this. (?)
246+
247+
### Recording GDB server
248+
249+
Acts as a GDB server and passes commands back to another GDB server to actually execute.
250+
Records the state changes and locations visited. This allows a test case to be 'run' and
251+
the trace / conditions on the trace to be collected and (after generalisation) be used for
252+
test case driven analysis / test case generalisation / something “concolic” like.
253+
254+
### Differential GDB server
255+
256+
Acts as a GDB server and then connects (as a client) to one or more GDB servers. Commands
257+
are executed on all of these and then the state / position compared (much of the infrastructure
258+
of the previous idea could be used). Divergences of behaviour can then be found, particularly
259+
root causes can be identified by the first divergence. This may not be CProver specific but
260+
can likely be used for various projects and ideas above.
261+
262+
### "Concolic" GDB client
263+
264+
An interactive component that runs a program via a GDB server and query / interact with it.
265+
This would be useful for building a “concolic” tool, especially for running library functions
266+
that are not defined.
267+
268+
### Other ideas
269+
270+
Valgrind and the Linux kernel both have GDB servers - what could we do with these?
271+
272+
## Static Detection of Data Races
273+
274+
It would be good to have a tool that checks for data races in a similar fashion to
275+
DRD / Helgrind. The current pointer analysis can over approximate shared variables,
276+
so possibly some combination of this, trace generation, DRD / Helgrind and then
277+
refinement of which traces to pick? Perhaps this is even something ACDL like?
278+
279+
### CORAL / Q
280+
281+
Build something similar to the Q tool in SLAM. (?)
282+
283+
### CryptoMinisat
284+
285+
Try [CryptoMinisat](https://github.com/msoos/cryptominisat) vs Minisat.
286+
287+
### k-liveness
288+
289+
Implement [k-liveness](http://www.cs.utexas.edu/~hunt/FMCAD/FMCAD12/013.pdf)
290+
for software-liveness checking.
291+
292+
Hongyi is now looking at this (?)
293+
294+
### Known Bugs on the Web
295+
296+
Turn the list of `KNOWNBUG`S from the CBMC regressions into a webpage of known bugs
297+
of the CBMC `develop` branch and the last CBMC release.
298+
299+
### Coding Standards Checker
300+
301+
Build instrumenters for various coding standards, e.g. AutoSar, [MISRA-C](https://en.wikipedia.org/wiki/MISRA_C),
302+
CERT-C, [HIC++](https://www.perforce.com/resources/qac/high-integrity-cpp-coding-standard), etc.
303+
304+
### Front-End Testing
305+
306+
Run the regressions mentioned [here](https://github.com/kframework/c-semantics/tree/master/tests) and
307+
the [GCC C Torture tests](https://gcc.gnu.org/onlinedocs/gccint/C-Tests.html) with `goto-cc` and a
308+
`goto-program` interpreter (mild dependency on the interpreter-related task above).
309+
310+
### Unspecified C Behaviours
311+
312+
Build instrumentation to check for unspecified behaviours in C, as per Section J.2
313+
(and maybe also J.3) of the C standard. Some of them may be possible to handle
314+
entirely in the front-end (for example, catching `x = x++` as modifying a variable
315+
twice between sequence points could be done in side effect removal in ansi-c), while
316+
others may be best to be instrumentation (for example handling the unspecified nature
317+
of the order of evaluation of function arguments).
318+
319+
Talk to Michael Tautschnig or Martin Brain for this item.
320+
321+
### Reporting Software Errors with CWE Information
322+
323+
The [Common Weakness Enumeration project](http://cwe.mitre.org/) seeks to classify software
324+
weaknesses and errors in a common dictionary. Reporting CWE information together with failed
325+
assertions, such as [Free of Pointer not at Start of Buffer](http://cwe.mitre.org/data/definitions/761.html),
326+
would help both users of the tools as well as industrial partners. In the first instance, errors
327+
already caught by present instrumentation should be augmented with additional information. In a
328+
second step, instrumentation for other problems listed in the CWE database should be developed.
329+
330+
### Machine Learn Refinement settings
331+
332+
The refinement option enables a tactic of under and over approximation to try to solve the
333+
generated formulae more quickly. There are a large number of possible options – which operations
334+
to approximate, how much to under or over approximate them, how much to add back if this doesn't
335+
work, etc. Rather than trying to manually pick the best strategy from these, it would be good
336+
to machine learn them.
337+
338+
It would probably be worth asking Nando de Freitas' advice on the machine learning side of things.
339+
340+
Ruben is now looking at this one. (?)
341+
342+
### Cluster CBMC
343+
344+
Most users of CBMC have access to multi-core machines; many have clusters available. There
345+
are a number of different approaches and a number of different levels at which the task
346+
could be parallelised; per assertion, portfolio approaches of different back-ends,
347+
parallel solvers, etc. It would be good to be able to exploit these. The caching approach
348+
suggested under “KLEE-PROVER” might be an easy route to implementing this.
349+
350+
### x86 to Goto
351+
352+
Look at various efforts to model x86 assembler, e.g., Andrew Kennedy,
353+
“Formalizing .EXEs, .DLLs, and all that”, Gang Tan, “Reusable tools for formal modeling of machine code”,
354+
and check whether this can turn into a translator from x86 to goto binary.
355+
356+
Also look at Google's Native Client, and talk to Matt, who has tried Valgrind.
357+
358+
Furthermore, talk to Michael, who has got a student who made some progress on this.
359+
360+
And maybe have a read of Automated synthesis of symbolic instruction encodings from I/O samples.
361+
362+
### Branch Prediction
363+
364+
Implement branch prediction algorithms in path-symeex.
365+
366+
### An abstract interpreter for C
367+
368+
There's a lack for a robust, open source, abstract interpreter for C/C++. CPROVER contains most of the
369+
components for it, the key thing that would be needed is a front-end and some testing, plus possibly
370+
some more domains.
371+
372+
### GCC Torture Tests
373+
374+
GCC has a set of 'torture tests' which are intended to stress test and find bugs in their C parser
375+
and front-end. It would be good to run these through goto-cc / add them to the nightly testing and
376+
fix any bugs that arise. (Possible overlap with “Front-End Testing” above.). The same is probably
377+
also true for CSmith and other compiler fuzzing tools.
378+
379+
### Compiler loop optimisations for BMC
380+
381+
Try out some of the compiler loop optimisations (hoisting invariant code,
382+
fusing sequential independent loops, etc.) and see if they improve BMC,
383+
K-Induction or loop acceleration. These should be possible to implement in goto-instrument.
384+
385+
Michael has an undergraduate student looking at this… (?)
386+
387+
### Handling restrict in a smart way
388+
389+
C99 adds the `restrict` keyword to mark unaliased pointers. This can be discarded by the compiler
390+
and thus has no semantics, it is only an annotation. However, there are a number of thing we could
391+
do with:
392+
393+
1. Insert checks in calling functions to make sure that the pointers really are unaliased.
394+
2. Add an instrumentation pass that adds assumptions that restricted pointers are not aliased.
395+
This may imporve the quality of counter-examples (although at the cost of potentially missing
396+
some bugs, although not if combined with the previous technique).
397+
398+
### C++ Exceptions
399+
400+
C++ has annotations for what exceptions a function can throw. It would be good to be
401+
able to check / generate / minimise these and it should be possible to do this with
402+
a reasonably simple abstract domain. The CVC4 project would be interested in using this if it exists.
403+
404+
Peter may have some thoughts on this. (?)
405+
406+
### Java Memory Model
407+
408+
CBMC's Java front-end is currently in development and will support the full Java bytecode
409+
instruction set soon. Beyond that, it would be great to support some more advanced aspects
410+
of the Java programming language semantics, such as the Java Memory Model. This will include
411+
adding models for a subset of the Java concurrency packages in the Java Runtime Library.
412+
413+
Jade Alglave may have some additional input on this. (?)
414+
107415
Completed
108416
---------
109417

0 commit comments

Comments
 (0)