Skip to content

Commit f5f965a

Browse files
authored
Merge pull request #4119 from NlightNFotis/mini_projects_list_update
Mini projects list update
2 parents d7df1a2 + c07c76f commit f5f965a

File tree

1 file changed

+294
-23
lines changed

1 file changed

+294
-23
lines changed

MINI-PROJECTS.md

Lines changed: 294 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -1,16 +1,11 @@
1-
CBMC Mini Projects
2-
==================
1+
# CBMC Mini Project Ideas
32

43
The following projects are short, focussed features that give new CBMC
54
contributors an introduction to one part of the codebase. If you're
65
interested in contributing to CBMC, feel free to start with any of these
76
projects!
87

9-
10-
Task List
11-
---------
12-
13-
### `__CPROVER_print`
8+
## `__CPROVER_print`
149

1510
**Task**: Implement a CPROVER intrinsic that assigns an expression to a
1611
dummy value, to help with debugging and tracing.
@@ -45,8 +40,7 @@ manual ghost assignment in their code.
4540
**Hints**: Figure out how `__CPROVER_assume` and `__CPROVER_assert` work.
4641
Then figure out how to add new instructions into a goto-program.
4742
48-
49-
### Annotation to constrain function pointers
43+
## Annotation to constrain function pointers
5044
5145
**Task**: implement a CPROVER intrinsic that tells CBMC what function a
5246
function pointer points to.
@@ -58,8 +52,7 @@ could be called. This results in a combinatorial explosion of paths when
5852
CBMC is exploring the program. It would be useful to have an annotation
5953
that tells CBMC exactly what function a pointer is expected to point to.
6054
61-
62-
### Generalize loop-unwinding bounds to cover multiple loops
55+
## Generalize loop-unwinding bounds to cover multiple loops
6356
6457
**Task**: allow users to specify a *combined* bound for the sum of
6558
several loop counters.
@@ -76,42 +69,320 @@ specify that the *combined* loop bound for the three loops is `S`.
7669
7770
The current syntax for the `--unwindset` switch is
7871
79-
```
72+
```sh
8073
--unwindset LABEL:BOUND
8174
```
8275

8376
You might like to generalize this so that it looks like
8477

85-
```
78+
```sh
8679
--unwindset LABEL<,LABEL2,LABEL3,...>:BOUND
8780
```
8881

8982
which would have the effect that the loops with those labels would all
9083
share a bound.
9184

9285
*Hints*:
93-
- The current member that stores how many times a loop has been unwound
86+
87+
* The current member that stores how many times a loop has been unwound
9488
is `goto_symex_statet::loop_iterations`. Have a look at where this
9589
member is accessed, and what is done with that value (e.g. read
9690
through `symex_bmct::should_stop_unwind()` and
9791
`symex_bmct::loop_unwind_handlert`).
9892

99-
- You may wish to add a map from a *set* of loop names to a loop bound,
93+
* You may wish to add a map from a *set* of loop names to a loop bound,
10094
in addition to the current map from single loop names to loop bounds.
10195

102-
- If a loop has an individual bound, and is also part of a set of
96+
* If a loop has an individual bound, and is also part of a set of
10397
mutually-bound loops, then we should stop unwinding it if *either* of
10498
those bounds is reached. Good, thoughtful test cases are essential
10599
here!
106100

107-
Completed
108-
---------
101+
## Instrumentation to Check Pointer Alignment
102+
103+
Some architectures only allow load instructions / pointer dereferencing when the
104+
address is a multiple of the word length (i.e. 4 bytes or 8 bytes). This project
105+
would add a new instrumentation option for `cbmc` and `goto-instrument` that checks
106+
each pointer before dereference and makes sure it's correctly aligned.
107+
108+
This is a good way to get into `goto_check` and how instrumentation of asserts
109+
works.
110+
111+
## Command line options to SAT solvers
112+
113+
There are options for which SMT solver to run and there are ways of building with
114+
multiple SAT solvers but no command line options to pick a SAT solver. It would be
115+
nice if there were.
116+
117+
## Combine Uninitialized Variables Analysis with Alias Analysis
118+
119+
`goto-instrument` features an instrumentation to track cases of use-before-define,
120+
i.e. reading of uninitialised variables. At present, the analysis does not take
121+
into account aliasing information. As such, the following code will yield a spurious
122+
counterexample:
123+
124+
``` C
125+
int main()
126+
{
127+
int x;
128+
int *ptr;
129+
ptr=&x;
130+
*ptr=42;
131+
if(x < 42)
132+
return 1;
133+
return 0;
134+
}
135+
```
136+
137+
This should be pretty straight-forward to do with `goto-analyzer --vsd` and might even
138+
be a good intro to that code.
139+
140+
## Make SAMATE Tests Work
141+
142+
Read [this paper](https://www.sciencedirect.com/science/article/abs/pii/S0950584913000384) and
143+
make the tests mentioned in there work.
144+
145+
## Error Explanation
146+
147+
Re-implement the algorithm described in [this paper](http://www.kroening.com/papers/STTT-error-explanation-2005.pdf)
148+
149+
## LTL
150+
151+
Build an instrumenter in `goto-instrument` for checking LTL (or even CTL) properties
152+
on C code. E.g look at [this paper](https://www.umsec.umn.edu/publications/Partial-Translation-Verification-Untrusted-Code)
153+
154+
Please get in touch with Michael Tautschnig, who has done very similar work in FShell.
155+
156+
## Test SMT Back Ends
157+
158+
CBMC allows to choose among a selection of SMT solvers to use as solving back end.
159+
As these tools evolve, CBMC's code for running them may need a fresh look from time
160+
to time. After a first round of manual experiments, these tests should become part
161+
of the routine build process on `dkr-build`. (Michael Tautschnig knows how to do this.)
162+
163+
## Improved Interpreter
164+
165+
CProver has an interpreter for goto programs but it could be improved, for example adding
166+
support for dynamic analysis, calls to external functions and possibly even mixed concrete
167+
and symbolic execution. It would also be good to be able to interpret traces found by BMC
168+
to catch mismatches / bugs between the logical model and the bit blasting models. This
169+
would also be useful for more advanced CEGAR like techniques (also see below).
170+
171+
## "KLEE-PROVER"
172+
173+
Alter the CBMC symbolic execution engine to create a dynamic symbolic execution tool
174+
similar to [KLEE](https://klee.github.io/). Possible features include:
175+
176+
* Distributed and incremental support using the caching scheme here
177+
* A feature to run BMC as you go to explore the traces 'around' the execution traces
178+
* Use path exploration as the witness generation part of an ACDL process.
179+
* Perform precondition inference, possibly using ACDL.
180+
181+
## Interact with GDB
182+
183+
GDB implements a client-server protocol with a GDB instance acting as a server and the
184+
GUI as a client. (The documentation for that protocol may be found [here](http://www.sourceware.org/gdb/current/onlinedocs/gdb/GDB_002fMI.html).)
185+
By implementing this protocol in various places within CProver, there are various
186+
things that could be achieved.
187+
188+
To do some of these will require working out which functions are non-deterministic models
189+
of real functions and accounting for this. This would also be necessary for some of the
190+
preceeding ideas.
191+
192+
### Trace as a GDB server
193+
194+
Implement a GDB server that uses a trace to 'execute' the program. This allows any GUI
195+
debugger that can talk to GDB to be used to graphically explore the trace. Particularly
196+
valuable for concurrent programs in which just reading the trace could be very hard.
197+
198+
An implementation of that is in the trace debugger here: https://github.com/diffblue/eclipse-cbmc
199+
200+
Handling the SV-COMP counter-example format (ask Michael) would probably make this tool
201+
a lot more widely usable.
202+
203+
### Trace as a GDB client
204+
205+
Conversely, implement a GDB client that uses the trace to control a debugger. If
206+
connected to a 'real' GDB then this would allow the debugger to be set to the position
207+
that shows the error. In the case of parallel programs this avoids the need to create
208+
custom schedulers, etc. and over-all it is a much better interface – rather than
209+
generate a trace, CProver simply gives you the debugger in the state that triggers
210+
the problem.
211+
212+
One option would be to convert the trace to a GDB script but this misses some of
213+
the value as it would be possible to query they GDB server as the program runs to
214+
check that values really are assigned correctly. This will catch the first point
215+
at which the model and the real system diverge and allows checked 'resimulation'
216+
of the traces on a 'real' platform. If the goto-binary contains the ELF binary
217+
then this could be used directly.
218+
219+
### Interpreter as a GDB server
220+
221+
An enabling tech, this would allow us to run the interpreter as a debugger back
222+
end. Combine with the trace as a GDB client, this would allow resimulation of
223+
traces, which would be useful for catching encoding bugs in CProver and ideally
224+
should be run after the decode from the solver all of the time.
225+
226+
Michael has a partial implementation of this. (?)
227+
228+
### Recording GDB server
229+
230+
Acts as a GDB server and passes commands back to another GDB server to actually
231+
execute. Records the state changes and locations visited. This allows a test case
232+
to be 'run' and the trace / conditions on the trace to be collected and (after
233+
generalisation) be used for test case driven analysis / test case generalisation
234+
/ something “concolic” like.
235+
236+
### Differential GDB server
237+
238+
Acts as a GDB server and then connects (as a client) to one or more GDB servers.
239+
Commands are executed on all of these and then the state / position compared
240+
(much of the infrastructure of the previous idea could be used). Divergences of
241+
behaviour can then be found, particularly root causes can be identified by the
242+
first divergence. This may not be CProver specific but can likely be used for
243+
various projects and ideas above.
244+
245+
### "Concolic" GDB client
246+
247+
An interactive component that runs a program via a GDB server and query / interact
248+
with it. This would be useful for building a “concolic” tool, especially for running
249+
library functions that are not defined.
250+
251+
## Other ideas
252+
253+
Valgrind and the Linux kernel both have GDB servers - what could we do with these?
254+
255+
### Static Detection of Data Races
256+
257+
It would be good to have a tool that checks for data races in a similar fashion to
258+
DRD / Helgrind. The current pointer analysis can over approximate shared variables,
259+
so possibly some combination of this, trace generation, DRD / Helgrind and then
260+
refinement of which traces to pick? Perhaps this is even something ACDL like?
261+
262+
### CORAL / Q
263+
264+
Build something similar to the Q tool in SLAM. (?)
265+
266+
### k-liveness
267+
268+
Implement [k-liveness](http://www.cs.utexas.edu/~hunt/FMCAD/FMCAD12/013.pdf)
269+
for software-liveness checking.
270+
271+
### Known Bugs on the Web
272+
273+
Turn the list of `KNOWNBUG`S from the CBMC regressions into a list of open issues
274+
in the CBMC repository issue tracker (https://github.com/diffblue/cbmc/issues).
275+
276+
If you do this, make sure you target either the`develop` branch or the latest
277+
CBMC release.
278+
279+
### Coding Standards Checker
280+
281+
Build instrumenters for various coding standards, e.g. AutoSar, [MISRA-C](https://en.wikipedia.org/wiki/MISRA_C),
282+
CERT-C, [HIC++](https://www.perforce.com/resources/qac/high-integrity-cpp-coding-standard),
283+
etc.
284+
285+
### Front-End Testing
286+
287+
Run the regressions mentioned [here](https://github.com/kframework/c-semantics/tree/master/tests)
288+
and the [GCC C Torture tests](https://gcc.gnu.org/onlinedocs/gccint/C-Tests.html)
289+
with `goto-cc` and a `goto-program` interpreter.
290+
291+
### Unspecified C Behaviours
292+
293+
Build instrumentation to check for unspecified behaviours in C, as per Section J.2
294+
(and maybe also J.3) of the C standard. Some of them may be possible to handle
295+
entirely in the front-end (for example, catching `x = x++` as modifying a variable
296+
twice between sequence points could be done in side effect removal in ansi-c), while
297+
others may be best to be instrumentation (for example handling the unspecified nature
298+
of the order of evaluation of function arguments).
299+
300+
For more information, talk to Michael Tautschnig for this item.
301+
302+
### Reporting Software Errors with CWE Information
303+
304+
The [Common Weakness Enumeration project](http://cwe.mitre.org/) seeks to classify
305+
software weaknesses and errors in a common dictionary. Reporting CWE information together
306+
with failed assertions, such as [Free of Pointer not at Start of Buffer](http://cwe.mitre.org/data/definitions/761.html),
307+
would help both users of the tools as well as industrial partners. In the first instance,
308+
errors already caught by present instrumentation should be augmented with additional
309+
information. In a second step, instrumentation for other problems listed in the CWE
310+
database should be developed.
311+
312+
### Machine Learn Refinement settings
313+
314+
The refinement option enables a tactic of under and over approximation to try to
315+
solve the generated formulae more quickly. There are a large number of possible
316+
options – which operations to approximate, how much to under or over approximate
317+
them, how much to add back if this doesn't work, etc. Rather than trying to manually
318+
pick the best strategy from these, it would be good to machine learn them.
319+
320+
### Cluster CBMC
321+
322+
Most users of CBMC have access to multi-core machines; many have clusters available.
323+
There are a number of different approaches and a number of different levels at which
324+
the task could be parallelised; per assertion, portfolio approaches of different
325+
back-ends, parallel solvers, etc. It would be good to be able to exploit these.
326+
The caching approach suggested under “KLEE-PROVER” might be an easy route to
327+
implementing this.
328+
329+
### x86 to Goto
330+
331+
Look at various efforts to model x86 assembler, e.g., Andrew Kennedy,
332+
“Formalizing .EXEs, .DLLs, and all that”, Gang Tan, “Reusable tools for formal
333+
modeling of machine code”, and check whether this can turn into a translator
334+
from x86 to goto binary.
335+
336+
Also look at Google's Native Client, and talk to Matt, who has tried Valgrind.
337+
338+
Furthermore, talk to Michael, who has got a student who made some progress
339+
on this.
340+
341+
And maybe have a read of Automated synthesis of symbolic instruction encodings
342+
from I/O samples.
343+
344+
### Branch Prediction
345+
346+
Implement branch prediction algorithms in path-symex.
347+
348+
### GCC Torture Tests
349+
350+
GCC has a set of 'torture tests' which are intended to stress test and find bugs in their
351+
parser and front-end. It would be good to run these through goto-cc / add them to the
352+
nightly testing and fix any bugs that arise. (Possible overlap with “Front-End Testing”
353+
above.). The same is probably also true for other compiler fuzzing tools.
354+
355+
### Compiler loop optimisations for BMC
356+
357+
Try out some of the compiler loop optimisations (hoisting invariant code,
358+
fusing sequential independent loops, etc.) and see if they improve BMC,
359+
K-Induction or loop acceleration. These should be possible to implement in
360+
goto-instrument.
361+
362+
### Handling restrict in a smart way
363+
364+
C99 adds the `restrict` keyword to mark unaliased pointers. This can be discarded by
365+
the compiler and thus has no semantics, it is only an annotation. However, there are
366+
a number of thing we could do with:
367+
368+
1. Insert checks in calling functions to make sure that the pointers really are unaliased.
369+
2. Add an instrumentation pass that adds assumptions that restricted pointers are not
370+
aliased. This may imporve the quality of counter-examples (although at the cost of
371+
potentially missing some bugs, although not if combined with the previous technique).
372+
373+
### C++ Exceptions
109374

375+
C++ has annotations for what exceptions a function can throw. It would be good to
376+
be able to check / generate / minimise these and it should be possible to do this
377+
with a reasonably simple abstract domain. The CVC4 project would be interested in
378+
using this if it exists.
110379

111-
### Add a `--print-internal-representation` switch
380+
An old prototype of something similar is here: https://github.com/peterschrammel/cbmc/commits/progres-exceptions
112381

113-
*Summary*: This switch should provide more detail than
114-
`--show-goto-functions`, by printing out the `irep` of every program
115-
instruction instead of a concise representation.
382+
### Java Memory Model
116383

117-
*Completed in* PR #991
384+
CBMC's Java front-end is currently in development and will support the full Java
385+
bytecode instruction set soon. Beyond that, it would be great to support some more
386+
advanced aspects of the Java programming language semantics, such as the Java Memory
387+
Model. This will include adding models for a subset of the Java concurrency packages
388+
in the Java Runtime Library.

0 commit comments

Comments
 (0)