Skip to content

The C standard formalized in Coq #142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
RalfJung opened this issue Sep 30, 2016 · 1 comment
Closed

The C standard formalized in Coq #142

RalfJung opened this issue Sep 30, 2016 · 1 comment
Labels
C-related-work Category: Discussion of related work (other languages, scientific publications, ...)

Comments

@RalfJung
Copy link
Member

Link: http://robbertkrebbers.nl/thesis.html

This PhD thesis attempts to formalize the sequential (single-threaded) part of the C11 standard. The core contribution is a memory model that can explain the behavior of mixing "low-level" (byte-wise) and "high-level" C -- which kinds of pointer arithmetic and pointer casts are allowed, what exactly happens with pointers to field of unions as the union is overwritten, and so on. Type-based alias analysis is formalized and proven correct. A few questions remain unanswered, mostly related to pointer-to-integer casts (also see #30).

Some examples discussed in the thesis include:

union int_or_short { int x; short y; } u = { .y = 3 };
int *p = &u.x;
// p points to the x variant of u
short *q = &u.y; // q points to the y variant of u

short z = *q;
// u has variant y and is accessed through y -> OK
*p = 10;
// u has variant y and is accessed through x -> bad
union int_or_short { int x; short y; } u = { .x = 3 };
printf("%d\n", u.y); // this is legal, because of type punning
union int_or_short { int x; short y; } u = { .x = 3 };
short *p = &u.y;
printf("%d\n", *p); // this is bad, type punning does not apply

Representation of pointers

Crucially, pointers in this model are not numbers. They are paths in a tree that define how to traverse memory, from the root of a block, through the fields of structs and unions and the indices in an array to the destination of the pointer.
See, for example, the image on page 72: The pointer to s.u.x[1] is the path that starts at the root, picks the u field of the struct, then the x field of the union, then the index 1 in the array. This way, the actual layout of structs and unions can remain uncertain, and pointer arithmetic that goes beyond array bounds becomes UB.

Relevance for Rust

Hopefully, we won't have type-based alias analysis or type punning in Rust. Still, some parts of the model may remain relevant: We have to decide how much pointer arithmetic we allow. The exact layout of structs and enums is left to the compiler (and we want to have optimizations like representing Option<Box> without a discriminant), so we probably want pointer arithmetic that makes assumptions about this layout to be UB. The tree-based addressing could be used to appropriately model this underspecification.

(Sorry my summary is not as good as the ones Niko wrote... )

@RalfJung RalfJung transferred this issue from rust-lang/rust-memory-model Jun 18, 2019
@RalfJung RalfJung added the C-related-work Category: Discussion of related work (other languages, scientific publications, ...) label Jun 18, 2019
@JakobDegen
Copy link
Contributor

Closing along with the other related work issues. Unclear value as an issue, may be useful to link from elsewhere in the future

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-related-work Category: Discussion of related work (other languages, scientific publications, ...)
Projects
None yet
Development

No branches or pull requests

2 participants