You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This PhD thesis attempts to formalize the sequential (single-threaded) part of the C11 standard. The core contribution is a memory model that can explain the behavior of mixing "low-level" (byte-wise) and "high-level" C -- which kinds of pointer arithmetic and pointer casts are allowed, what exactly happens with pointers to field of unions as the union is overwritten, and so on. Type-based alias analysis is formalized and proven correct. A few questions remain unanswered, mostly related to pointer-to-integer casts (also see #30).
Some examples discussed in the thesis include:
unionint_or_short { intx; shorty; } u= { .y=3 };
int*p=&u.x;
// p points to the x variant of ushort*q=&u.y; // q points to the y variant of ushortz=*q;
// u has variant y and is accessed through y -> OK*p=10;
// u has variant y and is accessed through x -> bad
unionint_or_short { intx; shorty; } u= { .x=3 };
printf("%d\n", u.y); // this is legal, because of type punning
unionint_or_short { intx; shorty; } u= { .x=3 };
short*p=&u.y;
printf("%d\n", *p); // this is bad, type punning does not apply
Representation of pointers
Crucially, pointers in this model are not numbers. They are paths in a tree that define how to traverse memory, from the root of a block, through the fields of structs and unions and the indices in an array to the destination of the pointer.
See, for example, the image on page 72: The pointer to s.u.x[1] is the path that starts at the root, picks the u field of the struct, then the x field of the union, then the index 1 in the array. This way, the actual layout of structs and unions can remain uncertain, and pointer arithmetic that goes beyond array bounds becomes UB.
Relevance for Rust
Hopefully, we won't have type-based alias analysis or type punning in Rust. Still, some parts of the model may remain relevant: We have to decide how much pointer arithmetic we allow. The exact layout of structs and enums is left to the compiler (and we want to have optimizations like representing Option<Box> without a discriminant), so we probably want pointer arithmetic that makes assumptions about this layout to be UB. The tree-based addressing could be used to appropriately model this underspecification.
(Sorry my summary is not as good as the ones Niko wrote... )
The text was updated successfully, but these errors were encountered:
Link: http://robbertkrebbers.nl/thesis.html
This PhD thesis attempts to formalize the sequential (single-threaded) part of the C11 standard. The core contribution is a memory model that can explain the behavior of mixing "low-level" (byte-wise) and "high-level" C -- which kinds of pointer arithmetic and pointer casts are allowed, what exactly happens with pointers to field of unions as the union is overwritten, and so on. Type-based alias analysis is formalized and proven correct. A few questions remain unanswered, mostly related to pointer-to-integer casts (also see #30).
Some examples discussed in the thesis include:
Representation of pointers
Crucially, pointers in this model are not numbers. They are paths in a tree that define how to traverse memory, from the root of a block, through the fields of structs and unions and the indices in an array to the destination of the pointer.
See, for example, the image on page 72: The pointer to
s.u.x[1]
is the path that starts at the root, picks theu
field of the struct, then thex
field of the union, then the index 1 in the array. This way, the actual layout of structs and unions can remain uncertain, and pointer arithmetic that goes beyond array bounds becomes UB.Relevance for Rust
Hopefully, we won't have type-based alias analysis or type punning in Rust. Still, some parts of the model may remain relevant: We have to decide how much pointer arithmetic we allow. The exact layout of structs and enums is left to the compiler (and we want to have optimizations like representing
Option<Box>
without a discriminant), so we probably want pointer arithmetic that makes assumptions about this layout to be UB. The tree-based addressing could be used to appropriately model this underspecification.(Sorry my summary is not as good as the ones Niko wrote... )
The text was updated successfully, but these errors were encountered: