Be more explicit about the layout guarantees of integer and floating-point types #98
Merged
Commits (11, all by gnzlbg):
02652ba Be more explicit about the layout guarantees of integer and floating-…
b6d30c5 Re-word the whole document; add two FIXME's
8c55002 fixup
559d1a5 Fix mdbook; remove FIXMEs
46b71b7 Rename scalar to scalars
b732072 Language
d8ca372 Fix usize limitations
a8f3ed1 Remove the implementation-defined bit and pieces
712ca73 note that usize::MAX arrays -> ZSTs
ac5fe2b emphasize value
3c4fb07 Replace values with allocations
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
# Layout of scalar types | ||
|
||
This chapter represents the consensus from issue [#9]. It documents the memory | ||
layout and considerations for `bool`, `char`, floating point types (`f{32, 64}`), and integral types (`{i,u}{8,16,32,64,128,size}`). | ||
|
||
These types are all scalar types, representing a single value, and have no | ||
layout `#[repr()]` flags. | ||
|
||
[#9]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/9 | ||
|
||
## `bool` | ||
|
||
Rust's `bool` has the same layout as C17's `_Bool`, that is, its size and | ||
alignment are implementation-defined. Any `bool` can be cast into an integer, | ||
taking on the values 1 (`true`) or 0 (`false`). | ||
|
||
> **Note**: on all platforms that Rust currently supports, its size and | ||
> alignment are 1, and its ABI class is `INTEGER` - see [Rust Layout and ABIs]. | ||
|
||
[Rust Layout and ABIs]: https://gankro.github.io/blah/rust-layouts-and-abis/#the-layoutsabis-of-builtins | ||
|
||
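The `bool` guarantees above can be checked with a minimal sketch. The helper `bool_to_u8` is a hypothetical name introduced here for illustration, and the size and alignment assertions reflect the currently supported platforms described in the note, not a guarantee of the language:

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper: casts a `bool` to an integer.
/// The result is always 1 (`true`) or 0 (`false`).
fn bool_to_u8(b: bool) -> u8 {
    b as u8
}

fn main() {
    // Holds on all currently supported platforms; the chapter itself
    // treats size and alignment as implementation-defined.
    assert_eq!(size_of::<bool>(), 1);
    assert_eq!(align_of::<bool>(), 1);

    assert_eq!(bool_to_u8(true), 1);
    assert_eq!(bool_to_u8(false), 0);

    println!(
        "bool: size = {}, align = {}",
        size_of::<bool>(),
        align_of::<bool>()
    );
}
```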
## `char` | ||
|
||
Rust's `char` is 32 bits wide and represents a [unicode scalar value]. The alignment | ||
of `char` is _implementation-defined_. | ||
|
||
[unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value | ||
|
||
> **Note**: Rust's `char` type is not layout-compatible with the C / C++ `char` types. | ||
> The C / C++ `char` types correspond to either Rust's `i8` or `u8` types on all | ||
> currently supported platforms, depending on their signedness. Rust does not | ||
> support C platforms on which C `char` is not 8 bits wide. | ||
|
||
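As a small illustration of the difference between Rust's `char` and a C `char`, the sketch below checks the width of `char` and which 32-bit values are Unicode scalar values. The helper `is_scalar_value` is a hypothetical name; `std::ffi::c_char` is the standard library's alias for the platform's C `char`:

```rust
use std::ffi::c_char;
use std::mem::size_of;

/// Hypothetical helper: returns `true` if `code` is a Unicode scalar
/// value, i.e. a bit pattern that is valid for a Rust `char`.
fn is_scalar_value(code: u32) -> bool {
    char::from_u32(code).is_some()
}

fn main() {
    // A Rust `char` is 32 bits wide; a C `char` is 8 bits wide on all
    // platforms that Rust supports.
    assert_eq!(size_of::<char>(), 4);
    assert_eq!(size_of::<c_char>(), 1);

    assert_eq!('a' as u32, 0x61);
    assert!(is_scalar_value(0x1F600));
    // Surrogate code points are not scalar values.
    assert!(!is_scalar_value(0xD800));
    // Neither is anything above U+10FFFF.
    assert!(!is_scalar_value(0x11_0000));
}
```

Because not every 32-bit pattern is a valid `char`, a `u32` received over FFI should go through a checked conversion such as `char::from_u32` rather than being reinterpreted directly.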
## `isize` and `usize` | ||
|
||
The `isize` and `usize` types are pointer-sized signed and unsigned integers. | ||
They have the same layout as the [pointer types] for which the pointee is | ||
`Sized`, and are layout-compatible with C's `intptr_t` and `uintptr_t` types, | ||
respectively. | ||
|
||
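The relationship between `usize` and pointers to `Sized` pointees could be checked along these lines. This is a sketch only: `same_layout_as_thin_pointer` is a hypothetical helper, and the observation about slice references describes the current implementation rather than a rule stated in this chapter:

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper: checks that `usize` has the same size and
/// alignment as a raw pointer to the `Sized` type `T`.
fn same_layout_as_thin_pointer<T>() -> bool {
    size_of::<usize>() == size_of::<*const T>()
        && align_of::<usize>() == align_of::<*const T>()
}

fn main() {
    assert!(same_layout_as_thin_pointer::<u8>());
    assert!(same_layout_as_thin_pointer::<[u64; 16]>());
    assert_eq!(size_of::<isize>(), size_of::<usize>());

    // In the current implementation, a reference to an unsized pointee
    // (here a slice) carries extra metadata and is wider than `usize`.
    assert!(size_of::<&[u8]>() > size_of::<usize>());

    // A pointer to a `Sized` pointee round-trips through `usize`.
    let x = 7u32;
    let addr = &x as *const u32 as usize;
    assert_eq!(addr as *const u32, &x as *const u32);
}
```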
> **Note**: Rust's `usize` and C's `unsigned` types are **not** equivalent. C's | ||
> `unsigned` is at least as large as a `short` and is allowed to have padding | ||
> bits, but it is not necessarily pointer-sized. | ||
|
||
The layout of `usize` determines the following: | ||
|
||
- the maximum size of Rust values is _implementation-defined_, but can at most | ||
be `usize::max_value()` since `mem::size_of` and `mem::size_of_val` return | ||
`usize`, | ||
- the maximum number of elements in an array is _implementation-defined_, but | ||
can at most be `usize::max_value()` since `[T; N: usize]`, | ||
- the maximum value by which a pointer can be offset is | ||
_implementation-defined_, but can at most be `usize::max_value()` since | ||
`ptr.add(count: usize)`. | ||
|
||
> **Note**: in the current Rust implementation: | ||
> | ||
> * the maximum size of Rust values is limited to `isize::max_value()`. The LLVM | ||
> `getelementptr` instruction uses signed-integer field offsets. Rust calls | ||
> `getelementptr` with the `inbounds` flag which assumes that field offsets do | ||
> not overflow, | ||
> * the maximum number of elements in an array is `usize::max_value()`, | ||
> * the maximum value by which a pointer can be offset is `usize::max_value()`. | ||
|
||
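The size limits listed above can be observed from safe code on a recent toolchain. The following is a sketch under assumptions: it uses `std::alloc::Layout`, whose constructors reject sizes above `isize::MAX` in current releases, as a stand-in for "the maximum size of Rust values", and `fits_size_limit` is a hypothetical helper. It uses the `usize::MAX` constant, which is equal to `usize::max_value()`:

```rust
use std::alloc::Layout;
use std::mem::size_of;

/// Hypothetical helper: returns `true` if a byte-aligned object of
/// `size` bytes can be described by a `Layout`.
fn fits_size_limit(size: usize) -> bool {
    Layout::from_size_align(size, 1).is_ok()
}

fn main() {
    // Current implementation: sizes are limited to `isize::MAX` bytes.
    assert!(fits_size_limit(isize::MAX as usize));
    assert!(!fits_size_limit(isize::MAX as usize + 1));

    // An array of `usize::MAX` one-byte elements exceeds that limit.
    assert!(Layout::array::<u8>(usize::MAX).is_err());

    // The element count itself may still be `usize::MAX` when the
    // element type is zero-sized, because the array is then zero-sized.
    assert_eq!(size_of::<[(); usize::MAX]>(), 0);
}
```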
[pointer types]: ./pointers.md | ||
|
||
## Fixed-width integer types | ||
|
||
Rust's signed and unsigned fixed-width integer types `{i,u}{8,16,32,64}` have | ||
the same layout as the C fixed-width integer types from the `<stdint.h>` header | ||
`{u,}int{8,16,32,64}_t`. That is: | ||
|
||
* these types have no padding bits, | ||
* their size exactly matches their bit-width, | ||
* negative values of signed integer types are represented using 2's complement. | ||
|
||
These properties also hold for Rust's 128-bit wide `{i,u}128` integer types, but | ||
C does not expose equivalent types in `<stdint.h>`. | ||
|
||
Rust fixed-width integer types are therefore safe to use directly in C FFI where | ||
the corresponding C fixed-width integer types are expected. | ||
|
||
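The three properties above (no padding bits, size equal to bit-width, 2's complement) can be illustrated with a short sketch. The helper `bits_of` is a hypothetical name introduced for this example:

```rust
use std::mem::size_of;

/// Hypothetical helper: reinterprets the bits of a signed byte as an
/// unsigned byte, exposing its 2's complement representation.
fn bits_of(x: i8) -> u8 {
    x as u8
}

fn main() {
    // The size of each type exactly matches its bit-width.
    assert_eq!(size_of::<i8>(), 1);
    assert_eq!(size_of::<u16>(), 2);
    assert_eq!(size_of::<i32>(), 4);
    assert_eq!(size_of::<u64>(), 8);
    assert_eq!(size_of::<i128>(), 16);

    // Negative values use 2's complement.
    assert_eq!(bits_of(-1), 0xFF);
    assert_eq!(bits_of(-128), 0x80);
    assert_eq!((-2i32).to_le_bytes(), [0xFE, 0xFF, 0xFF, 0xFF]);

    // No padding bits: all 8 bits of a `u8` contribute to its value.
    assert_eq!(u8::MAX.count_ones(), 8);
}
```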
### Layout compatibility with C native integer types | ||
|
||
The specification of native C integer types, `char`, `short`, `int`, `long`, | ||
... as well as their `unsigned` variants, guarantees a lower bound on their size, | ||
e.g., `short` is _at least_ 16 bits wide and _at least_ as wide as `char`. | ||
|
||
Their exact sizes are _implementation-defined_. | ||
|
||
Libraries like `libc` use knowledge of this _implementation-defined_ behavior on | ||
each platform to select a layout-compatible Rust fixed-width integer type when | ||
interfacing with native C integer types (e.g. `libc::c_int`). | ||
|
||
> **Note**: Rust does not support C platforms on which the C native integer types | ||
> are not compatible with any of Rust's fixed-width integer types (e.g. because | ||
> of padding bits, lack of 2's complement, etc.). | ||
|
||
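A minimal sketch of how the platform-selected aliases behave is shown below. It uses the standard library's `std::ffi::c_*` aliases rather than the `libc` crate mentioned above, purely to stay dependency-free; the two serve the same purpose here. The helper `c_lower_bounds_hold` is a hypothetical name:

```rust
use std::ffi::{c_char, c_int, c_long, c_short};
use std::mem::size_of;

/// Hypothetical helper: checks the lower bounds that C places on its
/// native integer types, using the aliases chosen for this platform.
fn c_lower_bounds_hold() -> bool {
    size_of::<c_char>() == 1
        && size_of::<c_short>() >= 2
        && size_of::<c_short>() >= size_of::<c_char>()
        && size_of::<c_int>() >= size_of::<c_short>()
        && size_of::<c_long>() >= 4
        && size_of::<c_long>() >= size_of::<c_int>()
}

fn main() {
    assert!(c_lower_bounds_hold());

    // The exact sizes are implementation-defined and vary by platform.
    println!(
        "short = {}, int = {}, long = {} bytes",
        size_of::<c_short>(),
        size_of::<c_int>(),
        size_of::<c_long>()
    );
}
```

Writing FFI signatures against these aliases, instead of hard-coding `i32` or `i64`, keeps the declaration correct on platforms where the C types have a different width.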
## Fixed-width floating point types | ||
|
||
Rust's `f32` and `f64` single (32-bit) and double (64-bit) precision | ||
floating-point types have [IEEE-754] `binary32` and `binary64` floating-point | ||
layouts, respectively. | ||
|
||
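The `binary32` layout can be made visible by splitting the bits of an `f32` into its sign, exponent, and fraction fields. This is a sketch only; `f32_fields` is a hypothetical helper, and the field widths (1, 8, and 23 bits) come from IEEE-754:

```rust
use std::mem::size_of;

/// Hypothetical helper: splits an `f32` into its IEEE-754 `binary32`
/// fields: (sign, biased exponent, fraction).
fn f32_fields(x: f32) -> (u32, u32, u32) {
    let bits = x.to_bits();
    (bits >> 31, (bits >> 23) & 0xFF, bits & 0x007F_FFFF)
}

fn main() {
    assert_eq!(size_of::<f32>(), 4);
    assert_eq!(size_of::<f64>(), 8);

    // 1.0 = +1.0 * 2^0, so the biased exponent is 127.
    assert_eq!(f32_fields(1.0), (0, 127, 0));
    // -2.0 = -1.0 * 2^1, so the sign bit is set and the exponent is 128.
    assert_eq!(f32_fields(-2.0), (1, 128, 0));

    assert_eq!(1.0f32.to_bits(), 0x3F80_0000);
    assert_eq!(1.0f64.to_bits(), 0x3FF0_0000_0000_0000);
    assert_eq!(f32::from_bits(0x3F80_0000), 1.0);
}
```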
When the platform's `"math.h"` header defines the `__STDC_IEC_559__` macro, | ||
Rust's floating-point types are safe to use directly in C FFI where the | ||
appropriate C types are expected (`f32` for `float`, `f64` for `double`). | ||
|
||
If the C platform's `"math.h"` header does not define the `__STDC_IEC_559__` | ||
macro, whether it is safe to use `f32` and `f64` in C FFI, and for which C | ||
types, is _implementation-defined_. | ||
|
||
> **Note**: the `libc` crate uses knowledge of each platform's | ||
> _implementation-defined_ behavior to provide portable `libc::c_float` and | ||
> `libc::c_double` types that can be used to safely interface with C via FFI. | ||
|
||
[IEEE-754]: https://en.wikipedia.org/wiki/IEEE_754 |
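As an illustration of the portable floating-point aliases, the sketch below defines a C-ABI function in Rust. It uses `std::ffi::c_float` and `std::ffi::c_double`, the standard library counterparts of the `libc` types named in the note, and `scale` is a hypothetical function invented for this example:

```rust
use std::ffi::{c_double, c_float};

/// Hypothetical C-ABI function: a C caller would see it as
/// `double scale(float x, double factor);`.
extern "C" fn scale(x: c_float, factor: c_double) -> c_double {
    // Widening from `c_float` to `c_double` is lossless.
    c_double::from(x) * factor
}

fn main() {
    // In the standard library, `c_float` and `c_double` are aliases
    // for `f32` and `f64`, so values convert without a cast.
    let x: c_float = 1.5f32;
    let y: c_double = 2.0f64;
    assert_eq!(scale(x, y), 3.0);
}
```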
Just noting that we use "value" to mean a very particular thing here -- the same thing that C calls "object", from what I can tell. Also see #40.
A relatively easy way to avoid that issue while also being less jargon-intense would be to talk about the sizes of allocations. Just risks people thinking about heap allocation to the exclusion of all other kinds.
@RalfJung once #40 is merged we should probably start linking these terms to their definitions (EDIT: I've added emphasis to the "values", but that doesn't help much).
@rkruppe with #40 we might be talking about the size of "places", which would remove the ambiguity with heap allocations, but it is still jargon heavy.