diff --git a/reference/src/SUMMARY.md b/reference/src/SUMMARY.md
index db593e02..b3cc4129 100644
--- a/reference/src/SUMMARY.md
+++ b/reference/src/SUMMARY.md
@@ -4,9 +4,11 @@
 - [Data layout](./layout.md)
   - [Structs and tuples](./layout/structs-and-tuples.md)
-  - [Integers and Floating Points](./layout/integers-floatingpoint.md)
+  - [Scalars](./layout/scalars.md)
   - [Enums](./layout/enums.md)
   - [Unions](./layout/unions.md)
+  - [Pointers](./layout/pointers.md)
+  - [Function pointers](./layout/function-pointers.md)
   - [Arrays and Slices](./layout/arrays-and-slices.md)
   - [Packed SIMD vectors](./layout/packed-simd-vectors.md)
 - [Optimizations](./optimizations.md)
diff --git a/reference/src/layout.md b/reference/src/layout.md
deleted file mode 100644
index cb1f7674..00000000
--- a/reference/src/layout.md
+++ /dev/null
@@ -1 +0,0 @@
-# Data layout
diff --git a/reference/src/layout/function-pointers.md b/reference/src/layout/function-pointers.md
index 89785261..9869ebe9 100644
--- a/reference/src/layout/function-pointers.md
+++ b/reference/src/layout/function-pointers.md
@@ -81,6 +81,11 @@ bool for_all(struct Cons const *self, bool (*func)(int, void *), void *thunk);
 ```
 
 ```rust
+# use std::{
+#     ffi::c_void,
+#     os::raw::c_int,
+# };
+#
 pub struct Cons {
   data: c_int,
   next: Option<Box<Cons>>,
@@ -117,9 +122,6 @@ pub extern "C" fn for_all(
     }
     it = node.next.as_ref().map(|x| &**x);
   }
+  true
 }
 ```
-
-### Unresolved Questions
-
-- dunno
diff --git a/reference/src/layout/integers-floatingpoint.md b/reference/src/layout/integers-floatingpoint.md
deleted file mode 100644
index 6dda3508..00000000
--- a/reference/src/layout/integers-floatingpoint.md
+++ /dev/null
@@ -1,61 +0,0 @@
-# Layout of Boolean, Floating Point, and Integral Types
-This chapter represents the consensus from issue [#9]. It documents the memory layout and considerations for `bool`, `usize`, `isize`, floating point types, and integral types.
-
-[#9]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/9
-
-## Overview
-These are all scalar types, representing a single value. These types have no layout variants (no `#[repr(C)]` or `#[repr(Rust)]`). Their size is fixed and well-defined across FFI boundaries and map to their corresponding integral types in the C ABI.
-- `bool`: 1 byte
-  - any `bool` can be cast into an integer, taking on the values 1 (true) or 0 (false)
-- `usize`, `isize`: pointer-sized unsigned/signed integer type
-- `u8` .. `u128`, `i8` .. `i128`
-  - {8, 16, 32, 64, 128}-bit unsigned integer
-  - {8, 16, 32, 64, 128}-bit signed integer
-- `f32`, `f64`
-  - IEEE floats
-  - 32-bit or 64-bit
-- `char`
-  - C++ char: equivalent to either `i8`/`u8`
-  - Rust char: 32-bit
-    - not ABI compatible
-    - represents [Unicode scalar value](http://www.unicode.org/glossary/#unicode_scalar_value)
-
-## `usize`/`isize`
-Types `usize` and `isize` are committed to having the same size as a native pointer on the platform. The layout of `usize` determines the following:
-- how much a pointer of a certain type can be offseted,
-- the maximum size of Rust objects (because size_of/size_of_val return `usize`),
-- the maximum number of elements in an array (`[T; N: usize]`),
-- `usize`/`isize` in C FFI are compatible with C's `uintptr_t` / `intptr_t` (and have the same size and alignment).
-
-The maximum size of any single value must fit within `usize` to [ensure that pointer diff is representable](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212703192).
-
-`usize` and C’s `unsized` are *not* equivalent.
-
-## Booleans
-Rust's `bool` has the same layout as C17's` _Bool`, that is, its size and alignment are implementation-defined.
-
-Note: on all platforms that Rust's currently supports, the size and alignment of bool are 1, and its ABI class is INTEGER.
-
-For full ABI compatibility details, see [Gankro’s post](https://gankro.github.io/blah/rust-layouts-and-abis/#the-layoutsabis-of-builtins).
-
-## Relationship to C integer hierarchy
-C integers:
-- char: at least 8 bits
-- short: at least 16 bits (also at least a char)
-- int: at least a short (intended to be a native integer size)
-- long: at least 32 bits (also at least an int)
-- long long: at least 64 bits (also at least a long)
-The C integer types specify a minimum size, but not the exact size. For this reason, Rust integer types are not necessarily compatible with the “corresponding” C integer type. Instead, use the corresponding fixed size data types (e.g. `i64` in Rust would correspond to `int64_t` in C).
-
-## Controversies
-There has been some debate about what to pick as the "official" behavior for bool:
-* Rust does what C does (this is what the lang team decided)
-  * and in all cases you care about, that is 1 byte that is 0 or 1
-or
-* Rust makes it 1 byte with values 0 or 1
-  * and in all cases you care about, this is what C does
-
-Related discussions: [document the size of bool](https://github.com/rust-lang/rust/pull/46156), [bool== _Bool?](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/53#issuecomment-447050232), [bool ABI](https://github.com/rust-lang/rust/pull/46176#issuecomment-359593446)
-
-
-
diff --git a/reference/src/layout/pointers.md b/reference/src/layout/pointers.md
index ef9abdb0..160ffbb3 100644
--- a/reference/src/layout/pointers.md
+++ b/reference/src/layout/pointers.md
@@ -36,7 +36,7 @@ multi-trait objects `&(dyn T + U)` or references to other dynamically sized type
 other than that they are at least word-aligned, and have size at least one word.
 
 The layout of `&dyn T` when `T` is a trait is the same as that of:
-```rust
+```rust,ignore
 #[repr(C)]
 struct DynObject {
   data: *u8,
@@ -45,7 +45,7 @@ struct DynObject {
 ```
 
 The layout of `&[T]` is the same as that of:
-```rust
+```rust,ignore
 #[repr(C)]
 struct Slice {
   ptr: *T,
diff --git a/reference/src/layout/scalars.md b/reference/src/layout/scalars.md
new file mode 100644
index 00000000..17736de8
--- /dev/null
+++ b/reference/src/layout/scalars.md
@@ -0,0 +1,114 @@
+# Layout of scalar types
+
+This chapter represents the consensus from issue [#9]. It documents the memory
+layout and considerations for `bool`, `char`, floating point types (`f{32, 64}`), and integral types (`{i,u}{8,16,32,64,128,size}`).
+
+These types are all scalar types, representing a single value, and have no
+layout `#[repr()]` flags.
+
+[#9]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/9
+
+## `bool`
+
+Rust's `bool` has the same layout as C17's `_Bool`, that is, its size and
+alignment are implementation-defined. Any `bool` can be cast into an integer,
+taking on the values 1 (`true`) or 0 (`false`).
+
+> **Note**: on all platforms that Rust currently supports, its size and
+> alignment are 1, and its ABI class is `INTEGER` - see [Rust Layout and ABIs].
+
+[Rust Layout and ABIs]: https://gankro.github.io/blah/rust-layouts-and-abis/#the-layoutsabis-of-builtins
+
+## `char`
+
+Rust's `char` is 32 bits wide and represents a [Unicode scalar value]. The alignment
+of `char` is _implementation-defined_.
+
+[Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value
+
+> **Note**: Rust's `char` type is not layout compatible with C / C++ `char` types.
+> The C / C++ `char` types correspond to either Rust's `i8` or `u8` types on all
+> currently supported platforms, depending on their signedness. Rust does not
+> support C platforms in which C `char` is not 8 bits wide.
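As an editorial illustration of the `bool` and `char` sections above (it is not part of the patch), here is a minimal sketch that checks the stated properties with `std::mem::size_of` and `std::mem::align_of`. The helper names `bool_to_int` and `is_scalar_value` are hypothetical, and the `bool` size and alignment assertions assume one of the currently supported platforms described in the note. The alignment of `char` is only printed, since the chapter calls it implementation-defined.

```rust
use std::mem::{align_of, size_of};
use std::os::raw::c_char;

/// Hypothetical helper: casts a `bool` to an integer.
/// `true` becomes 1 and `false` becomes 0.
fn bool_to_int(b: bool) -> u8 {
    b as u8
}

/// Hypothetical helper: returns `true` if `code` is a Unicode scalar
/// value, i.e. if it can be represented as a Rust `char`.
fn is_scalar_value(code: u32) -> bool {
    char::from_u32(code).is_some()
}

fn main() {
    // Assumption: a currently supported platform, where `bool` has
    // size 1 and alignment 1.
    assert_eq!(size_of::<bool>(), 1);
    assert_eq!(align_of::<bool>(), 1);
    assert_eq!(bool_to_int(true), 1);
    assert_eq!(bool_to_int(false), 0);

    // `char` is 32 bits wide; its alignment is not asserted here.
    assert_eq!(size_of::<char>(), 4);
    println!("align_of::<char>() = {}", align_of::<char>());

    // Surrogates and values above U+10FFFF are not Unicode scalar values.
    assert!(is_scalar_value(0x41));
    assert!(!is_scalar_value(0xD800));
    assert!(!is_scalar_value(0x11_0000));

    // C's `char` maps to `i8` or `u8`, so it is one byte wide and
    // cannot hold a Rust `char`.
    assert_eq!(size_of::<c_char>(), 1);
}
```

The sketch uses `std::os::raw::c_char` rather than a hard-coded `i8` or `u8` because the signedness of C's `char` varies by platform.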
+
+## `isize` and `usize`
+
+The `isize` and `usize` types are pointer-sized signed and unsigned integers.
+They have the same layout as the [pointer types] for which the pointee is
+`Sized`, and are layout compatible with C's `intptr_t` and `uintptr_t` types.
+
+> **Note**: Rust's `usize` and C's `unsigned` types are **not** equivalent. C's
+> `unsigned` is at least as large as a short, allowed to have padding bits, etc.
+> but it is not necessarily pointer-sized.
+
+> **Note**: in the current Rust implementation, the layouts of `isize` and
+> `usize` determine the following:
+>
+> * the maximum size of Rust _allocations_ is limited to `isize::max_value()`.
+>   The LLVM `getelementptr` instruction uses signed-integer field offsets. Rust
+>   calls `getelementptr` with the `inbounds` flag, which assumes that field
+>   offsets do not overflow,
+>
+> * the maximum number of elements in an array is `usize::max_value()` (`[T; N:
+>   usize]`). In practice, probably only arrays of ZSTs can be this large;
+>   non-ZST arrays are bounded by the maximum size of Rust values,
+>
+> * the maximum value by which a pointer can be offset using `ptr.add(count:
+>   usize)` is `usize::max_value()`.
+>
+> These limits have not gone through the RFC process and are not guaranteed to
+> hold.
+
+[pointer types]: ./pointers.md
+
+## Fixed-width integer types
+
+Rust's signed and unsigned fixed-width integer types `{i,u}{8,16,32,64}` have
+the same layout as the C fixed-width integer types from the `<stdint.h>` header
+`{u,}int{8,16,32,64}_t`. That is:
+
+* these types have no padding bits,
+* their size exactly matches their bit-width,
+* negative values of signed integer types are represented using 2's complement.
+
+These properties also hold for Rust's 128-bit wide `{i,u}128` integer types, but
+C does not expose equivalent types in `<stdint.h>`.
+
+Rust fixed-width integer types are therefore safe to use directly in C FFI where
+the corresponding C fixed-width integer types are expected.
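As an editorial illustration of the two sections above (it is not part of the patch), the following sketch checks that `isize` and `usize` match a thin pointer in size and alignment, that the size of each fixed-width integer type matches its bit-width, and that negative values use 2's complement. The helpers `bit_width` and `bits_of` are hypothetical names. The size check cannot on its own prove the absence of padding bits.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper: the storage size of `T` in bits.
fn bit_width<T>() -> usize {
    size_of::<T>() * 8
}

/// Hypothetical helper: reinterprets an `i8` as its raw bit pattern.
fn bits_of(x: i8) -> u8 {
    x as u8
}

fn main() {
    // `isize` and `usize` have the size and alignment of a pointer
    // to a `Sized` pointee.
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());
    assert_eq!(size_of::<isize>(), size_of::<*const u8>());
    assert_eq!(align_of::<usize>(), align_of::<*const u8>());

    // The size of each fixed-width type exactly matches its bit-width.
    assert_eq!(bit_width::<u8>(), 8);
    assert_eq!(bit_width::<i16>(), 16);
    assert_eq!(bit_width::<u32>(), 32);
    assert_eq!(bit_width::<i64>(), 64);
    assert_eq!(bit_width::<u128>(), 128);

    // 2's complement: -1 is all ones, and the minimum value has only
    // the top bit set.
    assert_eq!(bits_of(-1), 0xFF);
    assert_eq!(bits_of(-2), 0xFE);
    assert_eq!(bits_of(i8::MIN), 0x80);
}
```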
+
+### Layout compatibility with C native integer types
+
+The specification of native C integer types, `char`, `short`, `int`, `long`,
+... as well as their `unsigned` variants, guarantees a lower bound on their size,
+e.g., `short` is _at least_ 16-bit wide and _at least_ as wide as `char`.
+
+Their exact sizes are _implementation-defined_.
+
+Libraries like `libc` use knowledge of this _implementation-defined_ behavior on
+each platform to select a layout-compatible Rust fixed-width integer type when
+interfacing with native C integer types (e.g. `libc::c_int`).
+
+> **Note**: Rust does not support C platforms on which the C native integer types
+> are not compatible with any of Rust's fixed-width integer types (e.g. because
+> of padding-bits, lack of 2's complement, etc.).
+
+## Fixed-width floating point types
+
+Rust's `f32` and `f64` single (32-bit) and double (64-bit) precision
+floating-point types have [IEEE-754] `binary32` and `binary64` floating-point
+layouts, respectively.
+
+When the platform's `"math.h"` header defines the `__STDC_IEC_559__` macro,
+Rust's floating-point types are safe to use directly in C FFI where the
+appropriate C types are expected (`f32` for `float`, `f64` for `double`).
+
+If the C platform's `"math.h"` header does not define the `__STDC_IEC_559__`
+macro, whether `f32` and `f64` are safe to use in C FFI, and for which C types,
+is _implementation-defined_.
+
+> **Note**: the `libc` crate uses knowledge of each platform's
+> _implementation-defined_ behavior to provide portable `libc::c_float` and
+> `libc::c_double` types that can be used to safely interface with C via FFI.
+
+[IEEE-754]: https://en.wikipedia.org/wiki/IEEE_754
diff --git a/reference/src/optimizations.md b/reference/src/optimizations.md
deleted file mode 100644
index b3d77f39..00000000
--- a/reference/src/optimizations.md
+++ /dev/null
@@ -1 +0,0 @@
-# Optimizations
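Returning to the native integer and floating-point sections of the new scalars chapter, here is an editorial sketch (not part of the patch). It decodes the IEEE-754 `binary32` fields of an `f32` and checks the C type aliases. To stay free of external crates it uses the `std::os::raw` aliases in place of the `libc::c_int`, `libc::c_float`, and `libc::c_double` types that the chapter mentions; that substitution is an assumption of this sketch. The helper `binary32_fields` is a hypothetical name.

```rust
use std::mem::size_of;
use std::os::raw::{c_char, c_double, c_float, c_int, c_short};

/// Hypothetical helper: splits an `f32` into its IEEE-754 binary32
/// fields as (sign, biased exponent, fraction).
fn binary32_fields(x: f32) -> (u32, u32, u32) {
    let bits = x.to_bits();
    (bits >> 31, (bits >> 23) & 0xFF, bits & 0x7F_FFFF)
}

fn main() {
    // binary32: 1 sign bit, 8 exponent bits (bias 127), 23 fraction bits.
    assert_eq!(binary32_fields(1.0), (0, 127, 0));
    assert_eq!(binary32_fields(-2.0), (1, 128, 0));
    assert_eq!(binary32_fields(0.75), (0, 126, 0x40_0000));

    // binary64: 1.0 has a biased exponent of 1023 and a zero fraction.
    assert_eq!(1.0f64.to_bits(), 0x3FF0_0000_0000_0000);

    // The C aliases resolve to layout-compatible Rust types.
    assert_eq!(size_of::<c_float>(), size_of::<f32>());
    assert_eq!(size_of::<c_double>(), size_of::<f64>());

    // Native C integer types only guarantee lower bounds on their size.
    assert!(size_of::<c_short>() >= 2);
    assert!(size_of::<c_short>() >= size_of::<c_char>());
    assert!(size_of::<c_int>() >= size_of::<c_short>());
}
```

Only the lower bounds are asserted for `c_short` and `c_int`, because their exact sizes are implementation-defined.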