Be more explicit about the layout guarantees of integer and floating-point types #98

Merged
merged 11 commits into from
Apr 11, 2019
4 changes: 3 additions & 1 deletion reference/src/SUMMARY.md
@@ -4,9 +4,11 @@

- [Data layout](./layout.md)
- [Structs and tuples](./layout/structs-and-tuples.md)
- [Integers and Floating Points](./layout/integers-floatingpoint.md)
- [Scalars](./layout/scalars.md)
- [Enums](./layout/enums.md)
- [Unions](./layout/unions.md)
- [Pointers](./layout/pointers.md)
- [Function pointers](./layout/function-pointers.md)
- [Arrays and Slices](./layout/arrays-and-slices.md)
- [Packed SIMD vectors](./layout/packed-simd-vectors.md)
- [Optimizations](./optimizations.md)
1 change: 0 additions & 1 deletion reference/src/layout.md

This file was deleted.

10 changes: 6 additions & 4 deletions reference/src/layout/function-pointers.md
@@ -81,6 +81,11 @@ bool for_all(struct Cons const *self, bool (*func)(int, void *), void *thunk);
```

```rust
# use std::{
# ffi::c_void,
# os::raw::c_int,
# };
#
pub struct Cons {
data: c_int,
next: Option<Box<Cons>>,
@@ -117,9 +122,6 @@ pub extern "C" fn for_all(
}
it = node.next.as_ref().map(|x| &**x);
}
true
}
```

### Unresolved Questions

- dunno
61 changes: 0 additions & 61 deletions reference/src/layout/integers-floatingpoint.md

This file was deleted.

4 changes: 2 additions & 2 deletions reference/src/layout/pointers.md
@@ -36,7 +36,7 @@ multi-trait objects `&(dyn T + U)` or references to other dynamically sized type
other than that they are at least word-aligned, and have size at least one word.

The layout of `&dyn T` when `T` is a trait is the same as that of:
```rust
```rust,ignore
#[repr(C)]
struct DynObject {
data: *u8,
@@ -45,7 +45,7 @@ struct DynObject {
```

The layout of `&[T]` is the same as that of:
```rust
```rust,ignore
#[repr(C)]
struct Slice<T> {
ptr: *T,
116 changes: 116 additions & 0 deletions reference/src/layout/scalars.md
@@ -0,0 +1,116 @@
# Layout of scalar types

This chapter represents the consensus from issue [#9]. It documents the memory
layout of, and considerations for, `bool`, `char`, floating-point types (`f{32,64}`), and integral types (`{i,u}{8,16,32,64,128,size}`).

These types are all scalar types, representing a single value, and have no
layout `#[repr()]` flags.

[#9]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/9

## `bool`

Rust's `bool` has the same layout as C17's `_Bool`; that is, its size and
alignment are implementation-defined. Any `bool` can be cast into an integer,
taking on the values 1 (`true`) or 0 (`false`).

> **Note**: on all platforms that Rust currently supports, the size and
> alignment of `bool` are 1, and its ABI class is `INTEGER` - see [Rust Layout and ABIs].

[Rust Layout and ABIs]: https://gankro.github.io/blah/rust-layouts-and-abis/#the-layoutsabis-of-builtins
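As a minimal sketch of the two claims above, the following checks the integer values produced by casting a `bool` and reports its size and alignment. The helper names `bool_as_int` and `bool_layout` are hypothetical, and the `(1, 1)` result reflects what the note says about currently supported platforms; it is not a language-level guarantee.

```rust
use std::mem::{align_of, size_of};

/// Casts a `bool` to an integer: `true` becomes 1, `false` becomes 0.
pub fn bool_as_int(b: bool) -> u8 {
    b as u8
}

/// Returns `(size, alignment)` of `bool` in bytes on the current target.
/// Assumption: this is (1, 1) on every platform Rust currently supports.
pub fn bool_layout() -> (usize, usize) {
    (size_of::<bool>(), align_of::<bool>())
}
```

Code that must interoperate with C's `_Bool` could check `bool_layout()` at startup or in a test rather than hard-coding the values.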

## `char`

Rust's `char` is 32 bits wide and represents a [Unicode scalar value]. The alignment
of `char` is _implementation-defined_.

[unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value

> **Note**: Rust's `char` type is not layout-compatible with the C / C++ `char` types.
> The C / C++ `char` types correspond to either Rust's `i8` or `u8` types on all
> currently supported platforms, depending on their signedness. Rust does not
> support C platforms on which C `char` is not 8 bits wide.
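The difference between Rust's `char` and C's `char` can be sketched as follows. The helper names are hypothetical. `std::os::raw::c_char` is the standard library's alias for the C type, and `std::char::from_u32` rejects any `u32` that is not a Unicode scalar value (surrogates and values above `0x10FFFF`).

```rust
use std::mem::size_of;
use std::os::raw::c_char;

/// Size of Rust's `char` in bytes (4, since it is 32 bits wide).
pub fn rust_char_size() -> usize {
    size_of::<char>()
}

/// Size in bytes of `c_char`, the alias for C's `char` (`i8` or `u8`).
pub fn c_char_size() -> usize {
    size_of::<c_char>()
}

/// Returns `true` if `x` is a Unicode scalar value, i.e. a valid `char`.
pub fn is_unicode_scalar(x: u32) -> bool {
    std::char::from_u32(x).is_some()
}
```

Because not every 32-bit pattern is a valid `char`, values received over FFI as `uint32_t` are best converted with a checked function like `from_u32`, not reinterpreted directly.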

## `isize` and `usize`

The `isize` and `usize` types are pointer-sized signed and unsigned integers.
They have the same layout as the [pointer types] for which the pointee is
`Sized`, and are layout compatible with C's `uintptr_t` and `intptr_t` types.

> **Note**: Rust's `usize` and C's `unsigned` types are **not** equivalent. C's
> `unsigned` is at least as large as a `short` and is allowed to have padding
> bits, but it is not necessarily pointer-sized.
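The pointer-size relationship described above can be checked directly. This is a small sketch with hypothetical helper names; it compares `usize` and `isize` against a thin pointer (`*const u8`, whose pointee is `Sized`) and round-trips an address through `usize`.

```rust
use std::mem::{align_of, size_of};

/// Checks that `usize` and `isize` have the size and alignment of a
/// thin pointer, i.e. a pointer whose pointee is `Sized`.
pub fn usize_is_pointer_sized() -> bool {
    size_of::<usize>() == size_of::<*const u8>()
        && size_of::<isize>() == size_of::<*const u8>()
        && align_of::<usize>() == align_of::<*const u8>()
}

/// Converts the address of `value` to `usize` and back, and reports
/// whether the resulting pointer compares equal to the original.
pub fn address_round_trip(value: &u32) -> bool {
    let ptr = value as *const u32;
    let addr = ptr as usize;
    addr as *const u32 == ptr
}
```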

The layout of `usize` determines the following:

- the maximum size of Rust values is _implementation-defined_, but can at most
Member

Just noting that we use "value" to mean a very particular thing here -- the same thing that C calls "object", from what I can tell. Also see #40.

A relatively easy way to avoid that issue while also being less jargon-intense would be to talk about the sizes of allocations. Just risks people thinking about heap allocation to the exclusion of all other kinds.

Contributor Author

@gnzlbg gnzlbg Mar 16, 2019

@RalfJung once #40 is merged we should probably start linking these terms to their definitions (EDIT: i've added emphasis to the "values", but that doesn't help much).

@rkruppe with #40 we might be talking about the size of "places", which would remove the ambiguity with heap allocations, but it is still jargon heavy.

be `usize::max_value()` since `mem::size_of` and `mem::size_of_val` return
`usize`,
- the maximum number of elements in an array is _implementation-defined_, but
can at most be `usize::max_value()` since `[T; N: usize]`,
- the maximum value by which a pointer can be offset is
_implementation-defined_, but can at most be `usize::max_value()` since
`ptr.add(count: usize)`.

> **Note**: in the current Rust implementation:
>
> * the maximum size of Rust values is limited to `isize::max_value()`. The LLVM
> `getelementptr` instruction uses signed-integer field offsets. Rust calls
> `getelementptr` with the `inbounds` flag, which assumes that field offsets do
> not overflow,
> * the maximum number of elements in an array is `usize::max_value()`,
> * the maximum value by which a pointer can be offset is `usize::max_value()`.

[pointer types]: ./pointers.md
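The size limit from the note above can be illustrated with a checked size computation. This is only a sketch: `array_bytes` is a hypothetical helper, not a standard library function, and it uses the constant `isize::MAX`, which has the same value as `isize::max_value()`.

```rust
use std::mem::size_of;

/// Returns the size in bytes of a hypothetical `[T; n]`, or `None` if
/// the computation overflows `usize` or the result exceeds `isize::MAX`
/// (the limit the current implementation places on the size of values).
pub fn array_bytes<T>(n: usize) -> Option<usize> {
    size_of::<T>()
        .checked_mul(n)
        .filter(|&bytes| bytes <= isize::MAX as usize)
}
```

For example, `array_bytes::<u32>(4)` yields `Some(16)`, while `array_bytes::<u8>(isize::MAX as usize + 1)` yields `None`.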

## Fixed-width integer types

Rust's signed and unsigned fixed-width integer types `{i,u}{8,16,32,64}` have
the same layout as the C fixed-width integer types from the `<stdint.h>` header
`{u,}int{8,16,32,64}_t`. That is:

* these types have no padding bits,
* their size exactly matches their bit-width,
* negative values of signed integer types are represented using 2's complement.

These properties also hold for Rust's 128-bit wide `{i,u}128` integer types, but
C does not expose equivalent types in `<stdint.h>`.

Rust fixed-width integer types are therefore safe to use directly in C FFI where
the corresponding C fixed-width integer types are expected.
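The properties listed above can be sketched in a few lines. The helper names are hypothetical. `add_i32` shows a signature that a C caller could declare as `int32_t add_i32(int32_t, int32_t)`; an export attribute such as `#[no_mangle]` would also be needed for C to link against it by name and is omitted here.

```rust
use std::mem::size_of;

/// Width of `T` in bits, computed from its size in bytes. For the
/// fixed-width integer types this equals the number in the type name.
pub fn bit_width<T>() -> usize {
    size_of::<T>() * 8
}

/// Reinterprets a signed byte as unsigned, exposing its two's
/// complement bit pattern (e.g. -1 is all ones).
pub fn bit_pattern(x: i8) -> u8 {
    x as u8
}

/// Uses `i32` directly where C expects `int32_t`. Wrapping addition
/// keeps the sketch free of overflow panics.
pub extern "C" fn add_i32(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}
```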

### Layout compatibility with C native integer types

The specification of native C integer types, `char`, `short`, `int`, `long`,
... as well as their `unsigned` variants, guarantees a lower bound on their size,
e.g., `short` is _at least_ 16-bit wide and _at least_ as wide as `char`.

Their exact sizes are _implementation-defined_.

Libraries like `libc` use knowledge of this _implementation-defined_ behavior on
each platform to select a layout-compatible Rust fixed-width integer type when
interfacing with native C integer types (e.g. `libc::c_int`).
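A rough sketch of what this selection looks like from the Rust side, using the standard library's `std::os::raw` aliases in place of the `libc` crate so that the example has no external dependencies. The helper names are hypothetical, and the bounds checked are the C lower bounds (16 bits for `short` and `int`, 32 bits for `long`), not exact sizes.

```rust
use std::mem::size_of;
use std::os::raw::{c_int, c_long, c_short};

/// Sizes in bytes of `c_short`, `c_int`, and `c_long` on the current
/// target. Each alias resolves to a Rust fixed-width integer type.
pub fn native_int_sizes() -> (usize, usize, usize) {
    (size_of::<c_short>(), size_of::<c_int>(), size_of::<c_long>())
}

/// Checks the lower bounds that C guarantees: `short` and `int` are at
/// least 16 bits, `long` is at least 32 bits, and each type is at least
/// as wide as the previous one.
pub fn sizes_respect_c_bounds() -> bool {
    let (short, int, long) = native_int_sizes();
    short >= 2 && int >= short && long >= 4 && long >= int
}
```

Portable FFI declarations would use aliases like `c_int` in signatures and avoid assuming, say, `i32`, since the exact width is chosen per platform.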

> **Note**: Rust does not support C platforms on which the C native integer types
> are not compatible with any of Rust's fixed-width integer types (e.g. because
> of padding bits, lack of 2's complement, etc.).

## Fixed-width floating point types

Rust's `f32` and `f64` single (32-bit) and double (64-bit) precision
floating-point types have [IEEE-754] `binary32` and `binary64` floating-point
layouts, respectively.
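To make the `binary32` and `binary64` layouts concrete, the sketch below splits a value into its sign, biased exponent, and fraction fields using `to_bits`. The function names are hypothetical. The field widths (1/8/23 bits for `binary32`, 1/11/52 bits for `binary64`) come from IEEE-754.

```rust
/// Splits an `f32` into its binary32 fields:
/// (sign bit, 8-bit biased exponent, 23-bit fraction).
pub fn decompose_f32(x: f32) -> (u32, u32, u32) {
    let bits = x.to_bits();
    (bits >> 31, (bits >> 23) & 0xFF, bits & 0x007F_FFFF)
}

/// Splits an `f64` into its binary64 fields:
/// (sign bit, 11-bit biased exponent, 52-bit fraction).
pub fn decompose_f64(x: f64) -> (u64, u64, u64) {
    let bits = x.to_bits();
    (bits >> 63, (bits >> 52) & 0x7FF, bits & 0x000F_FFFF_FFFF_FFFF)
}
```

For instance, `1.0_f32` has sign 0, biased exponent 127, and fraction 0, which corresponds to the bit pattern `0x3F80_0000`.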

When the platform's `"math.h"` header defines the `__STDC_IEC_559__` macro,
Rust's floating-point types are safe to use directly in C FFI where the
appropriate C types are expected (`f32` for `float`, `f64` for `double`).

If the C platform's `"math.h"` header does not define the `__STDC_IEC_559__`
macro, whether `f32` and `f64` are safe to use in C FFI, and for which C types,
is _implementation-defined_.

> **Note**: the `libc` crate uses knowledge of each platform's
> _implementation-defined_ behavior to provide portable `libc::c_float` and
> `libc::c_double` types that can be used to safely interface with C via FFI.

[IEEE-754]: https://en.wikipedia.org/wiki/IEEE_754
1 change: 0 additions & 1 deletion reference/src/optimizations.md

This file was deleted.