From e84a17940cf032214df7436182b8d8cd8964e080 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Thu, 25 Jul 2019 19:41:06 +0200
Subject: [PATCH 01/15] define provenance

---
 reference/src/glossary.md | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/reference/src/glossary.md b/reference/src/glossary.md
index 7bc3220d..82f7cd28 100644
--- a/reference/src/glossary.md
+++ b/reference/src/glossary.md
@@ -10,8 +10,7 @@ bytes.
 **Note**: a full aliasing model for Rust, defining when aliasing is allowed
 and when not, has not yet been defined. The purpose of this definition is to
 define when aliasing *happens*, not when it is *allowed*. The most developed
-potential aliasing model so far is known as "Stacked Borrows", and can be found
-[here](https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md).
+potential aliasing model so far is [Stacked Borrows][stacked-borrows].

 Consider the following example:

@@ -56,6 +55,24 @@ somewhat differently from this definition. However, that's considered a low
 level detail of a particular Rust implementation. When programming Rust, the
 Abstract Rust Machine is intended to operate according to the definition here.

+#### (Pointer) Provenance
+
+The *provenance* of a pointer can be used to distinguish pointers that point to the same memory location.
+For example, doing pointer arithmetic "remembers" the original allocation to which the pointer pointed, so it is impossible to cross allocation boundaries using pointer arithmetic:
+
+```rust
+let raw1 = Box::into_raw(Box::new(13u8));
+let raw2 = Box::into_raw(Box::new(42u8));
+let raw2_wrong = raw1.wrapping_add(raw2.wrapping_sub(raw1 as usize) as usize);
+// Now raw2 and raw2_wrong have same *address*...
+assert_eq!(raw2 as usize, raw2_wrong as usize);
+// ...but it would be UB to use raw2_wrong, as it was obtained by
+// cross-allocation arithmetic. raw2_wrong has the wrong *provenance*.
+```
+
+Another example of pointer provenance is the "tag" from [Stacked Borrows][stacked-borrows].
+For some more information, see [this blog post](https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html) and [this document proposing a more precise definition of provenance for C](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf).
+
 #### Interior mutability

 *Interior Mutation* means mutating memory where there also exists a live shared reference pointing to the same memory; or mutating memory through a pointer derived from a shared reference.
@@ -140,7 +157,8 @@ requirement of 2.

 ### TODO

-* *tag*
 * *rvalue*
 * *lvalue*
 * *representation*
+
+[stacked-borrows]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md
From 986b70079579e05bc4c2330f37a1404f1ce0db50 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Thu, 25 Jul 2019 20:18:29 +0200
Subject: [PATCH 02/15] location -> address

---
 reference/src/glossary.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reference/src/glossary.md b/reference/src/glossary.md
index 82f7cd28..fafeff79 100644
--- a/reference/src/glossary.md
+++ b/reference/src/glossary.md
@@ -57,7 +57,7 @@ Abstract Rust Machine is intended to operate according to the definition here.

 #### (Pointer) Provenance

-The *provenance* of a pointer can be used to distinguish pointers that point to the same memory location.
+The *provenance* of a pointer can be used to distinguish pointers that point to the same memory address.
 For example, doing pointer arithmetic "remembers" the original allocation to which the pointer pointed, so it is impossible to cross allocation boundaries using pointer arithmetic:

 ```rust
From 01a218665822944fa91259d5bfb32ffff2c6293d Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Fri, 26 Jul 2019 10:57:16 +0200
Subject: [PATCH 03/15] try to clarify

---
 reference/src/glossary.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/reference/src/glossary.md b/reference/src/glossary.md
index fafeff79..f0e7c544 100644
--- a/reference/src/glossary.md
+++ b/reference/src/glossary.md
@@ -58,7 +58,10 @@ Abstract Rust Machine is intended to operate according to the definition here.
 #### (Pointer) Provenance

 The *provenance* of a pointer can be used to distinguish pointers that point to the same memory address.
-For example, doing pointer arithmetic "remembers" the original allocation to which the pointer pointed, so it is impossible to cross allocation boundaries using pointer arithmetic:
+
+For example, we have to distinguish pointers to the same location if they originated from different allocations.
+A pointer "remembers" the original allocation to which it pointed.
+This is necessary to make it impossible for pointer arithmetic to cross allocation boundaries:

 ```rust
 let raw1 = Box::into_raw(Box::new(13u8));
From 41cc668d6f1bb49e6611122b526f8cc4673d2fa6 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Fri, 26 Jul 2019 11:03:52 +0200
Subject: [PATCH 04/15] try to clarify more

---
 reference/src/glossary.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reference/src/glossary.md b/reference/src/glossary.md
index f0e7c544..24a65d77 100644
--- a/reference/src/glossary.md
+++ b/reference/src/glossary.md
@@ -57,7 +57,7 @@ Abstract Rust Machine is intended to operate according to the definition here.

 #### (Pointer) Provenance

-The *provenance* of a pointer can be used to distinguish pointers that point to the same memory address.
+The *provenance* of a pointer can be used, in the Rust Abstract Machine, to distinguish pointers that point to the same memory address (i.e., pointers that, when cast to `usize`, will compare equal).

 For example, we have to distinguish pointers to the same location if they originated from different allocations.
 A pointer "remembers" the original allocation to which it pointed.
From a403e9f9f0b2239a0e64872e1228d451cd61d359 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Fri, 26 Jul 2019 11:07:11 +0200
Subject: [PATCH 05/15] expand pointer provenance example

---
 reference/src/glossary.md | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/reference/src/glossary.md b/reference/src/glossary.md
index 24a65d77..48ffe8d3 100644
--- a/reference/src/glossary.md
+++ b/reference/src/glossary.md
@@ -57,17 +57,24 @@ Abstract Rust Machine is intended to operate according to the definition here.

 #### (Pointer) Provenance

-The *provenance* of a pointer can be used, in the Rust Abstract Machine, to distinguish pointers that point to the same memory address (i.e., pointers that, when cast to `usize`, will compare equal).
+The *provenance* of a pointer is used, in the Rust Abstract Machine, to distinguish pointers that point to the same memory address (i.e., pointers that, when cast to `usize`, will compare equal).

 For example, we have to distinguish pointers to the same location if they originated from different allocations.
-A pointer "remembers" the original allocation to which it pointed.
-This is necessary to make it impossible for pointer arithmetic to cross allocation boundaries:
+After all, cross-allocation pointer arithmetic does not lead to usable pointers, so the Rust Abstract Machine *somehow* has to remember the original allocation to which a pointer pointed.
+It uses provenance to achieve this:

 ```rust
+// Let's assume the two allocations here have base addresses 0x100 and 0x200.
+// We write pointer provenance as `@N` where `N` is some kind of ID uniquely
+// identifying the allocation.
 let raw1 = Box::into_raw(Box::new(13u8));
 let raw2 = Box::into_raw(Box::new(42u8));
 let raw2_wrong = raw1.wrapping_add(raw2.wrapping_sub(raw1 as usize) as usize);
-// Now raw2 and raw2_wrong have same *address*...
+// These pointers now have the following values:
+// raw1 points to address 0x100 and has provenance @1.
+// raw2 points to address 0x200 and has provenance @2.
+// raw2_wrong points to address 0x200 and has provenance @1.
+// In other words, raw2 and raw2_wrong have same *address*...
 assert_eq!(raw2 as usize, raw2_wrong as usize);
 // ...but it would be UB to use raw2_wrong, as it was obtained by
 // cross-allocation arithmetic. raw2_wrong has the wrong *provenance*.
From 255187ecbdbb4bc1bd9d1f70d759a54854bb6ea6 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Fri, 26 Jul 2019 11:08:58 +0200
Subject: [PATCH 06/15] link to wrapping_offset docs

---
 reference/src/glossary.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reference/src/glossary.md b/reference/src/glossary.md
index 48ffe8d3..6d4bf736 100644
--- a/reference/src/glossary.md
+++ b/reference/src/glossary.md
@@ -60,7 +60,7 @@ Abstract Rust Machine is intended to operate according to the definition here.
 The *provenance* of a pointer is used, in the Rust Abstract Machine, to distinguish pointers that point to the same memory address (i.e., pointers that, when cast to `usize`, will compare equal).

 For example, we have to distinguish pointers to the same location if they originated from different allocations.
-After all, cross-allocation pointer arithmetic does not lead to usable pointers, so the Rust Abstract Machine *somehow* has to remember the original allocation to which a pointer pointed.
+Cross-allocation pointer arithmetic [does not lead to usable pointers](https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset), so the Rust Abstract Machine *somehow* has to remember the original allocation to which a pointer pointed.
 It uses provenance to achieve this:

 ```rust
From 2c9b3c0d448c53c0d951c3964b82db39e8ceb86b Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Fri, 26 Jul 2019 11:12:34 +0200
Subject: [PATCH 07/15] provenance is ghost state

---
 reference/src/glossary.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/reference/src/glossary.md b/reference/src/glossary.md
index 6d4bf736..e9bdb831 100644
--- a/reference/src/glossary.md
+++ b/reference/src/glossary.md
@@ -57,7 +57,8 @@ Abstract Rust Machine is intended to operate according to the definition here.

 #### (Pointer) Provenance

-The *provenance* of a pointer is used, in the Rust Abstract Machine, to distinguish pointers that point to the same memory address (i.e., pointers that, when cast to `usize`, will compare equal).
+The *provenance* of a pointer is used to distinguish pointers that point to the same memory address (i.e., pointers that, when cast to `usize`, will compare equal).
+Provenance is extra state that only exists in the Rust Abstract Machine; it is needed to specify program behavior but not present any more when the program runs on real hardware.

 For example, we have to distinguish pointers to the same location if they originated from different allocations.
 Cross-allocation pointer arithmetic [does not lead to usable pointers](https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset), so the Rust Abstract Machine *somehow* has to remember the original allocation to which a pointer pointed.
From 964f2e12f6161ab352144cc25a8dd060212f1ac4 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Fri, 26 Jul 2019 11:18:37 +0200 Subject: [PATCH 08/15] explain what's wrong about the provenance --- reference/src/glossary.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/reference/src/glossary.md b/reference/src/glossary.md index e9bdb831..da8b6105 100644 --- a/reference/src/glossary.md +++ b/reference/src/glossary.md @@ -78,7 +78,9 @@ let raw2_wrong = raw1.wrapping_add(raw2.wrapping_sub(raw1 as usize) as usize); // In other words, raw2 and raw2_wrong have same *address*... assert_eq!(raw2 as usize, raw2_wrong as usize); // ...but it would be UB to use raw2_wrong, as it was obtained by -// cross-allocation arithmetic. raw2_wrong has the wrong *provenance*. +// cross-allocation arithmetic. raw2_wrong has the wrong *provenance*: +// it points to address 0x200 in allocation @2, but the pointer +// has provenance @1. ``` Another example of pointer provenance is the "tag" from [Stacked Borrows][stacked-borrows]. From 80edfa88118356e92add84a521e0fdd1bf3d947b Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Fri, 26 Jul 2019 11:21:21 +0200 Subject: [PATCH 09/15] clarify why this is UB --- reference/src/glossary.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/reference/src/glossary.md b/reference/src/glossary.md index da8b6105..74612eaf 100644 --- a/reference/src/glossary.md +++ b/reference/src/glossary.md @@ -77,8 +77,7 @@ let raw2_wrong = raw1.wrapping_add(raw2.wrapping_sub(raw1 as usize) as usize); // raw2_wrong points to address 0x200 and has provenance @1. // In other words, raw2 and raw2_wrong have same *address*... assert_eq!(raw2 as usize, raw2_wrong as usize); -// ...but it would be UB to use raw2_wrong, as it was obtained by -// cross-allocation arithmetic. 
raw2_wrong has the wrong *provenance*: +// ...but it would be UB to use raw2_wrong, as it has the wrong *provenance*: // it points to address 0x200 in allocation @2, but the pointer // has provenance @1. ``` From 9faf0426686ba94d0bfac6071cc3e648250da20a Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Fri, 26 Jul 2019 11:23:14 +0200 Subject: [PATCH 10/15] provenance really is just ghost state --- reference/src/glossary.md | 1 + 1 file changed, 1 insertion(+) diff --git a/reference/src/glossary.md b/reference/src/glossary.md index 74612eaf..1f75c59a 100644 --- a/reference/src/glossary.md +++ b/reference/src/glossary.md @@ -59,6 +59,7 @@ Abstract Rust Machine is intended to operate according to the definition here. The *provenance* of a pointer is used to distinguish pointers that point to the same memory address (i.e., pointers that, when cast to `usize`, will compare equal). Provenance is extra state that only exists in the Rust Abstract Machine; it is needed to specify program behavior but not present any more when the program runs on real hardware. +In other words, pointers that only differ in their provenance can *not* be distinguished any more in the final binary (but provenance can influence how the compiler translates the program). For example, we have to distinguish pointers to the same location if they originated from different allocations. Cross-allocation pointer arithmetic [does not lead to usable pointers](https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset), so the Rust Abstract Machine *somehow* has to remember the original allocation to which a pointer pointed. 
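To make the "ghost state" idea in these patches concrete, here is a toy model of how an abstract machine *could* represent a pointer as an address plus a provenance tag, reusing the 0x100/0x200 addresses and `@N` notation from the patch's example. Everything in it (`AbstractPtr`, `Allocation`, `Machine`, numeric allocation IDs) is a hypothetical illustration, not how rustc or Miri actually implement provenance. Only `addr` would survive into the compiled program; `alloc_id` is the extra state that lets the model reject the cross-allocation access.

```rust
// Toy abstract machine in which every pointer carries provenance.
// Hypothetical illustration only; not rustc's or Miri's implementation.

#[derive(Clone, Copy, Debug, PartialEq)]
struct AbstractPtr {
    addr: usize,     // what survives on real hardware
    alloc_id: usize, // provenance `@N`: ghost state, absent from the binary
}

struct Allocation {
    base: usize,
    bytes: Vec<u8>,
}

struct Machine {
    allocs: Vec<Allocation>, // allocation `@N` is stored at index N - 1
}

impl Machine {
    fn allocate(&mut self, base: usize, bytes: Vec<u8>) -> AbstractPtr {
        self.allocs.push(Allocation { base, bytes });
        AbstractPtr { addr: base, alloc_id: self.allocs.len() }
    }

    /// Like `wrapping_add` on raw pointers: the address changes, provenance does not.
    fn wrapping_add(&self, p: AbstractPtr, n: usize) -> AbstractPtr {
        AbstractPtr { addr: p.addr.wrapping_add(n), alloc_id: p.alloc_id }
    }

    /// A load is defined only if the address lies inside the allocation
    /// named by the pointer's provenance; otherwise it is reported as UB.
    fn load(&self, p: AbstractPtr) -> Result<u8, String> {
        let alloc = &self.allocs[p.alloc_id - 1];
        let offset = p.addr.wrapping_sub(alloc.base);
        alloc.bytes.get(offset).copied().ok_or_else(|| {
            format!("UB: address {:#x} is outside allocation @{}", p.addr, p.alloc_id)
        })
    }
}

fn main() {
    let mut m = Machine { allocs: Vec::new() };
    let raw1 = m.allocate(0x100, vec![13]); // address 0x100, provenance @1
    let raw2 = m.allocate(0x200, vec![42]); // address 0x200, provenance @2
    let raw2_wrong = m.wrapping_add(raw1, raw2.addr - raw1.addr);

    // Same address, so the compiled program could not tell them apart...
    assert_eq!(raw2.addr, raw2_wrong.addr);
    // ...but the abstract machine can, thanks to provenance.
    assert_eq!(m.load(raw2), Ok(42));
    assert!(m.load(raw2_wrong).is_err());
    println!("{}", m.load(raw2_wrong).unwrap_err());
}
```

Keeping provenance as a separate field, rather than encoding it in the address, mirrors the point that it is not present any more on real hardware.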
From 520c559b8063de84bcec1ad7f53e11ddf8c92c50 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Fri, 26 Jul 2019 11:29:54 +0200
Subject: [PATCH 11/15] wording

---
 reference/src/glossary.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reference/src/glossary.md b/reference/src/glossary.md
index 1f75c59a..2a5f5985 100644
--- a/reference/src/glossary.md
+++ b/reference/src/glossary.md
@@ -79,7 +79,7 @@ let raw2_wrong = raw1.wrapping_add(raw2.wrapping_sub(raw1 as usize) as usize);
 // In other words, raw2 and raw2_wrong have same *address*...
 assert_eq!(raw2 as usize, raw2_wrong as usize);
 // ...but it would be UB to use raw2_wrong, as it has the wrong *provenance*:
-// it points to address 0x200 in allocation @2, but the pointer
+// it points to address 0x200, which is in allocation @2, but the pointer
 // has provenance @1.
 ```

From 5120b2a3f31794e94c111fdcfb558de2d969142f Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Fri, 26 Jul 2019 11:47:07 +0200
Subject: [PATCH 12/15] compare with C/C++

---
 reference/src/glossary.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/reference/src/glossary.md b/reference/src/glossary.md
index 2a5f5985..daf60bbe 100644
--- a/reference/src/glossary.md
+++ b/reference/src/glossary.md
@@ -83,6 +83,8 @@ assert_eq!(raw2 as usize, raw2_wrong as usize);
 // has provenance @1.
 ```

+This provenance also exists in C/C++, but Rust is more permissive by (a) providing a [way to do pointer arithmetic across allocation boundaries without causing immediate UB](https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset) (though, as we have seen, the resulting pointer still cannot be used for locations outside the allocation it originates), and (b) by allowing pointers to always be compared safely, even if their provenance differs.
+
 Another example of pointer provenance is the "tag" from [Stacked Borrows][stacked-borrows].
 For some more information, see [this blog post](https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html) and [this document proposing a more precise definition of provenance for C](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf).

From b1315e8efeab12110f58babc8918d10354a2fcc0 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Fri, 26 Jul 2019 11:52:26 +0200
Subject: [PATCH 13/15] clarify what is UB

---
 reference/src/glossary.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reference/src/glossary.md b/reference/src/glossary.md
index daf60bbe..2945a4ec 100644
--- a/reference/src/glossary.md
+++ b/reference/src/glossary.md
@@ -78,7 +78,7 @@ let raw2_wrong = raw1.wrapping_add(raw2.wrapping_sub(raw1 as usize) as usize);
 // raw2_wrong points to address 0x200 and has provenance @1.
 // In other words, raw2 and raw2_wrong have same *address*...
 assert_eq!(raw2 as usize, raw2_wrong as usize);
-// ...but it would be UB to use raw2_wrong, as it has the wrong *provenance*:
+// ...but it would be UB to dereference raw2_wrong, as it has the wrong *provenance*:
 // it points to address 0x200, which is in allocation @2, but the pointer
 // has provenance @1.
 ```
From 24022e3a77c617417b33ddf3736b71a37eaca7f1 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Fri, 26 Jul 2019 11:53:10 +0200
Subject: [PATCH 14/15] weasle words

---
 reference/src/glossary.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reference/src/glossary.md b/reference/src/glossary.md
index 2945a4ec..6afaa835 100644
--- a/reference/src/glossary.md
+++ b/reference/src/glossary.md
@@ -83,7 +83,7 @@ assert_eq!(raw2 as usize, raw2_wrong as usize);
 // has provenance @1.
 ```

-This provenance also exists in C/C++, but Rust is more permissive by (a) providing a [way to do pointer arithmetic across allocation boundaries without causing immediate UB](https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset) (though, as we have seen, the resulting pointer still cannot be used for locations outside the allocation it originates), and (b) by allowing pointers to always be compared safely, even if their provenance differs.
+This kind of provenance also exists in C/C++, but Rust is more permissive by (a) providing a [way to do pointer arithmetic across allocation boundaries without causing immediate UB](https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset) (though, as we have seen, the resulting pointer still cannot be used for locations outside the allocation it originates), and (b) by allowing pointers to always be compared safely, even if their provenance differs.

 Another example of pointer provenance is the "tag" from [Stacked Borrows][stacked-borrows].
 For some more information, see [this blog post](https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html) and [this document proposing a more precise definition of provenance for C](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf).
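Both ways in which Rust is described as more permissive than C/C++ can be exercised in ordinary Rust. The sketch below is illustrative; the function names `read_after_round_trip` and `adjacent` are hypothetical. It relies on the documented behavior of the wrapping pointer methods: the pointer remembers the allocation it was derived from, leaving that allocation and re-entering it later is permitted, and only the pointer that is finally dereferenced must be in bounds.

```rust
/// Reads `arr[2]` through a pointer that first wanders far outside `arr`.
fn read_after_round_trip(arr: &[u8; 4]) -> u8 {
    let p = arr.as_ptr();
    // (a) Out of bounds of `arr`, but computing it with `wrapping_add` is not UB.
    let far = p.wrapping_add(1000);
    // 1000 - 998 = 2, so this is `p + 2`: back inside `arr`.
    let back = far.wrapping_sub(998);
    // SAFETY: `back` is in bounds of `arr` and, per the `wrapping_offset`
    // docs, still remembers that it was derived from `arr`.
    unsafe { *back }
}

/// (b) Compares the one-past-the-end pointer of `a` with the start of `b`.
/// Comparing pointers with different provenance is allowed in Rust.
fn adjacent(a: &[u8; 4], b: &[u8; 4]) -> bool {
    a.as_ptr().wrapping_add(a.len()) == b.as_ptr()
}

fn main() {
    let arr = [10u8, 20, 30, 40];
    let other = [0u8; 4];
    assert_eq!(read_after_round_trip(&arr), 30);
    // The result depends on where the compiler placed `arr` and `other`,
    // but evaluating the comparison is never UB.
    println!("adjacent: {}", adjacent(&arr, &other));
}
```

The comparison result in `main` is deliberately only printed, not asserted: whether two separate allocations happen to be adjacent is a layout detail the program cannot rely on.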
From 3d504f1a3c0656962d43cee68e355f909afce20e Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Sun, 28 Jul 2019 10:42:26 +0200
Subject: [PATCH 15/15] be more clear that the provenance example is just that

---
 reference/src/glossary.md | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/reference/src/glossary.md b/reference/src/glossary.md
index 6afaa835..ef766057 100644
--- a/reference/src/glossary.md
+++ b/reference/src/glossary.md
@@ -61,6 +61,11 @@ The *provenance* of a pointer is used to distinguish pointers that point to the
 Provenance is extra state that only exists in the Rust Abstract Machine; it is needed to specify program behavior but not present any more when the program runs on real hardware.
 In other words, pointers that only differ in their provenance can *not* be distinguished any more in the final binary (but provenance can influence how the compiler translates the program).

+The exact form of provenance in Rust is unclear.
+It is also unclear whether provenance applies to more than just pointers, i.e., one could imagine integers having provenance as well (so that pointer provenance can be preserved when pointers are cast to an integer and back).
+In the following, we give some examples of what provenance *could* look like.
+
+**Using provenance to track originating allocation.**
 For example, we have to distinguish pointers to the same location if they originated from different allocations.
 Cross-allocation pointer arithmetic [does not lead to usable pointers](https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset), so the Rust Abstract Machine *somehow* has to remember the original allocation to which a pointer pointed.
 It uses provenance to achieve this:
@@ -84,9 +89,11 @@ assert_eq!(raw2 as usize, raw2_wrong as usize);
 ```

 This kind of provenance also exists in C/C++, but Rust is more permissive by (a) providing a [way to do pointer arithmetic across allocation boundaries without causing immediate UB](https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset) (though, as we have seen, the resulting pointer still cannot be used for locations outside the allocation it originates), and (b) by allowing pointers to always be compared safely, even if their provenance differs.
+For some more information, see [this document proposing a more precise definition of provenance for C](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf).

+**Using provenance for Rust's aliasing rules.**
 Another example of pointer provenance is the "tag" from [Stacked Borrows][stacked-borrows].
-For some more information, see [this blog post](https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html) and [this document proposing a more precise definition of provenance for C](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf).
+For some more information, see [this blog post](https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html).

 #### Interior mutability
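For the Stacked Borrows case, provenance distinguishes pointers into the *same* allocation. The minimal sketch below builds two raw pointers with identical addresses, one derived from a shared reference and one from a mutable reference, so under Stacked Borrows they carry different tags with different permissions. The function name `same_address_different_tags` is hypothetical. The UB case is only described in a comment and never executed; detecting it would require running under a checker that implements Stacked Borrows, such as Miri.

```rust
/// Returns whether the two raw pointers share an address, plus the final value of `x`.
fn same_address_different_tags() -> (bool, u8) {
    let mut x = 5u8;
    // Derived from a shared reference: under Stacked Borrows its tag
    // only grants read access to `x`.
    let from_shared: *const u8 = &x;
    let shared_addr = from_shared as usize;
    // Derived from a mutable reference: its tag grants write access.
    let from_mut: *mut u8 = &mut x;
    let same = shared_addr == from_mut as usize;
    // SAFETY: `from_mut` was just derived from `&mut x` and `x` is still live.
    unsafe { *from_mut = 6 };
    // Writing through `from_shared as *mut u8` here instead would be UB under
    // Stacked Borrows, even though it has exactly the same address.
    (same, x)
}

fn main() {
    let (same, value) = same_address_different_tags();
    assert!(same);
    assert_eq!(value, 6);
    println!("same address: {same}, value: {value}");
}
```

The address of `from_shared` is read before `&mut x` is created, so the sketch never uses a pointer whose tag Stacked Borrows would already have invalidated.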