/*!
# Borrow check

This pass is in charge of enforcing *memory safety* and *purity*. As
memory safety is by far the more complex topic, I'll focus on that in
this description, but purity will be covered later on. In the context
of Rust, memory safety means three basic things:

- no writes to immutable memory;
- all pointers point to non-freed memory;
- all pointers point to memory of the same type as the pointer.

The last point might seem confusing: after all, for the most part,
this condition is guaranteed by the type check. However, there are
two cases where the type check effectively delegates to the borrow
check.

The first case has to do with enums. If there is a pointer to the
interior of an enum, and the enum is in a mutable location (such as a
local variable or field declared to be mutable), it is possible that
the user will overwrite the enum with a new value of a different
variant, and thus effectively change the type of the memory that the
pointer is pointing at.

The second case has to do with mutability. The type checker has only
a limited understanding of mutability. It will allow (for example) the
user to take an immutable pointer to a mutable local variable. It will
also allow a `@mut T` or `~mut T` pointer to be borrowed as a `&r.T`
pointer. These seeming oversights are in fact intentional; they allow
the user to temporarily treat a mutable value as immutable. It is up
to the borrow check to guarantee that the value in question is not in
fact mutated during the lifetime `r` of the reference.
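
The same idea survives in present-day Rust, although `~mut T` is gone. A mutable `Box<T>` can be lent out as an immutable reference, and the borrow checker keeps it frozen while that reference is live. The following is a minimal sketch in modern syntax. The function name `freeze_box` is illustrative. The sketch also assumes a compiler with non-lexical lifetimes, where a borrow ends at its last use, rather than the block scoping described in this file.

```rust
/// Lends a mutable boxed vector out as an immutable reference, then
/// mutates it again once that reference is no longer used.
fn freeze_box() -> (usize, usize) {
    let mut b: Box<Vec<i32>> = Box::new(vec![1, 2, 3]);
    let r: &Vec<i32> = &b; // `*b` is frozen while `r` is live
    // b.push(4);          // would be rejected here: `r` is used below
    let before = r.len();  // last use of `r`
    b.push(4);             // accepted: the freeze has ended
    (before, b.len())
}
```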

# Definition of unstable memory

The primary danger to safety arises from *unstable memory*. Unstable
memory is memory whose validity or type may change as a result of an
assignment, a move, or a variable going out of scope. There are two
cases in Rust where memory is unstable: the contents of unique boxes
and the contents of enums.

Unique boxes are unstable because when the variable containing the
unique box is reassigned, is moved, or goes out of scope, the unique
box is freed or (in the case of a move) potentially given to another
task. In either case, if there were an extant and usable pointer into
the box, safety guarantees would be compromised.
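
Modern Rust writes a unique box as `Box<T>`, and reassigning the variable that holds it frees the old box immediately. The minimal sketch below makes that free observable. The `Noisy` type and its drop counter are hypothetical scaffolding, not part of borrowck.

```rust
use std::cell::Cell;

// Hypothetical payload that counts how many times it has been freed.
struct Noisy<'a> {
    value: i32,
    drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Reads through a pointer into a box, then reassigns the box variable.
/// Returns (value read, boxes freed by the reassignment, new value).
fn reassign_box() -> (i32, u32, i32) {
    let drops = Cell::new(0);
    let mut b = Box::new(Noisy { value: 1, drops: &drops });
    let inner: &i32 = &b.value; // pointer into the box's interior
    let seen = *inner;          // last use of `inner`
    // Reassigning `b` while `inner` was still live would leave `inner`
    // pointing at freed memory, so the borrow checker only accepts it here.
    b = Box::new(Noisy { value: 2, drops: &drops }); // old box freed now
    (seen, drops.get(), b.value)
}
```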

Enum values are unstable because, if they are reassigned with a
different variant than they had previously, the types of their
contents may change.
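
The following is a modern-Rust sketch of the enum case. It uses a made-up `Slot` enum whose variants carry payloads of different types. While `v` points at the `i64` payload, overwriting `s` with `Slot::Text` would change the type of that memory to `String`. The assignment is therefore only accepted after the last use of `v`.

```rust
// Hypothetical enum whose variants carry payloads of different types.
enum Slot {
    Int(i64),
    Text(String),
}

/// Reads the integer payload through a reference, then overwrites the
/// enum with a different variant once the reference is no longer used.
fn overwrite_variant() -> (i64, usize) {
    let mut s = Slot::Int(5);
    let n: i64 = match &s {
        Slot::Int(v) => *v, // `v: &i64` points into the enum's interior
        Slot::Text(_) => 0,
    }; // `v` is dead after this point
    // Had `v` still been live, this line would be rejected: the memory
    // it points at would change type from `i64` to `String`.
    s = Slot::Text(String::from("five"));
    let len = match &s {
        Slot::Text(t) => t.len(),
        Slot::Int(_) => 0,
    };
    (n, len)
}
```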

# Safety criteria that must be enforced

Whenever a piece of memory is borrowed for lifetime L, there are two
things which the borrow checker must guarantee. First, it must
guarantee that the memory address will remain allocated (and owned by
the current task) for the entirety of the lifetime L. Second, it must
guarantee that the type of the data will not change for the entirety
of the lifetime L. In exchange, the region-based type system will
guarantee that the pointer is not used outside the lifetime L. These
guarantees are to some extent independent but are also interrelated.
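
In today's surface syntax, the lifetime L appears as a named lifetime parameter. In the illustrative function `first` below, the signature ties the returned pointer to the lifetime `'a` of the borrowed slice. The type system then refuses uses of the result outside `'a`. Meanwhile, the borrow checker keeps `data` allocated and unmodified for as long as the result is used.

```rust
/// The returned reference may not outlive `'a`, the lifetime of the borrow of `v`.
fn first<'a>(v: &'a [i32]) -> Option<&'a i32> {
    v.first()
}

fn use_first() -> i32 {
    let data = vec![10, 20, 30];
    let head = first(&data); // `data` stays borrowed while `head` is in use
    // drop(data);           // would be rejected: `head` still points into `data`
    head.copied().unwrap_or(0)
}
```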

In some cases, the type of the memory a pointer refers to cannot
change, but its lifetime can end. For example, imagine a pointer into
the interior of a shared box like:

    let mut x = @mut {f: 5, g: 6};
    let y = &mut x.f;

Here, a pointer was created to the interior of a shared box which
contains a record. Now suppose `*x` were to be mutated like so:

    *x = {f: 6, g: 7};

This would cause `*y` to change from 5 to 6, but the pointer `y`
remains valid. It still points at an integer, even if that integer
has been overwritten.
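
Shared boxes like `@mut` no longer exist. The closest modern analogue, assumed here for illustration, is an `Rc` whose fields use `Cell` for interior mutability. In this sketch, the `Rec` struct is made up. Overwriting the record's contents changes what `y` reads, but `y` still points at a valid integer.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical record whose fields can be overwritten through a shared handle.
struct Rec {
    f: Cell<i32>,
    g: Cell<i32>,
}

/// Overwrites the record's contents while holding a pointer to field `f`.
fn overwrite_contents() -> (i32, i32, i32) {
    let x = Rc::new(Rec { f: Cell::new(5), g: Cell::new(6) });
    let y: &Cell<i32> = &x.f; // pointer into the shared allocation
    let before = y.get();
    x.f.set(6); // roughly the old `*x = {f: 6, g: 7}`
    x.g.set(7);
    (before, y.get(), x.g.get()) // `y` is still valid; it now reads 6
}
```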

However, if we were to reassign `x` itself, like so:

    x = @mut {f: 6, g: 7};

This could potentially invalidate `y`, because if `x` held the final
reference to the shared box, then that memory would be released and
`y` would point at freed memory. (We will see that to prevent this
scenario we will *root* shared boxes that reside in mutable memory
whose contents are borrowed; rooting means that we create a temporary
to ensure that the box is not collected.)
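
Rooting can be approximated in modern Rust by holding an extra `Rc` handle for as long as the borrowed data is needed. This is only an analogy that treats `Rc` as a stand-in for a shared box. Borrowck roots by having the compiler create the temporary, not through user code like the following. The function name is illustrative.

```rust
use std::rc::Rc;

/// Keeps the old allocation alive across a reassignment of `x` by holding
/// an extra strong reference (a "root") for as long as the data is needed.
fn root_then_reassign() -> (i32, usize, i32) {
    let mut x = Rc::new(5);
    let root = Rc::clone(&x);  // the root: a second strong reference
    let y: &i32 = &root;       // borrow through the root, not through `x`
    x = Rc::new(6);            // without the root, the old value would be freed here
    let count = Rc::strong_count(&root); // only the root still owns the old value
    (*y, count, *x)
}
```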

In other cases, like an enum on the stack, the memory cannot be freed
but its type can change:

    let mut x = some(5);
    alt x {
        some(ref y) => { ... }
        none => { ... }
    }

Here as before, the pointer `y` would be invalidated if we were to
reassign `x` to `none`. (We will see that this case is prevented
because borrowck tracks data which resides on the stack and prevents
variables from being reassigned if there may be pointers to their
interior.)
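
Because stack data is tracked precisely, the restriction can be narrower than the whole variable. The sketch below uses modern Rust and a hypothetical `Pair` struct. Present-day checking of disjoint fields may be more precise than the analysis this file describes. Here, a pointer into field `a` freezes `a`, while field `b` can still be assigned.

```rust
// Hypothetical local aggregate with two independent fields.
struct Pair {
    a: Option<i32>,
    b: i32,
}

/// Holds a pointer into field `a` while assigning field `b`.
fn disjoint_fields() -> (i32, i32) {
    let mut p = Pair { a: Some(5), b: 0 };
    let got = match &p.a {
        Some(y) => {
            p.b = 1;       // accepted: `p.b` does not overlap `p.a`
            // p.a = None; // would be rejected: `y` points into `p.a`
            *y
        }
        None => 0,
    };
    (got, p.b)
}
```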

Finally, in some cases, both dangers can arise. For example:

    let mut x = ~some(5);
    alt x {
        ~some(ref y) => { ... }
        ~none => { ... }
    }

In this case, if `x` were to be reassigned or `*x` were to be mutated,
the pointer `y` would be invalidated. (This case is also prevented by
borrowck tracking data which is owned by the current stack frame.)

# Summary of the safety check

To enforce these criteria, the borrow check has a few tricks up its
sleeve:

- When data is owned by the current stack frame, we can identify every
  possible assignment to a local variable and simply prevent
  potentially dangerous assignments directly.

- If data is owned by a shared box, we can root the box to increase
  its lifetime.

- If data is found within a borrowed pointer, we can assume that the
  data will remain live for the entire lifetime of the borrowed
  pointer.

- We can rely on the fact that pure actions (such as calling pure
  functions) do not mutate data which is not owned by the current
  stack frame.
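
The four strategies above amount to a case analysis on where the borrowed data lives. A toy model of that decision follows. `Owner`, `Strategy`, and `choose_strategy` are hypothetical names that do not appear in the borrowck code, and each case is reduced to a single label.

```rust
// Where the borrowed data lives (hypothetical, simplified).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Owner {
    StackFrame,      // a local, or data uniquely owned by one
    SharedBox,       // inside an `@T` or `@mut T` box
    BorrowedPointer, // reached through a `&T`
    Other,           // any other aliasable, mutable memory
}

// How the loan is kept valid (hypothetical).
#[derive(Debug, PartialEq)]
enum Strategy {
    TrackAssignments, // forbid conflicting assignments to the locals
    Root,             // root the box for the loan's scope
    RelyOnLifetime,   // the enclosing borrow already guarantees validity
    RequirePurity,    // fall back to requiring pure code in the scope
}

fn choose_strategy(owner: Owner) -> Strategy {
    match owner {
        Owner::StackFrame => Strategy::TrackAssignments,
        Owner::SharedBox => Strategy::Root,
        Owner::BorrowedPointer => Strategy::RelyOnLifetime,
        Owner::Other => Strategy::RequirePurity,
    }
}
```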

# Possible future directions

There are numerous ways that borrowck could be strengthened, but
these are the two most likely:

- flow-sensitivity: we do not currently consider flow at all but only
  block-scoping. This means that innocent code like the following is
  rejected:

      let mut x: int;
      ...
      x = 5;
      let y: &int = &x; // immutable ptr created
      ...

  The reason is that the scope of the pointer `y` is the entire
  enclosing block, and the assignment `x = 5` occurs within that
  block. The analysis is not smart enough to see that `x = 5` always
  happens before the immutable pointer is created. This is relatively
  easy to fix and will surely be fixed at some point.

- finer-grained purity checks: currently, our fallback for
  guaranteeing arbitrary references into mutable, aliasable memory is
  to require *total purity*. This is rather strong. We could use local
  type-based alias analysis to distinguish writes that could not
  possibly invalidate the references which must be guaranteed. This
  would only work within function boundaries; function calls would
  still require total purity. This seems less likely to be
  implemented in the short term, as it would make the code
  significantly more complex; there is currently no code to analyze
  the types and determine the possible impacts of a write.
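
The first of these directions no longer applies to present-day Rust. With non-lexical lifetimes, stabilized with the 2018 edition, a borrow ends at its last use rather than at the end of the enclosing block. The following is a minimal sketch of the flow-sensitivity example in modern syntax, where `int` is now `i32` and the function name is illustrative. It adds a second assignment, which is also accepted once `y` is dead.

```rust
fn flow_sensitive() -> i32 {
    let mut x: i32;
    x = 5;            // assignment before the borrow: accepted
    let y: &i32 = &x; // immutable pointer created
    let seen = *y;    // last use of `y`, so the borrow ends here
    x = 6;            // also accepted: no live borrow of `x` remains
    seen + x
}
```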

# How the code works

The borrow check code is divided into several major modules, each of
which is documented in its own file.

The `gather_loans` and `check_loans` modules implement the two major
passes of the analysis. The `gather_loans` pass runs over the IR once
to determine what memory must remain valid and for how long. Its name
is a bit of a misnomer; it does in fact gather up the set of loans
which are granted, but it also determines when `@T` pointers must be
rooted and for which scopes purity must be required.

The `check_loans` pass walks the IR and examines the loans and purity
requirements computed in `gather_loans`. It checks to ensure that (a)
the conditions of all loans are honored; (b) no contradictory loans
were granted (for example, loaning out the same memory as mutable and
immutable simultaneously); and (c) any purity requirements are
honored.
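
Check (b) can be pictured as a pairwise comparison of loans. The toy sketch below uses a hypothetical `Loan` record, which is far coarser than the real data structures. Each loan has a path string, a mutability flag, and a scope given as a half-open range of statement indices. Two loans contradict each other when their paths overlap, their scopes overlap, and at least one of them is mutable.

```rust
use std::ops::Range;

// Hypothetical, simplified loan record.
struct Loan {
    path: &'static str, // e.g. "x" or "x.f"
    mutable: bool,
    scope: Range<u32>,  // statements during which the loan is live
}

/// True if `p` is `base` itself or a field path under it ("x" -> "x.f").
fn extends(base: &str, p: &str) -> bool {
    match p.strip_prefix(base) {
        Some(rest) => rest.is_empty() || rest.starts_with('.'),
        None => false,
    }
}

fn conflicts(a: &Loan, b: &Loan) -> bool {
    let paths = extends(a.path, b.path) || extends(b.path, a.path);
    let scopes = a.scope.start < b.scope.end && b.scope.start < a.scope.end;
    paths && scopes && (a.mutable || b.mutable)
}

/// Returns the index pairs of all contradictory loans.
fn find_conflicts(loans: &[Loan]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for i in 0..loans.len() {
        for j in (i + 1)..loans.len() {
            if conflicts(&loans[i], &loans[j]) {
                out.push((i, j));
            }
        }
    }
    out
}
```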

The remaining modules are helper modules used by `gather_loans` and
`check_loans`:

- `categorization` has the job of analyzing an expression to determine
  what kind of memory is used in evaluating it (for example, where
  dereferences occur and what kind of pointer is dereferenced, whether
  the memory is mutable, etc.).
- `loan` determines when data uniquely tied to the stack frame can be
  loaned out.
- `preserve` determines what actions (if any) must be taken to preserve
  aliasable data. This is the code which decides when to root
  an `@T` pointer or to require purity.
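
The following toy sketch shows roughly what `categorization` computes. Hypothetical `Place` and `PtrKind` types stand in for the real result. An expression is reduced to a chain of locals, field projections, and dereferences. Walking that chain shows whether the memory is owned by the current stack frame (reached only through fields and `~` derefs) or lies behind an aliasable pointer.

```rust
// Hypothetical pointer kinds, following this file's old syntax.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PtrKind {
    Unique,   // ~T
    Shared,   // @T
    Borrowed, // &T
}

// Hypothetical categorized place: how an lvalue is reached.
#[derive(Debug)]
enum Place {
    Local(&'static str),
    Field(Box<Place>, &'static str),
    Deref(Box<Place>, PtrKind),
}

/// True if the place is reached from a local only through fields and
/// unique-box derefs, i.e. it is owned by the current stack frame.
fn owned_by_frame(p: &Place) -> bool {
    match p {
        Place::Local(_) => true,
        Place::Field(base, _) => owned_by_frame(base),
        Place::Deref(base, PtrKind::Unique) => owned_by_frame(base),
        Place::Deref(_, _) => false, // `@T` or `&T`: aliasable
    }
}
```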

# Maps that are created

Borrowck produces two maps.

- `root_map`: identifies those expressions or patterns whose result
  needs to be rooted. Conceptually, the `root_map` maps from an
  expression or pattern node to a `node_id` identifying the scope for
  which the expression must be rooted (this `node_id` should identify
  a block or call). The actual key to the map is not an expression id,
  however, but a `root_map_key`, which combines an expression id with a
  deref count and is used to cope with auto-deref.

- `mutbl_map`: identifies those local variables which are modified or
  moved. This is used by trans to guarantee that such variables are
  given a memory location and not used as immediates.
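
The two maps can be sketched with standard collections. In this hypothetical version, `NodeId` is a plain integer and the field names of `RootMapKey` are assumed. The deref count in the key lets each auto-dereference step within a single expression be rooted for its own scope.

```rust
use std::collections::{HashMap, HashSet};

type NodeId = u32; // hypothetical stand-in for the AST node id

// Hypothetical key: an expression plus how many auto-derefs were applied.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct RootMapKey {
    id: NodeId,
    derefs: u32,
}

// root_map: (expression, deref count) -> scope (block or call) to root for.
type RootMap = HashMap<RootMapKey, NodeId>;

// mutbl_map: locals that are assigned or moved; trans must give these a
// memory location instead of treating them as immediates.
type MutblMap = HashSet<NodeId>;

fn needs_memory_slot(mutbl: &MutblMap, local: NodeId) -> bool {
    mutbl.contains(&local)
}
```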
*/

import syntax::ast;