Commit 10b168e

Add back the canonicalization chapter. (rust-lang#1532)
* Add back the `canonicalization` chapter.
* Add a `FIXME` about reorganizing contents.
1 parent e3bda8f commit 10b168e

2 files changed

+261
-0
lines changed

src/SUMMARY.md

+1
@@ -123,6 +123,7 @@
123123
- [Lowering to logic](./traits/lowering-to-logic.md)
124124
- [Goals and clauses](./traits/goals-and-clauses.md)
125125
- [Canonical queries](./traits/canonical-queries.md)
126+
- [Canonicalization](./traits/canonicalization.md)
126127
- [Next-gen trait solving](./solve/trait-solving.md)
127128
- [Invariants of the type system](./solve/invariants.md)
128129
- [The solver](./solve/the-solver.md)

src/traits/canonicalization.md

+260
@@ -0,0 +1,260 @@
1+
# Canonicalization
2+
3+
> **NOTE**: FIXME: The content of this chapter has some overlap with
4+
> [Next-gen trait solving Canonicalization chapter](../solve/canonicalization.html).
5+
> It is suggested to reorganize these contents in the future.
6+
7+
Canonicalization is the process of **isolating** an inference value
8+
from its context. It is a key part of implementing
9+
[canonical queries][cq], and you may wish to read the parent chapter
10+
to get more context.
11+
12+
Canonicalization is really based on a very simple concept: every
13+
[inference variable](../type-inference.html#vars) is always in one of
14+
two states: either it is **unbound**, in which case we don't know yet
15+
what type it is, or it is **bound**, in which case we do. So to
16+
isolate some data-structure T that contains types/regions from its
17+
environment, we just walk down and find the unbound variables that
18+
appear in T; those variables get replaced with "canonical variables",
19+
starting from zero and numbered in a fixed order (left to right, for
20+
the most part, but really it doesn't matter as long as it is
21+
consistent).
22+
23+
[cq]: ./canonical-queries.html
24+
25+
So, for example, if we have the type `X = (?T, ?U)`, where `?T` and
26+
`?U` are distinct, unbound inference variables, then the canonical
27+
form of `X` would be `(?0, ?1)`, where `?0` and `?1` represent these
28+
**canonical placeholders**. Note that the type `Y = (?U, ?T)` also
29+
canonicalizes to `(?0, ?1)`. But the type `Z = (?T, ?T)` would
30+
canonicalize to `(?0, ?0)` (as would `(?U, ?U)`). In other words, the
31+
exact identity of the inference variables is not important – unless
32+
they are repeated.
33+
34+
We use this to improve caching as well as to detect cycles and other
35+
things during trait resolution. Roughly speaking, the idea is that if
36+
two trait queries have the same canonical form, then they will get
37+
the same answer. That answer will be expressed in terms of the
38+
canonical variables (`?0`, `?1`), which we can then map back to the
39+
original variables (`?T`, `?U`).
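
The numbering rule described above can be sketched with a toy model. This is only an illustration, not rustc's implementation: the `Ty` enum and the `canonicalize` function are hypothetical names invented for this example, and inference variables are identified by arbitrary integer ids.

```rust
use std::collections::HashMap;

/// A toy type representation (hypothetical, not rustc's `Ty`).
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    /// An unbound inference variable such as `?T`, identified by an id.
    Infer(u32),
    /// A canonical variable such as `?0`.
    Canonical(usize),
    /// A tuple type such as `(?T, ?U)`.
    Tuple(Vec<Ty>),
}

/// Replaces each distinct inference variable with a canonical variable,
/// numbered from zero in order of first appearance (left to right).
fn canonicalize(ty: &Ty, seen: &mut HashMap<u32, usize>) -> Ty {
    match ty {
        Ty::Infer(id) => {
            // A repeated variable reuses the index it was given earlier.
            let next = seen.len();
            Ty::Canonical(*seen.entry(*id).or_insert(next))
        }
        Ty::Canonical(i) => Ty::Canonical(*i),
        Ty::Tuple(elems) => {
            Ty::Tuple(elems.iter().map(|t| canonicalize(t, seen)).collect())
        }
    }
}
```

In this sketch, `(?T, ?U)` and `(?U, ?T)` both map to `(?0, ?1)` because only the order of first appearance matters, while `(?T, ?T)` maps to `(?0, ?0)` because the repeated variable is looked up in `seen`.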
40+
41+
## Canonicalizing the query
42+
43+
To see how it works, imagine that we are asking to solve the following
44+
trait query: `?A: Foo<'static, ?B>`, where `?A` and `?B` are unbound.
45+
This query contains two unbound variables, but it also contains the
46+
lifetime `'static`. The trait system generally ignores all lifetimes
47+
and treats them equally, so when canonicalizing, we will *also*
48+
replace any [free lifetime](../appendix/background.html#free-vs-bound) with a
49+
canonical variable. (Note that `'static` is actually a _free_ lifetime
50+
variable here. We are not considering it in the typing context of the whole
51+
program but only in the context of this trait reference. Mathematically, we
52+
are not quantifying over the whole program, but only over this obligation.)
53+
Therefore, we get the following result:
54+
55+
```text
56+
?0: Foo<'?1, ?2>
57+
```
58+
59+
Sometimes we write this differently, like so:
60+
61+
```text
62+
for<T,L,T> { ?0: Foo<'?1, ?2> }
63+
```
64+
65+
This `for<>` gives some information about each of the canonical
66+
variables within. In this case, each `T` indicates a type variable,
67+
so `?0` and `?2` are types; the `L` indicates a lifetime variable, so
68+
`?1` is a lifetime. The `canonicalize` method *also* gives back a
69+
`CanonicalVarValues` array OV with the "original values" for each
70+
canonicalized variable:
71+
72+
```text
73+
[?A, 'static, ?B]
74+
```
75+
76+
We'll need this vector OV later, when we process the query response.
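
As a rough sketch of this step, the toy function below canonicalizes a flat list of query arguments and returns the `for<..>` kinds together with the original values OV. The names `Arg`, `Kind`, and `canonicalize_query` are assumptions made for illustration and do not correspond to rustc's actual API.

```rust
/// Toy generic arguments (hypothetical stand-ins, not rustc types).
#[derive(Clone, Debug, PartialEq)]
enum Arg {
    /// An unbound type inference variable, e.g. `?A`.
    TyVar(&'static str),
    /// A free lifetime, e.g. `'static`.
    Region(&'static str),
    /// A canonical variable `?N`.
    Canonical(usize),
}

/// The kind recorded for each canonical variable in the `for<..>` binder.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Ty,
    Lifetime,
}

/// Returns the canonicalized arguments, the binder kinds, and the
/// "original values" OV, indexed by canonical variable.
fn canonicalize_query(args: &[Arg]) -> (Vec<Arg>, Vec<Kind>, Vec<Arg>) {
    let mut kinds = Vec::new();
    let mut original_values: Vec<Arg> = Vec::new();
    let mut out = Vec::new();
    for arg in args {
        match arg {
            Arg::TyVar(_) => {
                // A repeated type variable reuses its earlier index.
                let existing = original_values.iter().position(|v| v == arg);
                let idx = match existing {
                    Some(i) => i,
                    None => {
                        kinds.push(Kind::Ty);
                        original_values.push(arg.clone());
                        original_values.len() - 1
                    }
                };
                out.push(Arg::Canonical(idx));
            }
            Arg::Region(_) => {
                // In a query, each free lifetime gets a fresh canonical variable.
                kinds.push(Kind::Lifetime);
                original_values.push(arg.clone());
                out.push(Arg::Canonical(original_values.len() - 1));
            }
            Arg::Canonical(i) => out.push(Arg::Canonical(*i)),
        }
    }
    (out, kinds, original_values)
}
```

Applied to `[?A, 'static, ?B]`, this sketch yields the canonical arguments `[?0, ?1, ?2]`, the kinds `[Ty, Lifetime, Ty]`, and an OV equal to the input list.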
77+
78+
## Executing the query
79+
80+
Once we've constructed the canonical query, we can try to solve it.
81+
To do so, we will wind up creating a fresh inference context and
82+
**instantiating** the canonical query in that context. The idea is that
83+
we create a substitution S from the canonical form containing a fresh
84+
inference variable (of suitable kind) for each canonical variable.
85+
So, for our example query:
86+
87+
```text
88+
for<T,L,T> { ?0: Foo<'?1, ?2> }
89+
```
90+
91+
the substitution S might be:
92+
93+
```text
94+
S = [?A, '?B, ?C]
95+
```
96+
97+
We can then replace the bound canonical variables (`?0`, etc.) with
98+
these inference variables, yielding the following fully instantiated
99+
query:
100+
101+
```text
102+
?A: Foo<'?B, ?C>
103+
```
104+
105+
Remember that substitution S though! We're going to need it later.
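
Building the substitution S can be pictured as follows. This is a minimal sketch under stated assumptions: `InferCtxt` here is a toy struct that only hands out fresh ids, and `Kind`, `Var`, and `fresh_substitution` are hypothetical names, not the real rustc types or methods.

```rust
/// The kind of each canonical variable in the `for<..>` binder.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Ty,
    Lifetime,
}

/// A fresh inference variable of a given kind (toy representation).
#[derive(Clone, Debug, PartialEq)]
enum Var {
    Ty(u32),
    Lifetime(u32),
}

/// A toy inference context that only allocates fresh variable ids.
#[derive(Default)]
struct InferCtxt {
    next_id: u32,
}

impl InferCtxt {
    /// Creates a fresh inference variable of the requested kind.
    fn fresh(&mut self, kind: Kind) -> Var {
        let id = self.next_id;
        self.next_id += 1;
        match kind {
            Kind::Ty => Var::Ty(id),
            Kind::Lifetime => Var::Lifetime(id),
        }
    }
}

/// Builds S: one fresh variable of suitable kind per canonical variable,
/// so that `S[i]` is the value substituted for `?i`.
fn fresh_substitution(infcx: &mut InferCtxt, binder: &[Kind]) -> Vec<Var> {
    binder.iter().map(|&kind| infcx.fresh(kind)).collect()
}
```

For the binder `for<T,L,T>`, the sketch produces a type variable, a lifetime variable, and another type variable, which plays the role of `S = [?A, '?B, ?C]` above.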
106+
107+
OK, now that we have a fresh inference context and an instantiated
108+
query, we can go ahead and try to solve it. The trait solver itself is
109+
explained in more detail in [another section](./slg.html), but
110+
suffice it to say that it will compute a [certainty value][cqqr] (`Proven` or
111+
`Ambiguous`) and have side-effects on the inference variables we've
112+
created. For example, if there were only one impl of `Foo`, like so:
113+
114+
[cqqr]: ./canonical-queries.html#query-response
115+
116+
```rust,ignore
117+
impl<'a, X> Foo<'a, X> for Vec<X>
118+
where X: 'a
119+
{ ... }
120+
```
121+
122+
then we might wind up with a certainty value of `Proven`, as well as
123+
creating fresh inference variables `'?D` and `?E` (to represent the
124+
parameters on the impl) and unifying as follows:
125+
126+
- `'?B = '?D`
127+
- `?A = Vec<?E>`
128+
- `?C = ?E`
129+
130+
We would also accumulate the region constraint `?E: '?D`, due to the
131+
where clause.
132+
133+
In order to create our final query result, we have to "lift" these
134+
values out of the query's inference context and into something that
135+
can be reapplied in our original inference context. We do that by
136+
**re-applying canonicalization**, but to the **query result**.
137+
138+
## Canonicalizing the query result
139+
140+
As discussed in [the parent section][cqqr], most trait queries wind up
141+
with a result that brings together a "certainty value" `certainty`, a
142+
result substitution `var_values`, and some region constraints. To
143+
create this, we wind up re-using the substitution S that we created
144+
when first instantiating our query. To refresh your memory, we had a query
145+
146+
```text
147+
for<T,L,T> { ?0: Foo<'?1, ?2> }
148+
```
149+
150+
for which we made a substitution S:
151+
152+
```text
153+
S = [?A, '?B, ?C]
154+
```
155+
156+
We then did some work which unified some of those variables with other things.
157+
If we "refresh" S with the latest results, we get:
158+
159+
```text
160+
S = [Vec<?E>, '?D, ?E]
161+
```
162+
163+
These are precisely the new values for the three input variables from
164+
our original query. Note though that they include some new variables
165+
(like `?E`). We can make those go away by canonicalizing again! We don't
166+
just canonicalize S, though; we canonicalize the whole query response QR:
167+
168+
```text
169+
QR = {
170+
certainty: Proven, // or whatever
171+
var_values: [Vec<?E>, '?D, ?E], // this is S
172+
region_constraints: [?E: '?D], // from the impl
173+
value: (), // for our purposes, just (), but
174+
// in some cases this might have
175+
// a type or other info
176+
}
177+
```
178+
179+
The result would be as follows:
180+
181+
```text
182+
Canonical(QR) = for<T, L> {
183+
certainty: Proven,
184+
var_values: [Vec<?0>, '?1, ?0],
185+
region_constraints: [?0: '?1],
186+
value: (),
187+
}
188+
```
189+
190+
(One subtle point: when we canonicalize the query **result**, we do not
191+
use any special treatment for free lifetimes. Note that both
192+
references to `'?D`, for example, were converted into the same
193+
canonical variable (`?1`). This is in contrast to the original query,
194+
where we canonicalized every free lifetime into a fresh canonical
195+
variable.)
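
To illustrate the point about shared variables, here is a toy sketch in which one canonicalizer is used for the whole response, so a variable that occurs both in `var_values` and in a region constraint maps to the same canonical index. The `Term` and `Canonicalizer` names are hypothetical, and types and lifetimes are deliberately modeled the same way.

```rust
use std::collections::HashMap;

/// A toy term: types and lifetimes are treated alike in this sketch.
#[derive(Clone, Debug, PartialEq)]
enum Term {
    /// An inference variable (type or lifetime), e.g. "?E" or "'?D".
    Var(&'static str),
    /// A canonical variable `?N`.
    Canonical(usize),
    /// The type `Vec<..>`.
    Vec(Box<Term>),
}

/// Holds the mapping from inference variables to canonical indices.
/// Reusing one instance across the response keeps the numbering shared.
struct Canonicalizer {
    indices: HashMap<&'static str, usize>,
}

impl Canonicalizer {
    /// Rewrites `t`, numbering variables in order of first appearance.
    fn fold(&mut self, t: &Term) -> Term {
        match t {
            Term::Var(name) => {
                let next = self.indices.len();
                Term::Canonical(*self.indices.entry(*name).or_insert(next))
            }
            Term::Canonical(i) => Term::Canonical(*i),
            Term::Vec(inner) => Term::Vec(Box::new(self.fold(inner))),
        }
    }
}
```

Folding `[Vec<?E>, '?D, ?E]` and then the constraint `?E: '?D` with the same `Canonicalizer` gives `[Vec<?0>, '?1, ?0]` and `?0: '?1`, matching the canonical response shown above.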
196+
197+
Now, this result must be reapplied in each context where needed.
198+
199+
## Processing the canonicalized query result
200+
201+
In the previous section we produced a canonical query result. We now have
202+
to apply that result in our original context. If you recall, way back in the
203+
beginning, we were trying to prove this query:
204+
205+
```text
206+
?A: Foo<'static, ?B>
207+
```
208+
209+
We canonicalized that into this:
210+
211+
```text
212+
for<T,L,T> { ?0: Foo<'?1, ?2> }
213+
```
214+
215+
and now we got back a canonical response:
216+
217+
```text
218+
for<T, L> {
219+
certainty: Proven,
220+
var_values: [Vec<?0>, '?1, ?0],
221+
region_constraints: [?0: '?1],
222+
value: (),
223+
}
224+
```
225+
226+
We now want to apply that response to our context. Conceptually, how
227+
we do that is to (a) instantiate each of the canonical variables in
228+
the result with a fresh inference variable, (b) unify the values in
229+
the result with the original values, and then (c) record the region
230+
constraints for later. Doing step (a) would yield a result of
231+
232+
```text
233+
{
234+
certainty: Proven,
235+
var_values: [Vec<?C>, '?D, ?C],
235+
// `?C` and `'?D` are fresh inference variables
237+
region_constraints: [?C: '?D],
238+
value: (),
239+
}
240+
```
241+
242+
Step (b) would then unify:
243+
244+
```text
245+
?A with Vec<?C>
246+
'static with '?D
247+
?B with ?C
248+
```
249+
250+
And finally the region constraint of `?C: 'static` would be recorded
251+
for later verification.
252+
253+
(What we *actually* do is a mildly optimized variant of that: Rather
254+
than eagerly instantiating all of the canonical values in the result
255+
with variables, we instead walk the vector of values, looking for
256+
cases where the value is just a canonical variable. In our example,
257+
`values[1]` is `'?D` and `values[2]` is `?C`, so we can deduce that
258+
`'?D := 'static` and `?C := ?B`. This gives us a partial set of values. For any
259+
variable for which we do not find a value, we create an inference variable.)
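
The optimized variant can be sketched as a single pass over the response's `var_values`. This is an illustrative model only: `Value` and `result_substitution` are hypothetical names, and the sketch simply overwrites an entry if a bare canonical variable appears more than once, where a real implementation would have to unify the two original values.

```rust
/// Toy values appearing in a canonical response or in the caller's context.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    /// A canonical variable `?N` bound by the response's `for<..>`.
    Canonical(usize),
    /// The type `Vec<..>`.
    Vec(Box<Value>),
    /// A value from the caller's context, e.g. "?B" or "'static".
    Original(&'static str),
    /// A fresh inference variable created for an undetermined `?N`.
    Fresh(usize),
}

/// Computes a value for each canonical variable of the response.
/// Where `var_values[i]` is a bare canonical variable, it is mapped
/// directly to the corresponding original value; every remaining
/// variable gets a fresh inference variable.
fn result_substitution(
    num_vars: usize,
    var_values: &[Value],
    original_values: &[Value],
) -> Vec<Value> {
    let mut subst: Vec<Option<Value>> = vec![None; num_vars];
    for (value, original) in var_values.iter().zip(original_values) {
        if let Value::Canonical(i) = value {
            subst[*i] = Some(original.clone());
        }
    }
    subst
        .into_iter()
        .enumerate()
        .map(|(i, v)| v.unwrap_or(Value::Fresh(i)))
        .collect()
}
```

For the response `[Vec<?0>, '?1, ?0]` and the original values `[?A, 'static, ?B]`, this yields `?0 := ?B` and `?1 := 'static` without creating any fresh variables. The remaining entry, `Vec<?0>`, would still need to be unified with `?A` afterwards.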
260+
