Skip to content

Commit b6d91a2

Browse files
committed
Auto merge of #22126 - steveklabnik:gh21281, r=nikomatsakis
This is super black magic internals at the moment, but having it somewhere semi-public seems good. The current versions weren't being rendered, and they'll be useful for some people. Fixes #21281 r? @nikomatsakis @kmcallister
2 parents 5be210c + bdc730e commit b6d91a2

File tree

21 files changed

+2545
-2639
lines changed

21 files changed

+2545
-2639
lines changed

src/librustc/middle/infer/README.md

+237
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,237 @@
1+
# Type inference engine
2+
3+
This is loosely based on standard HM-type inference, but with an
4+
extension to try and accommodate subtyping. There is nothing
5+
principled about this extension; it's sound---I hope!---but it's a
6+
heuristic, ultimately, and does not guarantee that it finds a valid
7+
typing even if one exists (in fact, there are known scenarios where it
8+
fails, some of which may eventually become problematic).
9+
10+
## Key idea
11+
12+
The main change is that each type variable T is associated with a
13+
lower-bound L and an upper-bound U. L and U begin as bottom and top,
14+
respectively, but gradually narrow in response to new constraints
15+
being introduced. When a variable is finally resolved to a concrete
16+
type, it can (theoretically) select any type that is a supertype of L
17+
and a subtype of U.
18+
19+
There are several critical invariants which we maintain:
20+
21+
- the upper-bound of a variable only becomes lower and the lower-bound
22+
only becomes higher over time;
23+
- the lower-bound L is always a subtype of the upper bound U;
24+
- the lower-bound L and upper-bound U never refer to other type variables,
25+
but only to types (though those types may contain type variables).
26+
27+
> An aside: if the terms upper- and lower-bound confuse you, think of
28+
> "supertype" and "subtype". The upper-bound is a "supertype"
29+
> (super=upper in Latin, or something like that anyway) and the lower-bound
30+
> is a "subtype" (sub=lower in Latin). I find it helps to visualize
31+
> a simple class hierarchy, like Java minus interfaces and
32+
> primitive types. The class Object is at the root (top) and other
33+
> types lie in between. The bottom type is then the Null type.
34+
> So the tree looks like:
35+
>
36+
> ```text
37+
> Object
38+
> / \
39+
> String Other
40+
> \ /
41+
> (null)
42+
> ```
43+
>
44+
> So the upper bound type is the "supertype" and the lower bound is the
45+
> "subtype" (also, super and sub mean upper and lower in Latin, or something
46+
> like that anyway).
47+
48+
## Satisfying constraints
49+
50+
At a primitive level, there is only one form of constraint that the
51+
inference understands: a subtype relation. So the outside world can
52+
say "make type A a subtype of type B". If there are variables
53+
involved, the inferencer will adjust their upper- and lower-bounds as
54+
needed to ensure that this relation is satisfied. (We also allow "make
55+
type A equal to type B", but this is translated into "A <: B" and "B
56+
<: A")
57+
58+
As stated above, we always maintain the invariant that type bounds
59+
never refer to other variables. This keeps the inference relatively
60+
simple, avoiding the scenario of having a kind of graph where we have
61+
to pump constraints along and reach a fixed point, but it does impose
62+
some heuristics in the case where the user is relating two type
63+
variables A <: B.
64+
65+
Combining two variables such that variable A will forever be a subtype
66+
of variable B is the trickiest part of the algorithm because there is
67+
often no right choice---that is, the right choice will depend on
68+
future constraints which we do not yet know. The problem comes about
69+
because both A and B have bounds that can be adjusted in the future.
70+
Let's look at some of the cases that can come up.
71+
72+
Imagine, to start, the best case, where both A and B have an upper and
73+
lower bound (that is, the bounds are not top nor bot respectively). In
74+
that case, if we're lucky, A.ub <: B.lb, and so we know that whatever
75+
A and B should become, they will forever have the desired subtyping
76+
relation. We can just leave things as they are.
77+
78+
### Option 1: Unify
79+
80+
However, suppose that A.ub is *not* a subtype of B.lb. In
81+
that case, we must make a decision. One option is to unify A
82+
and B so that they are one variable whose bounds are:
83+
84+
UB = GLB(A.ub, B.ub)
85+
LB = LUB(A.lb, B.lb)
86+
87+
(Note that we will have to verify that LB <: UB; if it does not, the
88+
types are not intersecting and there is an error) In that case, A <: B
89+
holds trivially because A==B. However, we have now lost some
90+
flexibility, because perhaps the user intended for A and B to end up
91+
as different types and not the same type.
92+
93+
Pictorally, what this does is to take two distinct variables with
94+
(hopefully not completely) distinct type ranges and produce one with
95+
the intersection.
96+
97+
```text
98+
B.ub B.ub
99+
/\ /
100+
A.ub / \ A.ub /
101+
/ \ / \ \ /
102+
/ X \ UB
103+
/ / \ \ / \
104+
/ / / \ / /
105+
\ \ / / \ /
106+
\ X / LB
107+
\ / \ / / \
108+
\ / \ / / \
109+
A.lb B.lb A.lb B.lb
110+
```
111+
112+
113+
### Option 2: Relate UB/LB
114+
115+
Another option is to keep A and B as distinct variables but set their
116+
bounds in such a way that, whatever happens, we know that A <: B will hold.
117+
This can be achieved by ensuring that A.ub <: B.lb. In practice there
118+
are two ways to do that, depicted pictorially here:
119+
120+
```text
121+
Before Option #1 Option #2
122+
123+
B.ub B.ub B.ub
124+
/\ / \ / \
125+
A.ub / \ A.ub /(B')\ A.ub /(B')\
126+
/ \ / \ \ / / \ / /
127+
/ X \ __UB____/ UB /
128+
/ / \ \ / | | /
129+
/ / / \ / | | /
130+
\ \ / / /(A')| | /
131+
\ X / / LB ______LB/
132+
\ / \ / / / \ / (A')/ \
133+
\ / \ / \ / \ \ / \
134+
A.lb B.lb A.lb B.lb A.lb B.lb
135+
```
136+
137+
In these diagrams, UB and LB are defined as before. As you can see,
138+
the new ranges `A'` and `B'` are quite different from the range that
139+
would be produced by unifying the variables.
140+
141+
### What we do now
142+
143+
Our current technique is to *try* (transactionally) to relate the
144+
existing bounds of A and B, if there are any (i.e., if `UB(A) != top
145+
&& LB(B) != bot`). If that succeeds, we're done. If it fails, then
146+
we merge A and B into same variable.
147+
148+
This is not clearly the correct course. For example, if `UB(A) !=
149+
top` but `LB(B) == bot`, we could conceivably set `LB(B)` to `UB(A)`
150+
and leave the variables unmerged. This is sometimes the better
151+
course, it depends on the program.
152+
153+
The main case which fails today that I would like to support is:
154+
155+
```text
156+
fn foo<T>(x: T, y: T) { ... }
157+
158+
fn bar() {
159+
let x: @mut int = @mut 3;
160+
let y: @int = @3;
161+
foo(x, y);
162+
}
163+
```
164+
165+
In principle, the inferencer ought to find that the parameter `T` to
166+
`foo(x, y)` is `@const int`. Today, however, it does not; this is
167+
because the type variable `T` is merged with the type variable for
168+
`X`, and thus inherits its UB/LB of `@mut int`. This leaves no
169+
flexibility for `T` to later adjust to accommodate `@int`.
170+
171+
### What to do when not all bounds are present
172+
173+
In the prior discussion we assumed that A.ub was not top and B.lb was
174+
not bot. Unfortunately this is rarely the case. Often type variables
175+
have "lopsided" bounds. For example, if a variable in the program has
176+
been initialized but has not been used, then its corresponding type
177+
variable will have a lower bound but no upper bound. When that
178+
variable is then used, we would like to know its upper bound---but we
179+
don't have one! In this case we'll do different things depending on
180+
how the variable is being used.
181+
182+
## Transactional support
183+
184+
Whenever we adjust merge variables or adjust their bounds, we always
185+
keep a record of the old value. This allows the changes to be undone.
186+
187+
## Regions
188+
189+
I've only talked about type variables here, but region variables
190+
follow the same principle. They have upper- and lower-bounds. A
191+
region A is a subregion of a region B if A being valid implies that B
192+
is valid. This basically corresponds to the block nesting structure:
193+
the regions for outer block scopes are superregions of those for inner
194+
block scopes.
195+
196+
## Integral and floating-point type variables
197+
198+
There is a third variety of type variable that we use only for
199+
inferring the types of unsuffixed integer literals. Integral type
200+
variables differ from general-purpose type variables in that there's
201+
no subtyping relationship among the various integral types, so instead
202+
of associating each variable with an upper and lower bound, we just
203+
use simple unification. Each integer variable is associated with at
204+
most one integer type. Floating point types are handled similarly to
205+
integral types.
206+
207+
## GLB/LUB
208+
209+
Computing the greatest-lower-bound and least-upper-bound of two
210+
types/regions is generally straightforward except when type variables
211+
are involved. In that case, we follow a similar "try to use the bounds
212+
when possible but otherwise merge the variables" strategy. In other
213+
words, `GLB(A, B)` where `A` and `B` are variables will often result
214+
in `A` and `B` being merged and the result being `A`.
215+
216+
## Type coercion
217+
218+
We have a notion of assignability which differs somewhat from
219+
subtyping; in particular it may cause region borrowing to occur. See
220+
the big comment later in this file on Type Coercion for specifics.
221+
222+
### In conclusion
223+
224+
I showed you three ways to relate `A` and `B`. There are also more,
225+
of course, though I'm not sure if there are any more sensible options.
226+
The main point is that there are various options, each of which
227+
produce a distinct range of types for `A` and `B`. Depending on what
228+
the correct values for A and B are, one of these options will be the
229+
right choice: but of course we don't know the right values for A and B
230+
yet, that's what we're trying to find! In our code, we opt to unify
231+
(Option #1).
232+
233+
# Implementation details
234+
235+
We make use of a trait-like implementation strategy to consolidate
236+
duplicated code between subtypes, GLB, and LUB computations. See the
237+
section on "Type Combining" below for details.

0 commit comments

Comments
 (0)