Skip to content

Latest commit

 

History

History
107 lines (84 loc) · 4.79 KB

early-late-bound.md

File metadata and controls

107 lines (84 loc) · 4.79 KB

Early and Late Bound Variables

In Rust, item definitions (like fn) can often have generic parameters, which are always universally quantified. That is, if you have a function like

fn foo<T>(x: T) { }

this function is defined "for all T" (not "for some specific T", which would be existentially quantified).

While Rust items can be quantified over types, lifetimes, and constants, the types of values in Rust are only ever quantified over lifetimes. So you can have a type like for<'a> fn(&'a u32), which represents a function pointer that takes a reference with any lifetime, or for<'a> dyn Trait<'a>, which is a dyn trait for a trait implemented for any lifetime; but we have no type like for<T> fn(T), which would be a function that takes a value of any type as a parameter. This is a consequence of monomorphization -- to support a value of type for<T> fn(T), we would need a single function pointer that can be used for a parameter of any type, but in Rust we generate customized code for each parameter type.

One consequence of this asymmetry is a weird split in how we represent some generic types: early- and late- bound parameters. Basically, if we cannot represent a type (e.g. a universally quantified type), we have to bind it early so that the unrepresentable type is never around.

Consider the following example:

fn foo<'a, 'b, T>(x: &'a u32, y: &'b T) where T: 'b { ... }

We cannot treat 'a, 'b, and T in the same way. Types in Rust can't have for<T> { .. }, only for<'a> {...}, so whenever you reference foo the type you get back can't be for<'a, 'b, T> fn(&'a u32, y: &'b T). Instead, the T must be substituted early. In particular, you have:

let x = foo; // T, 'b have to be substituted here
x(...);      // 'a substituted here, at the point of call
x(...);      // 'a substituted here with a different value

Early-bound parameters

Early-bound parameters in rustc are identified by an index, stored in the ParamTy struct for types or the EarlyBoundRegion struct for lifetimes. The index counts from the outermost declaration in scope. This means that as you add more binders inside, the index doesn't change.

For example,

trait Foo<T> {
  type Bar<U> = (Self, T, U);
}

Here, the type (Self, T, U) would be ($0, $1, $2), where $N means a ParamTy with the index of N.

In rustc, the Generics structure carries this information. So the Generics for Bar above would be just like for U and would indicate the 'parent' generics of Foo, which declares Self and T. You can read more in this chapter.

Late-bound parameters

Late-bound parameters in rustc are handled quite differently (they are also specialized to lifetimes since, right now, only late-bound lifetimes are supported, though with GATs that has to change). We indicate their potential presence by a Binder type. The Binder doesn't know how many variables there are at that binding level. This can only be determined by walking the type itself and collecting them. So a type like for<'a, 'b> ('a, 'b) would be for (^0.a, ^0.b). Here, we just write for because we don't know the names of the things bound within.

Moreover, a reference to a late-bound lifetime is written ^0.a:

  • The 0 is the index; it identifies that this lifetime is bound in the innermost binder (the for).
  • The a is the "name"; late-bound lifetimes in rustc are identified by a "name" -- the BoundRegionKind enum. This enum can contain a DefId or it might have various "anonymous" numbered names. The latter arise from types like fn(&u32, &u32), which are equivalent to something like for<'a, 'b> fn(&'a u32, &'b u32), but the names of those lifetimes must be generated.

This setup of not knowing the full set of variables at a binding level has some advantages and some disadvantages. The disadvantage is that you must walk the type to find out what is bound at the given level and so forth. The advantage is primarily that, when constructing types from Rust syntax, if we encounter anonymous regions like in fn(&u32), we just create a fresh index and don't have to update the binder.