This subchapter is about the bootstrapping process.
Bootstrapping is the process of using a compiler to compile itself. More accurately, it means using an older compiler to compile a newer version of the same compiler.
This raises a chicken-and-egg paradox: where did the first compiler come from? It must have been written in a different language. In Rust's case it was written in OCaml. However, that compiler was abandoned long ago, and the only way to build a modern version of rustc is with a slightly less modern version.
This is exactly how x.py works: it downloads the current beta release of
rustc, then uses it to compile the new compiler.
Compiling rustc is done in stages:
- Stage 0: the stage0 compiler is usually the current beta rustc compiler and its associated dynamic libraries, which x.py will download for you (you can configure x.py to use something else). This stage0 compiler is then used only to compile rustbuild, std, and rustc. When compiling rustc, this stage0 compiler uses the freshly compiled std. There are two concepts at play here: a compiler (with its set of dependencies) and its 'target' or 'object' libraries (std and rustc). Both are staged, but in a staggered manner.
- Stage 1: the code in your clone (the new version) is then compiled with the stage0 compiler to produce the stage1 compiler. However, it was built with an older compiler (stage0), so to optimize the stage1 compiler we go to the next stage.
  - In theory, the stage1 compiler is functionally identical to the stage2 compiler, but in practice there are subtle differences. In particular, the stage1 compiler itself was built by stage0 and hence not by the source in your working directory: this means that the symbol names used in the compiler source may not match the symbol names that would have been made by the stage1 compiler. This matters when using dynamic linking, because there is no ABI compatibility between versions. It primarily manifests when tests try to link with any of the rustc_* crates or use the (now deprecated) plugin infrastructure. These tests are marked with ignore-stage1.
- Stage 2: we rebuild our stage1 compiler with itself to produce the stage2 compiler (i.e. it builds itself) to have all the latest optimizations. (By default, we copy the stage1 libraries for use by the stage2 compiler, since they ought to be identical.)
- (Optional) Stage 3: to sanity check our new compiler, we can build the libraries with the stage2 compiler. The result ought to be identical to before, unless something has broken.
The stage2 compiler is the one distributed with rustup and all other
install methods. However, it takes a very long time to build because one must
first build the new compiler with an older compiler and then use that to
build the new compiler with itself. For development, you usually only want
the stage1 compiler: x.py build library/std.
x.py tries to be helpful and pick the stage you most likely meant for each subcommand.
These defaults are as follows:
- doc: --stage 0
- build: --stage 1
- test: --stage 1
- dist: --stage 2
- install: --stage 2
- bench: --stage 2
You can always override the stage by passing --stage N explicitly.
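The defaults listed above, together with the --stage N override, can be modelled in a few lines. This is only an illustrative sketch: default_stage and effective_stage are hypothetical helper names, not functions taken from the bootstrap sources.

```rust
/// Default stage for an x.py subcommand, as listed in the text.
/// Hypothetical helper for illustration; not part of bootstrap itself.
fn default_stage(subcommand: &str) -> Option<u32> {
    match subcommand {
        "doc" => Some(0),
        "build" | "test" => Some(1),
        "dist" | "install" | "bench" => Some(2),
        // Subcommands not mentioned in the text have no default here.
        _ => None,
    }
}

/// An explicit `--stage N` always takes precedence over the default.
fn effective_stage(subcommand: &str, explicit: Option<u32>) -> Option<u32> {
    explicit.or_else(|| default_stage(subcommand))
}

fn main() {
    for cmd in ["doc", "build", "test", "dist", "install", "bench"] {
        if let Some(stage) = effective_stage(cmd, None) {
            println!("x.py {} defaults to --stage {}", cmd, stage);
        }
    }
}
```

Calling effective_stage("test", Some(2)) mirrors running x.py test --stage 2: the explicit value is used instead of the default.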
For more information about stages, see below.
Since the build system uses the current beta compiler to build the stage-1
bootstrapping compiler, the compiler source code can't use some features
until they reach beta (because otherwise the beta compiler doesn't support
them). On the other hand, for compiler intrinsics and internal
features, the features have to be used. Additionally, the compiler makes
heavy use of nightly features (#![feature(...)]). How can we resolve this
problem?
There are two methods used:
- The build system sets --cfg bootstrap when building with stage0, so we can use cfg(not(bootstrap)) to only use features when built with stage1. This is useful for e.g. features that were just stabilized, which require #![feature(...)] when built with stage0, but not for stage1.
- The build system sets RUSTC_BOOTSTRAP=1. This special variable breaks the stability guarantees of Rust: it allows using #![feature(...)] with a compiler that's not nightly. This should never be used except when bootstrapping the compiler.
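As a rough illustration of the first method, the sketch below gates two versions of a function on the bootstrap cfg. The function name built_by and its return strings are made up for this example; only the cfg name bootstrap comes from the text.

```rust
// Selected only when the crate is compiled with `--cfg bootstrap`,
// which the text says the build system does when building with stage0.
#[cfg(bootstrap)]
fn built_by() -> &'static str {
    "stage0 (beta)"
}

// Selected in every other case, including a plain `rustc` invocation
// where no `--cfg bootstrap` is passed.
#[cfg(not(bootstrap))]
fn built_by() -> &'static str {
    "stage1 or later"
}

// For a just-stabilized feature, the same idea is usually written as a
// crate-level attribute (the feature name here is a placeholder):
//     #![cfg_attr(bootstrap, feature(some_feature))]

fn main() {
    println!("this crate was built by: {}", built_by());
}
```

Compiling the same file with rustc --cfg bootstrap selects the first definition instead, which is how one source tree can serve both compilers.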
When you use the bootstrap system, you'll call it through x.py.
However, most of the code lives in src/bootstrap.
bootstrap has a difficult problem: it is written in Rust, yet it is run
before the Rust compiler is built! To work around this, there are two
components of bootstrap: the main one written in Rust, and bootstrap.py.
bootstrap.py is what gets run by x.py. It takes care of downloading the
stage0 compiler, which will then build the bootstrap binary written in
Rust.
Because there are two separate codebases behind x.py, they need to
be kept in sync. In particular, both bootstrap.py and the bootstrap binary
parse config.toml and read the same command line arguments. bootstrap.py
keeps these in sync by setting various environment variables, and the
programs sometimes have to add arguments that are explicitly ignored, to be
read by the other.
This section is a work in progress. In the meantime, you can see an example contribution here.
This is a detailed look into the separate bootstrap stages.
The convention x.py uses is that:
- A --stage N flag means to run the stage N compiler (stageN/rustc).
- A "stage N artifact" is a build artifact that is produced by the stage N compiler.
- The "stage (N+1) compiler" is assembled from "stage N artifacts". This process is called uplifting.
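The numbering convention is easy to get wrong by one, so here is a small model of it. The Compiler and Artifacts types and their methods are invented for this sketch and do not correspond to types in src/bootstrap.

```rust
/// A compiler, identified only by its stage number (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Compiler {
    stage: u32,
}

/// Build artifacts, tagged with the stage of the compiler that produced them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Artifacts {
    built_by_stage: u32,
}

impl Compiler {
    /// Building rustc with the stage N compiler yields stage N artifacts.
    fn build_rustc(self) -> Artifacts {
        Artifacts { built_by_stage: self.stage }
    }
}

impl Artifacts {
    /// Uplifting: stage N artifacts are assembled into the stage N+1 compiler.
    fn uplift(self) -> Compiler {
        Compiler { stage: self.built_by_stage + 1 }
    }
}

fn main() {
    // Stage 0 is the downloaded beta compiler.
    let mut compiler = Compiler { stage: 0 };
    for _ in 0..2 {
        let artifacts = compiler.build_rustc();
        let next = artifacts.uplift();
        println!(
            "stage{} compiler -> stage{} artifacts -> stage{} compiler",
            compiler.stage, artifacts.built_by_stage, next.stage
        );
        compiler = next;
    }
}
```

The point of the model is that the stage number on an artifact names the compiler that built it, not the compiler it will become.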
Anything you can build with x.py is a build artifact.
Build artifacts include, but are not limited to:
- binaries, like stage0-rustc/rustc-main
- shared objects, like stage0-sysroot/rustlib/libstd-6fae108520cf72fe.so
- rlib files, like stage0-sysroot/rustlib/libstd-6fae108520cf72fe.rlib
- HTML files generated by rustdoc, like doc/std
- x.py build --stage 0 means to build with the beta rustc.
- x.py doc --stage 0 means to document using the beta rustdoc.
- x.py test --stage 0 library/std means to run tests on the standard library without building rustc from source ('build with stage 0, then test the artifacts'). If you're working on the standard library, this is normally the test command you want.
- x.py test src/test/ui means to build the stage 1 compiler and run compiletest on it. If you're working on the compiler, this is normally the test command you want.
- x.py test --stage 0 src/test/ui is not meaningful: it runs tests on the beta compiler and doesn't build rustc from source. Use test src/test/ui instead, which builds stage 1 from source.
- x.py test --stage 0 compiler/rustc builds the compiler but runs no tests: it's running cargo test -p rustc, but cargo doesn't understand Rust's tests. You shouldn't need to use this, use test instead (without arguments).
- x.py build --stage 0 compiler/rustc builds the compiler, but does not make it usable: the build artifacts are not uplifted (#73519). Use x.py build library/std instead, which puts the compiler in stage1/rustc.
Note that build --stage N compiler/rustc does not build the stage N compiler:
instead it builds the stage N+1 compiler using the stage N compiler.
In short, stage 0 uses the stage0 compiler to create stage0 artifacts which will later be uplifted to be the stage1 compiler.
In each stage, two major steps are performed:
- std is compiled by the stage N compiler.
- That std is linked to programs built by the stage N compiler, including the stage N artifacts (stage (N+1) compiler).
This is somewhat intuitive if one thinks of the stage N artifacts as "just"
another program we are building with the stage N compiler:
build --stage N compiler/rustc is linking the stage N artifacts to the std
built by the stage N compiler.
Here is a chart of a full build using x.py:
Keep in mind this diagram is a simplification: for example, rustdoc can be
built at different stages, and the process is a bit different when passing
flags such as --keep-stage, or if there are non-host targets.
The stage 2 compiler is what is shipped to end-users.
Note that there are two std libraries in play here:
- The library linked to stageN/rustc, which was built by stage N-1 (stage N-1 std)
- The library used to compile programs with stageN/rustc, which was built by stage N (stage N std).
Stage N std is pretty much necessary for any useful work with the stage N compiler.
Without it, you can only compile programs with #![no_core] -- not terribly useful!
These need to be different because they aren't necessarily ABI-compatible: there could be new layout optimizations, changes to MIR, or other changes to Rust metadata on nightly that aren't present in beta.
This is also where --keep-stage 1 library/std comes into play. Since most
changes to the compiler don't actually change the ABI, once you've produced a
std in stage 1, you can probably just reuse it with a different compiler.
If the ABI hasn't changed, you're good to go, no need to spend time
recompiling that std.
--keep-stage simply assumes the previous compile is fine and copies those
artifacts into the appropriate place, skipping the cargo invocation.
Building stage2 std is different depending on whether you are cross-compiling or not
(see in the table how stage2 only builds non-host std targets).
This is because x.py uses a trick: if HOST and TARGET are the same,
it will reuse stage1 std for stage2! This is sound because stage1 std
was compiled with the stage1 compiler, i.e. a compiler using the source code
you currently have checked out. So it should be identical (and therefore ABI-compatible)
to the std that stage2/rustc would compile.
However, when cross-compiling, stage1 std will only run on the host.
So the stage2 compiler has to recompile std for the target.
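The decision described in the last few paragraphs boils down to a comparison of HOST and TARGET. The following sketch shows that logic in isolation; Stage2Std and stage2_std_plan are hypothetical names chosen for this example, and the real build system takes more inputs into account.

```rust
/// How the std used by the stage2 compiler is obtained (illustrative enum).
#[derive(Debug, PartialEq, Eq)]
enum Stage2Std {
    /// HOST == TARGET: copy the std already built by the stage1 compiler.
    ReuseStage1,
    /// Cross-compiling: the stage2 compiler must build std for the target.
    BuildWithStage2,
}

/// Decide where stage2's std comes from, following the rule in the text.
fn stage2_std_plan(host: &str, target: &str) -> Stage2Std {
    if host == target {
        Stage2Std::ReuseStage1
    } else {
        Stage2Std::BuildWithStage2
    }
}

fn main() {
    let host = "x86_64-unknown-linux-gnu";
    for target in ["x86_64-unknown-linux-gnu", "aarch64-unknown-linux-gnu"] {
        println!("{} -> {}: {:?}", host, target, stage2_std_plan(host, target));
    }
}
```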
The rustc generated by the stage0 compiler is linked to the freshly-built
std, which means that for the most part only std needs to be cfg-gated,
so that rustc can use features added to std immediately after their addition,
without need for them to get into the downloaded beta.
Note this is different from any other Rust program: stage1 rustc
is built by the beta compiler, but using the master version of libstd!
The only time rustc uses cfg(bootstrap) is when it adds internal lints
that use diagnostic items. This happens very rarely.
The following tables indicate the outputs of various stage actions:
Stage 0 Action | Output |
---|---|
beta extracted | build/HOST/stage0 |
stage0 builds bootstrap | build/bootstrap |
stage0 builds test/std | build/HOST/stage0-std/TARGET |
copy stage0-std (HOST only) | build/HOST/stage0-sysroot/lib/rustlib/HOST |
stage0 builds rustc with stage0-sysroot | build/HOST/stage0-rustc/HOST |
copy stage0-rustc (except executable) | build/HOST/stage0-sysroot/lib/rustlib/HOST |
build llvm | build/HOST/llvm |
stage0 builds codegen with stage0-sysroot | build/HOST/stage0-codegen/HOST |
stage0 builds rustdoc, clippy, miri, with stage0-sysroot | build/HOST/stage0-tools/HOST |

--stage=0 stops here.
Stage 1 Action | Output |
---|---|
copy (uplift) stage0-rustc executable to stage1 | build/HOST/stage1/bin |
copy (uplift) stage0-codegen to stage1 | build/HOST/stage1/lib |
copy (uplift) stage0-sysroot to stage1 | build/HOST/stage1/lib |
stage1 builds test/std | build/HOST/stage1-std/TARGET |
copy stage1-std (HOST only) | build/HOST/stage1/lib/rustlib/HOST |
stage1 builds rustc | build/HOST/stage1-rustc/HOST |
copy stage1-rustc (except executable) | build/HOST/stage1/lib/rustlib/HOST |
stage1 builds codegen | build/HOST/stage1-codegen/HOST |

--stage=1 stops here.
Stage 2 Action | Output |
---|---|
copy (uplift) stage1-rustc executable | build/HOST/stage2/bin |
copy (uplift) stage1-sysroot | build/HOST/stage2/lib and build/HOST/stage2/lib/rustlib/HOST |
stage2 builds test/std (not HOST targets) | build/HOST/stage2-std/TARGET |
copy stage2-std (not HOST targets) | build/HOST/stage2/lib/rustlib/TARGET |
stage2 builds rustdoc, clippy, miri | build/HOST/stage2-tools/HOST |
copy rustdoc | build/HOST/stage2/bin |

--stage=2 stops here.
x.py allows you to pass stage-specific flags to rustc when bootstrapping.
The RUSTFLAGS_STAGE_0, RUSTFLAGS_STAGE_1 and RUSTFLAGS_STAGE_2
environment variables pass the given flags when building stage 0, 1, and 2
artifacts respectively.
Additionally, the RUSTFLAGS_STAGE_NOT_0 variable, as its name suggests, passes
the given arguments if the stage is not 0.
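To make the selection rule concrete, the sketch below lists which of these variables apply to a given stage and gathers their values. It is an approximation under stated assumptions: stage_flag_vars and extra_flags are hypothetical helpers, the order in which the two variables are combined is a guess, and splitting values on whitespace is a simplification.

```rust
/// Names of the stage-specific variables that apply when building
/// stage `stage` artifacts. Only stages 0 through 2 are documented.
fn stage_flag_vars(stage: u32) -> Vec<String> {
    let mut vars = Vec::new();
    if stage <= 2 {
        vars.push(format!("RUSTFLAGS_STAGE_{}", stage));
    }
    if stage != 0 {
        vars.push(String::from("RUSTFLAGS_STAGE_NOT_0"));
    }
    vars
}

/// Collect the flags from every applicable variable. `lookup` stands in
/// for the environment so the logic can be exercised without touching it.
/// Assumption: values are whitespace-separated and appended in list order.
fn extra_flags(stage: u32, lookup: impl Fn(&str) -> Option<String>) -> Vec<String> {
    stage_flag_vars(stage)
        .iter()
        .filter_map(|name| lookup(name.as_str()))
        .flat_map(|value| {
            value
                .split_whitespace()
                .map(String::from)
                .collect::<Vec<String>>()
        })
        .collect()
}

fn main() {
    // Read the real environment for stage 1.
    let flags = extra_flags(1, |name| std::env::var(name).ok());
    println!("extra flags for stage 1 artifacts: {:?}", flags);
}
```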
During bootstrapping, there are a bunch of compiler-internal environment
variables that are used. If you are trying to run an intermediate version of
rustc, sometimes you may need to set some of these environment variables
manually. Otherwise, you get an error like the following:
thread 'main' panicked at 'RUSTC_STAGE was not set: NotPresent', library/core/src/result.rs:1165:5
If ./stageN/bin/rustc gives an error about environment variables, that
usually means something is quite wrong -- or you're trying to compile e.g.
rustc or std or something that depends on environment variables. In
the unlikely case that you actually need to invoke rustc in such a situation,
you can find the environment variable values by adding the following flag to
your x.py command: --on-fail=print-env.