Bootstrapping is the process of using a compiler to produce a later version of itself.
This raises a chicken-or-the-egg paradox: what Rust compiler was used to produce the very first Rust compiler? The answer is that the first compiler was not written in Rust. It was written in OCaml. Of course, it has long been discarded, and since then the only compiler that is able to produce some version of rustc is a slightly earlier version of rustc.
For this purpose a Python script x.py is provided at the root of the repository. x.py downloads a pre-compiled compiler (the stage 0 compiler) and uses it to build a compiler from the current source code (the stage 1 compiler). Additionally, it may use the stage 1 compiler to build another compiler from the current source code (the stage 2 compiler). The sections below describe this process in some detail, including the reason for a stage 2 compiler.
Each stage involves:
- An existing compiler and its set of dependencies.
- Objects: std and rustc.
Note: the compiler of a stage—e.g. "the stage 1 compiler"—refers to the compiler that is produced at that stage, not the one that already exists.
Typically, in the first stage (stage 0) the compiler is obtained by downloading a pre-compiled one and in following stages the compiler is the one that was produced in the previous stage.
Here's a diagram, adapted from Joshua Nelson's talk on bootstrapping at RustConf 2022, with detailed explanations below.
The A, B, C, and D show the ordering of the stages of bootstrapping. Blue nodes are downloaded, yellow nodes are built with the stage0 compiler, and green nodes are built with the stage1 compiler.
graph TD
s0c["stage0 compiler (1.63)"]:::downloaded -->|A| s0l("stage0 std (1.64)"):::with-s0c;
s0c & s0l --- stepb[ ]:::empty;
stepb -->|B| s0ca["stage0 compiler artifacts (1.64)"]:::with-s0c;
s0ca -->|copy| s1c["stage1 compiler (1.64)"]:::with-s0c;
s1c -->|C| s1l("stage1 std (1.64)"):::with-s1c;
s1c & s1l --- stepd[ ]:::empty;
stepd -->|D| s1ca["stage1 compiler artifacts (1.64)"]:::with-s1c;
s1ca -->|copy| s2c["stage2 compiler"]:::with-s1c;
classDef empty width:0px,height:0px;
classDef downloaded fill: lightblue;
classDef with-s0c fill: yellow;
classDef with-s1c fill: lightgreen;
A pre-compiled compiler and its set of dependencies are downloaded. By default, it is the current beta release. This is the stage 0 compiler.
The stage 0 compiler builds src/bootstrap and std from current code and uses them to build a compiler from current code. This is the stage 1 compiler.
The stage 1 compiler is the first that is from current code. Yet, it is not entirely up-to-date, because the compiler that produced it is of earlier code. More on this below.
By default, the stage 1 libraries are copied into stage 2, because they are expected to be identical.
The stage 1 compiler is used to produce from current code a compiler. This is the stage 2 compiler.
The stage 2 compiler is the first that is both from current code and produced by a compiler that is of current code. The compilers and libraries obtained by rustup and other installation methods are all stage 2.
For most purposes a stage 1 compiler would suffice: x.py build library. See Building the Compiler.
There are subtle differences between the stage 2 and the stage 1 compiler:
- The symbol names used in the compiler source may not match the symbol names that would have been made by the stage1 compiler. This matters when using dynamic linking, because ABI compatibility is not guaranteed between versions. This primarily manifests when tests try to link with any of the rustc_* crates or use the (now deprecated) plugin infrastructure. These tests are marked with ignore-stage1.
- The stage 2 compiler benefits from the compile-time optimizations produced by a compiler that is of the current code.
If it is necessary to verify that the stage 2 libraries copied from stage 1 are indeed identical to those that would otherwise have been produced in stage 2, the stage 2 compiler is used to build them and the two sets are compared.
x.py provides a reasonable default stage for each subcommand:
- check: --stage 0
- doc: --stage 0
- build: --stage 1
- test: --stage 1
- dist: --stage 2
- install: --stage 2
- bench: --stage 2
Of course, these can be overridden by passing --stage <number>.
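As an illustration only, the defaults listed above can be written down as a small lookup. The function names default_stage and effective_stage are made up for this sketch and are not part of x.py or src/bootstrap.

```rust
/// Default stage per subcommand, transcribed from the list above.
/// (Hypothetical helper, not an API of x.py.)
fn default_stage(subcommand: &str) -> Option<u32> {
    match subcommand {
        "check" | "doc" => Some(0),
        "build" | "test" => Some(1),
        "dist" | "install" | "bench" => Some(2),
        _ => None, // subcommands not covered by the list above
    }
}

/// An explicit `--stage <number>` takes precedence over the default.
fn effective_stage(subcommand: &str, explicit: Option<u32>) -> Option<u32> {
    explicit.or_else(|| default_stage(subcommand))
}

fn main() {
    for cmd in ["check", "doc", "build", "test", "dist", "install", "bench"] {
        println!("{cmd}: --stage {}", default_stage(cmd).unwrap());
    }
    // Models `x.py test --stage 2`, which overrides the default of 1.
    println!("test with override: {:?}", effective_stage("test", Some(2)));
}
```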
For more information about stages, see below.
Since the build system uses the current beta compiler to build the stage-1
bootstrapping compiler, the compiler source code can't use some features
until they reach beta (because otherwise the beta compiler doesn't support
them). On the other hand, for compiler intrinsics and internal
features, the features have to be used. Additionally, the compiler makes
heavy use of nightly features (#![feature(...)]). How can we resolve this problem?
There are two methods used:
- The build system sets --cfg bootstrap when building with stage0, so we can use cfg(not(bootstrap)) to only use features when built with stage1. This is useful for e.g. features that were just stabilized, which require #![feature(...)] when built with stage0, but not for stage1.
- The build system sets RUSTC_BOOTSTRAP=1. This special variable means to break the stability guarantees of Rust: it allows using #![feature(...)] with a compiler that's not nightly. This should never be used except when bootstrapping the compiler.
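A minimal sketch of the first method, assuming nothing beyond the bootstrap cfg name given above. The function built_by is invented for illustration. Compiled normally, the cfg(not(bootstrap)) variant is selected; compiled with rustc --cfg bootstrap, the other one is. Recent toolchains may warn that bootstrap is an unexpected cfg name when this is built outside the compiler tree.

```rust
/// Compiled only when `--cfg bootstrap` is passed, i.e. when building with stage0.
#[cfg(bootstrap)]
fn built_by() -> &'static str {
    "stage0 (beta) compiler"
}

/// Compiled when `bootstrap` is not set, i.e. when building with stage1 or later.
#[cfg(not(bootstrap))]
fn built_by() -> &'static str {
    "stage1 or later compiler"
}

fn main() {
    // For a just-stabilized feature the same switch is typically applied to a
    // crate attribute instead, along the lines of
    //     #![cfg_attr(bootstrap, feature(hypothetical_feature))]
    // where `hypothetical_feature` is a placeholder name, not a real feature.
    println!("built by the {}", built_by());
}
```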
When you use the bootstrap system, you'll call it through x.py. However, most of the code lives in src/bootstrap. bootstrap has a difficult problem: it is written in Rust, and yet it is run before the Rust compiler is built! To work around this, there are two components of bootstrap: the main one written in Rust, and bootstrap.py. bootstrap.py is what gets run by x.py. It takes care of downloading the stage0 compiler, which will then build the bootstrap binary written in Rust.
Because there are two separate codebases behind x.py, they need to be kept in sync. In particular, both bootstrap.py and the bootstrap binary parse config.toml and read the same command line arguments. bootstrap.py keeps these in sync by setting various environment variables, and the programs sometimes have to add arguments that are explicitly ignored, to be read by the other.
This section is a work in progress. In the meantime, you can see an example contribution here.
This is a detailed look into the separate bootstrap stages.
The convention x.py uses is that:
- A --stage N flag means to run the stage N compiler (stageN/rustc).
- A "stage N artifact" is a build artifact that is produced by the stage N compiler.
- The stage N+1 compiler is assembled from stage N artifacts. This process is called uplifting.
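The naming convention above can be spelled out as a small sketch. StagePlan and plan are names invented here to make the off-by-one between "stage N artifacts" and "stage N+1 compiler" explicit; they do not correspond to types in src/bootstrap.

```rust
/// What a `--stage N` invocation involves, following the convention above.
/// (Illustrative type, not taken from src/bootstrap.)
#[derive(Debug, PartialEq)]
struct StagePlan {
    /// The compiler that is run: `stageN/rustc`.
    runs: String,
    /// The name given to what that compiler produces.
    produces: String,
    /// The compiler assembled ("uplifted") from those artifacts.
    uplifts_to: String,
}

fn plan(stage: u32) -> StagePlan {
    StagePlan {
        runs: format!("stage{stage}/rustc"),
        produces: format!("stage{stage} artifacts"),
        uplifts_to: format!("stage{} compiler", stage + 1),
    }
}

fn main() {
    for n in 0..2 {
        let p = plan(n);
        println!("--stage {n}: run {}, produce {}, uplift to the {}", p.runs, p.produces, p.uplifts_to);
    }
}
```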
Anything you can build with x.py is a build artifact.
Build artifacts include, but are not limited to:
- binaries, like stage0-rustc/rustc-main
- shared objects, like stage0-sysroot/rustlib/libstd-6fae108520cf72fe.so
- rlib files, like stage0-sysroot/rustlib/libstd-6fae108520cf72fe.rlib
- HTML files generated by rustdoc, like doc/std
- ./x.py build --stage 0 means to build with the beta rustc.
- ./x.py doc --stage 0 means to document using the beta rustdoc.
- ./x.py test --stage 0 library/std means to run tests on the standard library without building rustc from source ('build with stage 0, then test the artifacts'). If you're working on the standard library, this is normally the test command you want.
- ./x.py test src/test/ui means to build the stage 1 compiler and run compiletest on it. If you're working on the compiler, this is normally the test command you want.

By contrast, the following invocations are rarely what you want:
- ./x.py test --stage 0 src/test/ui is not useful: it runs tests on the beta compiler and doesn't build rustc from source. Use test src/test/ui instead, which builds stage 1 from source.
- ./x.py test --stage 0 compiler/rustc builds the compiler but runs no tests: it's running cargo test -p rustc, but cargo doesn't understand Rust's tests. You shouldn't need to use this; use test instead (without arguments).
- ./x.py build --stage 0 compiler/rustc builds the compiler, but does not build libstd or even libcore. Most of the time, you'll want ./x.py build library instead, which allows compiling programs without needing to define lang items.
Note that build --stage N compiler/rustc does not build the stage N compiler: instead it builds the stage N+1 compiler using the stage N compiler.
In short, stage 0 uses the stage0 compiler to create stage0 artifacts which will later be uplifted to be the stage1 compiler.
In each stage, two major steps are performed:
- std is compiled by the stage N compiler.
- That std is linked to programs built by the stage N compiler, including the stage N artifacts (stage N+1 compiler).
This is somewhat intuitive if one thinks of the stage N artifacts as "just" another program we are building with the stage N compiler: build --stage N compiler/rustc is linking the stage N artifacts to the std built by the stage N compiler.
Note that there are two std libraries in play here:
- The library linked to stageN/rustc, which was built by stage N-1 (stage N-1 std)
- The library used to compile programs with stageN/rustc, which was built by stage N (stage N std).
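To keep the two std libraries apart, it can help to write the relationship down as two tiny functions. Both names are hypothetical and exist only for this sketch. Stage 0 is treated as a special case because its compiler is downloaded rather than built from a previous stage.

```rust
/// Stage whose std `stageN/rustc` itself is linked against.
/// Returns `None` for stage 0, whose compiler is downloaded pre-built.
fn std_linked_into_compiler(stage: u32) -> Option<u32> {
    stage.checked_sub(1)
}

/// Stage whose std `stageN/rustc` uses when compiling other programs.
fn std_for_compiled_programs(stage: u32) -> u32 {
    stage
}

fn main() {
    for n in 1..=2 {
        println!(
            "stage{n}/rustc: linked against stage {} std, compiles programs against stage {} std",
            std_linked_into_compiler(n).unwrap(),
            std_for_compiled_programs(n)
        );
    }
}
```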
Stage N std is pretty much necessary for any useful work with the stage N compiler. Without it, you can only compile programs with #![no_core] -- not terribly useful!
These need to be different because they aren't necessarily ABI-compatible: there could be new layout optimizations, changes to MIR, or other changes to Rust metadata on nightly that aren't present in beta.
This is also where --keep-stage 1 library/std comes into play. Since most changes to the compiler don't actually change the ABI, once you've produced a std in stage 1, you can probably just reuse it with a different compiler. If the ABI hasn't changed, you're good to go, no need to spend time recompiling that std. --keep-stage simply assumes the previous compile is fine and copies those artifacts into the appropriate place, skipping the cargo invocation.
Cross-compiling is the process of compiling code that will run on another architecture.
For instance, you might want to build an ARM version of rustc using an x86 machine.
Building stage2 std is different when you are cross-compiling.
This is because x.py uses a trick: if HOST and TARGET are the same, it will reuse stage1 std for stage2! This is sound because stage1 std was compiled with the stage1 compiler, i.e. a compiler using the source code you currently have checked out. So it should be identical (and therefore ABI-compatible) to the std that stage2/rustc would compile.
However, when cross-compiling, stage1 std will only run on the host. So the stage2 compiler has to recompile std for the target.
(See in the table how stage2 only builds non-host std targets).
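The reuse trick described above comes down to one comparison. The sketch below models only that decision; Stage2Std and stage2_std are illustrative names, and the target triples are just examples.

```rust
/// How stage2 obtains std for a given target (illustrative model only).
#[derive(Debug, PartialEq)]
enum Stage2Std {
    /// HOST == TARGET: the stage1 std is reused for stage2.
    ReuseStage1,
    /// HOST != TARGET: the stage2 compiler builds std for the target.
    BuildForTarget,
}

fn stage2_std(host: &str, target: &str) -> Stage2Std {
    if host == target {
        Stage2Std::ReuseStage1
    } else {
        Stage2Std::BuildForTarget
    }
}

fn main() {
    let host = "x86_64-unknown-linux-gnu";
    for target in [host, "aarch64-unknown-linux-gnu"] {
        println!("{host} -> {target}: {:?}", stage2_std(host, target));
    }
}
```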
The rustc generated by the stage0 compiler is linked to the freshly-built std, which means that for the most part only std needs to be cfg-gated, so that rustc can use features added to std immediately after their addition, without need for them to get into the downloaded beta.
Note this is different from any other Rust program: stage1 rustc is built by the beta compiler, but using the master version of libstd!
The only time rustc uses cfg(bootstrap) is when it adds internal lints that use diagnostic items. This happens very rarely.
When you build a project with cargo, the build artifacts for dependencies are normally stored in target/debug/deps. This only contains dependencies cargo knows about; in particular, it doesn't have the standard library. Where do std or proc_macro come from? They come from the sysroot, the root of a number of directories where the compiler loads build artifacts at runtime.
The sysroot doesn't just store the standard library, though - it includes anything that needs to be loaded at runtime. That includes (but is not limited to):
- libstd/libtest/libproc_macro
- The compiler crates themselves, when using rustc_private. In-tree these are always present; out of tree, you need to install rustc-dev with rustup.
- libLLVM.so, the shared object file for the LLVM project. In-tree this is either built from source or downloaded from CI; out-of-tree, you need to install llvm-tools-preview with rustup.
All the artifacts listed so far are compiler runtime dependencies. You can see them with rustc --print sysroot:
$ ls $(rustc --print sysroot)/lib
libchalk_derive-0685d79833dc9b2b.so libstd-25c6acf8063a3802.so
libLLVM-11-rust-1.50.0-nightly.so libtest-57470d2aa8f7aa83.so
librustc_driver-4f0cc9f50e53f0ba.so libtracing_attributes-e4be92c35ab2a33b.so
librustc_macros-5f0ec4a119c6ac86.so rustlib
There are also runtime dependencies for the standard library! These are in lib/rustlib, not lib/ directly.
$ ls $(rustc --print sysroot)/lib/rustlib/x86_64-unknown-linux-gnu/lib | head -n 5
libaddr2line-6c8e02b8fedc1e5f.rlib
libadler-9ef2480568df55af.rlib
liballoc-9c4002b5f79ba0e1.rlib
libcfg_if-512eb53291f6de7e.rlib
libcompiler_builtins-ef2408da76957905.rlib
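The two directories listed above can also be located programmatically. The following is a rough sketch that shells out to whichever rustc is on PATH. The helper names compiler_lib_dir and target_lib_dir are made up here, and the target triple is only an example.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Directory holding the compiler's own runtime dependencies.
fn compiler_lib_dir(sysroot: &Path) -> PathBuf {
    sysroot.join("lib")
}

/// Directory holding the standard library and its dependencies for `target`.
fn target_lib_dir(sysroot: &Path, target: &str) -> PathBuf {
    sysroot.join("lib").join("rustlib").join(target).join("lib")
}

fn main() {
    // Ask whichever `rustc` is on PATH for its sysroot.
    let out = Command::new("rustc")
        .args(["--print", "sysroot"])
        .output()
        .expect("failed to run rustc");
    let sysroot = PathBuf::from(String::from_utf8_lossy(&out.stdout).trim());

    println!("{}", compiler_lib_dir(&sysroot).display());
    // Example triple; substitute the one for your machine.
    println!("{}", target_lib_dir(&sysroot, "x86_64-unknown-linux-gnu").display());
}
```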
rustlib includes libraries like hashbrown and cfg_if, which are not part of the public API of the standard library, but are used to implement it. rustlib is part of the search path for linkers, but lib will never be part of the search path.
Since rustlib is part of the search path, we have to be careful about which crates are included in it. In particular, all crates except for the standard library are built with the flag -Z force-unstable-if-unmarked, which means that you have to use #![feature(rustc_private)] in order to load them (as opposed to the standard library, which is always available).
The -Z force-unstable-if-unmarked flag has a variety of purposes to help enforce that the correct crates are marked as unstable. It was introduced primarily to allow rustc and the standard library to link to arbitrary crates on crates.io which do not themselves use staged_api. rustc also relies on this flag to mark all of its crates as unstable with the rustc_private feature so that each crate does not need to be carefully marked with unstable.
This flag is automatically applied to all of rustc and the standard library by the bootstrap scripts. This is needed because the compiler and all of its dependencies are shipped in the sysroot to all users.
This flag has the following effects:
- Marks the crate as "unstable" with the rustc_private feature if it is not itself marked as stable or unstable.
- Allows these crates to access other forced-unstable crates without any need for attributes. Normally a crate would need a #![feature(rustc_private)] attribute to use other unstable crates. However, that would make it impossible for a crate from crates.io to access its own dependencies since that crate won't have a feature(rustc_private) attribute, but everything is compiled with -Z force-unstable-if-unmarked.
Code which does not use -Z force-unstable-if-unmarked should include the #![feature(rustc_private)] crate attribute to access these force-unstable crates. This is needed for things that link rustc, such as miri or clippy.
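The two effects of the flag can be summarised in a toy model. This is only a reading aid: the Stability type and both functions are invented for this sketch and do not mirror how rustc represents stability internally.

```rust
/// Toy stability marker (not rustc's internal representation).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stability {
    Stable,
    /// Unstable behind the named feature gate.
    Unstable(&'static str),
}

/// First effect: a crate without its own marker becomes unstable under
/// `rustc_private` when the flag is set. An explicit marker is left alone.
fn crate_stability(marked: Option<Stability>, force_unstable_if_unmarked: bool) -> Option<Stability> {
    match marked {
        Some(s) => Some(s),
        None if force_unstable_if_unmarked => Some(Stability::Unstable("rustc_private")),
        None => None,
    }
}

/// Second effect: a crate that is itself built with the flag can use
/// forced-unstable crates without `#![feature(rustc_private)]`.
fn needs_feature_attribute(user_built_with_flag: bool) -> bool {
    !user_built_with_flag
}

fn main() {
    // A crates.io dependency carries no stability marker of its own.
    println!("{:?}", crate_stability(None, true));
    // A crate with an explicit marker is unaffected by the flag.
    println!("{:?}", crate_stability(Some(Stability::Stable), true));
    // Tools built without the flag need the crate attribute.
    println!("needs #![feature(rustc_private)]: {}", needs_feature_attribute(false));
}
```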
You can find more discussion about sysroots in:
- The rustdoc PR explaining why it uses extern crate for dependencies loaded from sysroot
- Discussions about sysroot on Zulip
- Discussions about building rustdoc out of tree
x.py allows you to pass stage-specific flags to rustc and cargo when bootstrapping.
The RUSTFLAGS_BOOTSTRAP environment variable is passed as RUSTFLAGS to the bootstrap stage (stage0), and RUSTFLAGS_NOT_BOOTSTRAP is passed when building artifacts for later stages.
RUSTFLAGS will work, but also affects the build of bootstrap itself, so it will be rare to want to use it.
Finally, MAGIC_EXTRA_RUSTFLAGS bypasses the cargo cache to pass flags to rustc without recompiling all dependencies.
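How the stage decides between RUSTFLAGS_BOOTSTRAP and RUSTFLAGS_NOT_BOOTSTRAP could be modelled roughly as follows. The function names are hypothetical, and splitting the value on whitespace is an assumption of this sketch rather than a statement about how bootstrap tokenizes flags.

```rust
use std::env;

/// Name of the stage-specific variable consulted for a given stage.
fn stage_flags_var(stage: u32) -> &'static str {
    if stage == 0 {
        "RUSTFLAGS_BOOTSTRAP"
    } else {
        "RUSTFLAGS_NOT_BOOTSTRAP"
    }
}

/// Reads the variable through `lookup`, so the logic can be exercised
/// without touching the real environment.
/// Assumption: flags are separated by whitespace.
fn stage_rustflags(stage: u32, lookup: impl Fn(&str) -> Option<String>) -> Vec<String> {
    lookup(stage_flags_var(stage))
        .map(|v| v.split_whitespace().map(str::to_owned).collect())
        .unwrap_or_default()
}

fn main() {
    for stage in 0..3 {
        let flags = stage_rustflags(stage, |k| env::var(k).ok());
        println!("stage{stage}: {} = {flags:?}", stage_flags_var(stage));
    }
}
```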
RUSTDOCFLAGS, RUSTDOCFLAGS_BOOTSTRAP, and RUSTDOCFLAGS_NOT_BOOTSTRAP are analogous to RUSTFLAGS, but for rustdoc.
CARGOFLAGS will pass arguments to cargo itself (e.g. --timings). CARGOFLAGS_BOOTSTRAP and CARGOFLAGS_NOT_BOOTSTRAP work analogously to RUSTFLAGS_BOOTSTRAP.
--test-args will pass arguments through to the test runner. For src/test/ui, this is compiletest; for unit tests and doctests this is the libtest runner. Most test runners accept --help, which you can use to find out the options accepted by the runner.
During bootstrapping, there are a bunch of compiler-internal environment variables that are used. If you are trying to run an intermediate version of rustc, sometimes you may need to set some of these environment variables manually. Otherwise, you get an error like the following:
thread 'main' panicked at 'RUSTC_STAGE was not set: NotPresent', library/core/src/result.rs:1165:5
If ./stageN/bin/rustc gives an error about environment variables, that usually means something is quite wrong -- or you're trying to compile e.g. rustc or std or something that depends on environment variables. In the unlikely case that you actually need to invoke rustc in such a situation, you can tell the bootstrap shim to print all env variables by adding -vvv to your x.py command.
This is an incomplete reference for the outputs generated by bootstrap:
Stage 0 Action | Output |
---|---|
beta extracted | build/HOST/stage0 |
stage0 builds bootstrap | build/bootstrap |
stage0 builds test/std | build/HOST/stage0-std/TARGET |
copy stage0-std (HOST only) | build/HOST/stage0-sysroot/lib/rustlib/HOST |
stage0 builds rustc with stage0-sysroot | build/HOST/stage0-rustc/HOST |
copy stage0-rustc (except executable) | build/HOST/stage0-sysroot/lib/rustlib/HOST |
build llvm | build/HOST/llvm |
stage0 builds codegen with stage0-sysroot | build/HOST/stage0-codegen/HOST |
stage0 builds rustdoc, clippy, miri, with stage0-sysroot | build/HOST/stage0-tools/HOST |

--stage=0 stops here.

Stage 1 Action | Output |
---|---|
copy (uplift) stage0-rustc executable to stage1 | build/HOST/stage1/bin |
copy (uplift) stage0-codegen to stage1 | build/HOST/stage1/lib |
copy (uplift) stage0-sysroot to stage1 | build/HOST/stage1/lib |
stage1 builds test/std | build/HOST/stage1-std/TARGET |
copy stage1-std (HOST only) | build/HOST/stage1/lib/rustlib/HOST |
stage1 builds rustc | build/HOST/stage1-rustc/HOST |
copy stage1-rustc (except executable) | build/HOST/stage1/lib/rustlib/HOST |
stage1 builds codegen | build/HOST/stage1-codegen/HOST |

--stage=1 stops here.

Stage 2 Action | Output |
---|---|
copy (uplift) stage1-rustc executable | build/HOST/stage2/bin |
copy (uplift) stage1-sysroot | build/HOST/stage2/lib and build/HOST/stage2/lib/rustlib/HOST |
stage2 builds test/std (not HOST targets) | build/HOST/stage2-std/TARGET |
copy stage2-std (not HOST targets) | build/HOST/stage2/lib/rustlib/TARGET |
stage2 builds rustdoc, clippy, miri | build/HOST/stage2-tools/HOST |
copy rustdoc | build/HOST/stage2/bin |

--stage=2 stops here.
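The directory names in the tables above follow a regular pattern, which the sketch below reproduces for two of the rows. The helpers stage_dir and std_out_dir are illustrative only. They cover just the build/HOST/stageN and build/HOST/stageN-std/TARGET entries, and the triples are examples.

```rust
use std::path::PathBuf;

/// `build/HOST/stageN`: where the stage N compiler is extracted or assembled.
fn stage_dir(host: &str, stage: u32) -> PathBuf {
    PathBuf::from("build").join(host).join(format!("stage{stage}"))
}

/// `build/HOST/stageN-std/TARGET`: std as built by the stage N compiler.
fn std_out_dir(host: &str, stage: u32, target: &str) -> PathBuf {
    PathBuf::from("build")
        .join(host)
        .join(format!("stage{stage}-std"))
        .join(target)
}

fn main() {
    let host = "x86_64-unknown-linux-gnu"; // example triple
    for stage in 0..3 {
        println!("{}", stage_dir(host, stage).display());
        println!("{}", std_out_dir(host, stage, host).display());
    }
}
```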