As of September 2021, the only stage of the compiler that is already parallel is codegen.
The nightly compiler implements parallel query evaluation, but there is still a lot of work to be done.
The lack of parallelism at other stages also represents an opportunity for improving compiler performance.
One can try out the current parallel compiler work by enabling it in the `config.toml`.
These next few sections describe where and how parallelism is currently used,
and the current status of making parallel compilation the default in `rustc`.
The underlying thread-safe data structures used in the parallel compiler
can be found in the `rustc_data_structures::sync` module. Some of these data structures
use the `parking_lot` crate as well.
There are two underlying thread-safe data structures used in code generation:

- `Lrc`
  - See the sketch after this list for the rough idea behind this alias.
- `MetadataRef` -> `OwningRef<Box<dyn Erased + Send + Sync>, [u8]>`
  - This data structure is specific to `rustc`.
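The exact definitions live in `rustc_data_structures::sync`, but the rough idea behind `Lrc` can be sketched as below. This is only an illustration, not the actual code: depending on a cfg flag, the same name resolves either to a thread-safe `Arc` or to the cheaper single-threaded `Rc`.

```rust
// Illustrative sketch only; the real definitions in
// rustc_data_structures::sync are more involved.
mod sync {
    // With the parallel compiler enabled, `Lrc` must be thread-safe...
    #[cfg(parallel_compiler)]
    pub use std::sync::Arc as Lrc;
    // ...otherwise the cheaper, non-atomic `Rc` is enough.
    #[cfg(not(parallel_compiler))]
    pub use std::rc::Rc as Lrc;
}

fn main() {
    // Code written against `sync::Lrc` compiles in both configurations;
    // only the underlying reference-counting type changes.
    let shared = sync::Lrc::new(vec![1, 2, 3]);
    let alias = sync::Lrc::clone(&shared);
    assert_eq!(shared.len(), alias.len());
}
```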
During monomorphization the compiler splits up all the code to
be generated into smaller chunks called codegen units. These are then generated by
independent instances of LLVM running in parallel. At the end, the linker
is run to combine all the codegen units together into one binary. This process
occurs in the `rustc_codegen_ssa::base` module.
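The orchestration itself lives in `rustc_codegen_ssa`, but the fan-out/join shape of the process can be sketched with plain threads. `CodegenUnit`, `CompiledModule`, and `compile_unit` below are placeholder stand-ins, not rustc's types:

```rust
// Sketch of the pattern: each codegen unit is compiled by an independent
// job, and the results are joined before the final "link" step.
use std::thread;

struct CodegenUnit {
    name: String,
}

struct CompiledModule {
    name: String,
}

// Stand-in for handing a codegen unit to an LLVM instance.
fn compile_unit(cgu: CodegenUnit) -> CompiledModule {
    CompiledModule { name: cgu.name }
}

fn main() {
    let units = vec![
        CodegenUnit { name: "cgu.0".into() },
        CodegenUnit { name: "cgu.1".into() },
    ];

    // Fan out: one worker per codegen unit.
    let workers: Vec<_> = units
        .into_iter()
        .map(|cgu| thread::spawn(move || compile_unit(cgu)))
        .collect();

    // Join: collect every compiled module before producing one binary.
    let modules: Vec<CompiledModule> =
        workers.into_iter().map(|w| w.join().unwrap()).collect();

    // Final step: in rustc, the linker would combine these into one binary.
    for module in &modules {
        println!("linking {}", module.name);
    }
}
```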
The query model has some properties that make it actually feasible to evaluate multiple queries in parallel without too much effort:
- All data a query provider can access is accessed via the query context, so the query context can take care of synchronizing access.
- Query results are required to be immutable so they can safely be used by different threads concurrently.
When a query `foo` is evaluated, the cache table for `foo` is locked; a simplified sketch of this scheme appears after the list below.
- If there already is a result, we can clone it, release the lock and we are done.
- If there is no cache entry and no other active query invocation computing the same result, we mark the key as being "in progress", release the lock and start evaluating.
- If there is another query invocation for the same key in progress, we release the lock, and just block the thread until the other invocation has computed the result we are waiting for. This cannot deadlock because, as mentioned before, query invocations form a DAG. Some thread will always make progress.
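The real query machinery is considerably more involved (it also has to detect query cycles and integrate with incremental compilation), but the locking scheme described above can be approximated with a small sketch. `QueryCache`, `Entry`, and `get_or_compute` are illustrative names, not rustc's API:

```rust
// Simplified sketch of the per-query cache protocol described above.
// All names here are illustrative; rustc's real query system differs.
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Condvar, Mutex};

#[derive(Clone)]
enum Entry<V> {
    InProgress, // some thread is computing this key
    Done(V),    // cached, immutable result
}

struct QueryCache<K, V> {
    map: Mutex<HashMap<K, Entry<V>>>,
    cond: Condvar,
}

impl<K: Hash + Eq + Clone, V: Clone> QueryCache<K, V> {
    fn new() -> Self {
        QueryCache { map: Mutex::new(HashMap::new()), cond: Condvar::new() }
    }

    fn get_or_compute(&self, key: K, compute: impl FnOnce() -> V) -> V {
        // Lock the cache table for this query.
        let mut map = self.map.lock().unwrap();
        loop {
            match map.get(&key).cloned() {
                // Result already cached: clone it, release the lock, done.
                Some(Entry::Done(value)) => return value,
                // Another invocation is computing the same key: block until
                // it finishes, then re-check the table.
                Some(Entry::InProgress) => {
                    map = self.cond.wait(map).unwrap();
                }
                // No entry: mark the key as "in progress", release the lock,
                // and evaluate the query ourselves.
                None => {
                    map.insert(key.clone(), Entry::InProgress);
                    drop(map);
                    let value = compute();
                    let mut map = self.map.lock().unwrap();
                    map.insert(key, Entry::Done(value.clone()));
                    // Wake up any threads blocked waiting on this key.
                    self.cond.notify_all();
                    return value;
                }
            }
        }
    }
}

fn main() {
    let cache = Arc::new(QueryCache::new());
    let worker = {
        let cache = Arc::clone(&cache);
        std::thread::spawn(move || cache.get_or_compute("foo", || 1 + 1))
    };
    // Both invocations of the hypothetical query `foo` see the same result.
    let result = cache.get_or_compute("foo", || 1 + 1);
    assert_eq!(result, worker.join().unwrap());
}
```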
As of September 2021, there are still a number of steps to complete before rustdoc rendering can be made parallel. More details on this issue can be found here.
As of July 2021, work on explicitly parallelizing the compiler has stalled. There is a lot of design and correctness work that needs to be done.
These are the basic ideas in the effort to make `rustc` parallel:
- There are a lot of loops in the compiler that just iterate over all items in a crate. These can possibly be parallelized.
- We can use (a custom fork of) `rayon` to run tasks in parallel. The custom fork allows the execution of DAGs of tasks, not just trees.
- There are currently a lot of global data structures that need to be made thread-safe. A key strategy here has been converting interior-mutable data structures (e.g. `Cell`) into their thread-safe siblings (e.g. `Mutex`); the sketch after this list illustrates both the parallel loops and this conversion.
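As a toy illustration of the first and third points, here is a loop over per-item work parallelized with `rayon`'s parallel iterators, using a `Mutex` where a single-threaded pass could have used interior mutability. `Item` and `check_item` are made-up stand-ins for compiler data, and the example assumes `rayon` as a dependency:

```rust
// Toy illustration only: parallelizing a per-item loop with rayon and
// replacing single-threaded interior mutability with a Mutex.
use rayon::prelude::*;
use std::sync::Mutex;

struct Item {
    name: String,
}

// Stand-in for an expensive per-item analysis pass.
fn check_item(item: &Item) -> Option<String> {
    if item.name.is_empty() {
        Some("item has an empty name".to_string())
    } else {
        None
    }
}

fn main() {
    let items = vec![
        Item { name: "foo".into() },
        Item { name: String::new() },
        Item { name: "bar".into() },
    ];

    // Shared mutable state must now be thread-safe: a Mutex<Vec<_>> where a
    // sequential pass might have used a Cell or RefCell.
    let diagnostics = Mutex::new(Vec::new());

    // `par_iter` distributes the per-item work across rayon's thread pool.
    items.par_iter().for_each(|item| {
        if let Some(diag) = check_item(item) {
            diagnostics.lock().unwrap().push(diag);
        }
    });

    println!("emitted {} diagnostics", diagnostics.into_inner().unwrap().len());
}
```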
As of February 2021, much of this effort is on hold due to lack of manpower. We have a working prototype with promising performance gains in many cases. However, there are two blockers:
- It's not clear what invariants need to be upheld that might not hold in the face of concurrency. An auditing effort was underway, but seems to have stalled at some point.
- There is a lot of lock contention, which actually degrades performance as the number of threads increases beyond 4.
Here are some resources that can be used to learn more (note that some of them are a bit out of date):