diff --git a/src/SUMMARY.md b/src/SUMMARY.md index d4ee59abf..eb889d36b 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -9,6 +9,7 @@ - [Using `compiletest` + commands to control test execution](./compiletest.md) - [Walkthrough: a typical contribution](./walkthrough.md) - [High-level overview of the compiler source](./high-level-overview.md) +- [The Rustc Driver](./rustc-driver.md) - [Queries: demand-driven compilation](./query.md) - [Incremental compilation](./incremental-compilation.md) - [The parser](./the-parser.md) @@ -45,6 +46,10 @@ - [miri const evaluator](./miri.md) - [Parameter Environments](./param_env.md) - [Generating LLVM IR](./trans.md) -- [Background material](./background.md) -- [Glossary](./glossary.md) -- [Code Index](./code-index.md) + +--- + +- [Appendix A: Stupid Stats](./appendix-stupid-stats.md) +- [Appendix B: Background material](./appendix-background.md) +- [Appendix C: Glossary](./appendix-glossary.md) +- [Appendix D: Code Index](./appendix-code-index.md) diff --git a/src/background.md b/src/appendix-background.md similarity index 99% rename from src/background.md rename to src/appendix-background.md index 50c247774..c69e7d93d 100644 --- a/src/background.md +++ b/src/appendix-background.md @@ -1,4 +1,4 @@ -# Background topics +# Appendix B: Background topics This section covers a numbers of common compiler terms that arise in this guide. We try to give the general definition while providing some diff --git a/src/appendix-code-index.md b/src/appendix-code-index.md new file mode 100644 index 000000000..f6dcb9c37 --- /dev/null +++ b/src/appendix-code-index.md @@ -0,0 +1,22 @@ +# Appendix D: Code Index + +rustc has a lot of important data structures. This is an attempt to give some +guidance on where to learn more about some of the key data structures of the +compiler. 
+
+Item | Kind | Short description | Chapter | Declaration
+----------------|----------|-----------------------------|--------------------|-------------------
+`CodeMap` | struct | The CodeMap maps the AST nodes to their source code | [The parser] | [src/libsyntax/codemap.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/codemap.rs)
+`CompileState` | struct | State that is passed to a callback at each compiler pass | [The Rustc Driver] | [src/librustc_driver/driver.rs](https://github.com/rust-lang/rust/blob/master/src/librustc_driver/driver.rs)
+`ast::Crate` | struct | Syntax-level representation of a parsed crate | [The parser] | [src/libsyntax/ast.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/ast.rs)
+`hir::Crate` | struct | More abstract, compiler-friendly form of a crate's AST | [The HIR] | [src/librustc/hir/mod.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/hir/mod.rs)
+`ParseSess` | struct | This struct contains information about a parsing session | [The parser] | [src/libsyntax/parse/mod.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/parse/mod.rs)
+`Session` | struct | The data associated with a compilation session | [The parser], [The Rustc Driver] | [src/librustc/session/mod.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/session/mod.rs)
+`StringReader` | struct | This is the lexer used during parsing. It consumes characters from the raw source code being compiled and produces a series of tokens for use by the rest of the parser | [The parser] | [src/libsyntax/parse/lexer/mod.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/parse/lexer/mod.rs)
+`TraitDef` | struct | This struct contains a trait's definition with type information | [The `ty` modules] | [src/librustc/ty/trait_def.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/ty/trait_def.rs)
+`TyCtxt<'cx, 'tcx, 'tcx>` | type | The "typing context".
This is the central data structure in the compiler. It is the context that you use to perform all manner of queries. | [The `ty` modules] | [src/librustc/ty/context.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/ty/context.rs) + +[The HIR]: hir.html +[The parser]: the-parser.html +[The Rustc Driver]: rustc-driver.html +[The `ty` modules]: ty.html diff --git a/src/glossary.md b/src/appendix-glossary.md similarity index 94% rename from src/glossary.md rename to src/appendix-glossary.md index e542d4e35..1914adec6 100644 --- a/src/glossary.md +++ b/src/appendix-glossary.md @@ -1,21 +1,20 @@ -Glossary --------- +# Appendix C: Glossary The compiler uses a number of...idiosyncratic abbreviations and things. This glossary attempts to list them and give you a few pointers for understanding them better. Term | Meaning ------------------------|-------- AST | the abstract syntax tree produced by the syntax crate; reflects user syntax very closely. -binder | a "binder" is a place where a variable or type is declared; for example, the `` is a binder for the generic type parameter `T` in `fn foo(..)`, and \|`a`\|` ...` is a binder for the parameter `a`. See [the background chapter for more](./background.html#free-vs-bound) -bound variable | a "bound variable" is one that is declared within an expression/term. For example, the variable `a` is bound within the closure expession \|`a`\|` a * 2`. See [the background chapter for more](./background.html#free-vs-bound) +binder | a "binder" is a place where a variable or type is declared; for example, the `` is a binder for the generic type parameter `T` in `fn foo(..)`, and \|`a`\|` ...` is a binder for the parameter `a`. See [the background chapter for more](./appendix-background.html#free-vs-bound) +bound variable | a "bound variable" is one that is declared within an expression/term. For example, the variable `a` is bound within the closure expession \|`a`\|` a * 2`. 
See [the background chapter for more](./appendix-background.html#free-vs-bound) codegen unit | when we produce LLVM IR, we group the Rust code into a number of codegen units. Each of these units is processed by LLVM independently from one another, enabling parallelism. They are also the unit of incremental re-use. completeness | completeness is a technical term in type theory. Completeness means that every type-safe program also type-checks. Having both soundness and completeness is very hard, and usually soundness is more important. (see "soundness"). -control-flow graph | a representation of the control-flow of a program; see [the background chapter for more](./background.html#cfg) +control-flow graph | a representation of the control-flow of a program; see [the background chapter for more](./appendix-background.html#cfg) cx | we tend to use "cx" as an abbrevation for context. See also `tcx`, `infcx`, etc. DAG | a directed acyclic graph is used during compilation to keep track of dependencies between queries. ([see more](incremental-compilation.html)) -data-flow analysis | a static analysis that figures out what properties are true at each point in the control-flow of a program; see [the background chapter for more](./background.html#dataflow) +data-flow analysis | a static analysis that figures out what properties are true at each point in the control-flow of a program; see [the background chapter for more](./appendix-background.html#dataflow) DefId | an index identifying a definition (see `librustc/hir/def_id.rs`). Uniquely identifies a `DefPath`. 
-free variable | a "free variable" is one that is not bound within an expression or term; see [the background chapter for more](./background.html#free-vs-bound) +free variable | a "free variable" is one that is not bound within an expression or term; see [the background chapter for more](./appendix-background.html#free-vs-bound) 'gcx | the lifetime of the global arena ([see more](ty.html)) generics | the set of generic type parameters defined on a type or item HIR | the High-level IR, created by lowering and desugaring the AST ([see more](hir.html)) @@ -39,7 +38,7 @@ obligation | something that must be proven by the trait system ([s projection | a general term for a "relative path", e.g. `x.f` is a "field projection", and `T::Item` is an ["associated type projection"](./traits-goals-and-clauses.html#trait-ref) promoted constants | constants extracted from a function and lifted to static scope; see [this section](./mir.html#promoted) for more details. provider | the function that executes a query ([see more](query.html)) -quantified | in math or logic, existential and universal quantification are used to ask questions like "is there any type T for which is true?" or "is this true for all types T?"; see [the background chapter for more](./background.html#quantified) +quantified | in math or logic, existential and universal quantification are used to ask questions like "is there any type T for which is true?" or "is this true for all types T?"; see [the background chapter for more](./appendix-background.html#quantified) query | perhaps some sub-computation during compilation ([see more](query.html)) region | another term for "lifetime" often used in the literature and in the borrow checker. sess | the compiler session, which stores global data used throughout compilation @@ -57,7 +56,7 @@ token | the smallest unit of parsing. Tokens are produced aft trans | the code to translate MIR into LLVM IR. 
trait reference | a trait and values for its type parameters ([see more](ty.html)).
 ty | the internal representation of a type ([see more](ty.html)).
-variance | variance determines how changes to a generic type/lifetime parameter affect subtyping; for example, if `T` is a subtype of `U`, then `Vec<T>` is a subtype `Vec<U>` because `Vec` is *covariant* in its generic parameter. See [the background chapter for more](./background.html#variance).
+variance | variance determines how changes to a generic type/lifetime parameter affect subtyping; for example, if `T` is a subtype of `U`, then `Vec<T>` is a subtype of `Vec<U>` because `Vec` is *covariant* in its generic parameter. See [the background chapter for more](./appendix-background.html#variance).
 
 [LLVM]: https://llvm.org/
 [lto]: https://llvm.org/docs/LinkTimeOptimization.html
diff --git a/src/appendix-stupid-stats.md b/src/appendix-stupid-stats.md
new file mode 100644
index 000000000..20d5aaf9b
--- /dev/null
+++ b/src/appendix-stupid-stats.md
@@ -0,0 +1,405 @@
+# Appendix A: A tutorial on creating a drop-in replacement for rustc
+
+> **Note:** This is a copy of `@nrc`'s amazing [stupid-stats]. You should find
+> a copy of the code on the GitHub repository although due to the compiler's
+> constantly evolving nature, there is no guarantee it'll compile on the first
+> go.
+
+Many tools benefit from being a drop-in replacement for a compiler. By this, I
+mean that any user of the tool can use `mytool` in all the ways they would
+normally use `rustc` - whether manually compiling a single file or as part of a
+complex make project or Cargo build, etc. That could be a lot of work;
+rustc, like most compilers, takes a large number of command line arguments which
+can affect compilation in complex and interacting ways. Emulating all of this
+behaviour in your tool is annoying at best, especially if you are making many
+of the same calls into librustc that the compiler is.
+ +The kind of things I have in mind are tools like rustdoc or a future rustfmt. +These want to operate as closely as possible to real compilation, but have +totally different outputs (documentation and formatted source code, +respectively). Another use case is a customised compiler. Say you want to add a +custom code generation phase after macro expansion, then creating a new tool +should be easier than forking the compiler (and keeping it up to date as the +compiler evolves). + +I have gradually been trying to improve the API of librustc to make creating a +drop-in tool easier to produce (many others have also helped improve these +interfaces over the same time frame). It is now pretty simple to make a tool +which is as close to rustc as you want it to be. In this tutorial I'll show +how. + +Note/warning, everything I talk about in this tutorial is internal API for +rustc. It is all extremely unstable and likely to change often and in +unpredictable ways. Maintaining a tool which uses these APIs will be non- +trivial, although hopefully easier than maintaining one that does similar things +without using them. + +This tutorial starts with a very high level view of the rustc compilation +process and of some of the code that drives compilation. Then I'll describe how +that process can be customised. In the final section of the tutorial, I'll go +through an example - stupid-stats - which shows how to build a drop-in tool. + + +## Overview of the compilation process + +Compilation using rustc happens in several phases. We start with parsing, this +includes lexing. The output of this phase is an AST (abstract syntax tree). +There is a single AST for each crate (indeed, the entire compilation process +operates over a single crate). Parsing abstracts away details about individual +files which will all have been read in to the AST in this phase. 
At this stage +the AST includes all macro uses, attributes will still be present, and nothing +will have been eliminated due to `cfg`s. + +The next phase is configuration and macro expansion. This can be thought of as a +function over the AST. The unexpanded AST goes in and an expanded AST comes out. +Macros and syntax extensions are expanded, and `cfg` attributes will cause some +code to disappear. The resulting AST won't have any macros or macro uses left +in. + +The code for these first two phases is in [libsyntax](https://github.com/rust-lang/rust/tree/master/src/libsyntax). + +After this phase, the compiler allocates ids to each node in the AST +(technically not every node, but most of them). If we are writing out +dependencies, that happens now. + +The next big phase is analysis. This is the most complex phase and +uses the bulk of the code in rustc. This includes name resolution, type +checking, borrow checking, type and lifetime inference, trait selection, method +selection, linting, and so forth. Most error detection is done in this phase +(although parse errors are found during parsing). The 'output' of this phase is +a bunch of side tables containing semantic information about the source program. +The analysis code is in [librustc](https://github.com/rust-lang/rust/tree/master/src/librustc) +and a bunch of other crates with the 'librustc_' prefix. + +Next is translation, this translates the AST (and all those side tables) into +LLVM IR (intermediate representation). We do this by calling into the LLVM +libraries, rather than actually writing IR directly to a file. The code for this is in +[librustc_trans](https://github.com/rust-lang/rust/tree/master/src/librustc_trans). + +The next phase is running the LLVM backend. This runs LLVM's optimisation passes +on the generated IR and then generates machine code. The result is object files. +This phase is all done by LLVM, it is not really part of the rust compiler. 
The
+interface between LLVM and rustc is in [librustc_llvm](https://github.com/rust-lang/rust/tree/master/src/librustc_llvm).
+
+Finally, we link the object files into an executable. Again we outsource this to
+other programs and it's not really part of the rust compiler. The interface is
+in [librustc_back](https://github.com/rust-lang/rust/tree/master/src/librustc_back)
+(which also contains some things used primarily during translation).
+
+All these phases are coordinated by the driver. To see the exact sequence, look
+at the `compile_input` function in [librustc_driver/driver.rs](https://github.com/rust-lang/rust/tree/master/src/librustc_driver/driver.rs).
+The driver (which is found in [librustc_driver](https://github.com/rust-lang/rust/tree/master/src/librustc_driver))
+handles all the highest level coordination of compilation - handling command
+line arguments, maintaining compilation state (primarily in the `Session`), and
+calling the appropriate code to run each phase of compilation. It also handles
+high level coordination of pretty printing and testing. To create a drop-in
+compiler replacement or a custom compiler, we leave most of compilation
+alone and customise the driver using its APIs.
+
+
+## The driver customisation APIs
+
+There are two primary ways to customise compilation - high level control of the
+driver using `CompilerCalls` and controlling each phase of compilation using a
+`CompileController`. The former lets you customise handling of command line
+arguments etc., the latter lets you stop compilation early or execute code
+between phases.
+
+
+### `CompilerCalls`
+
+`CompilerCalls` is a trait that you implement in your tool. It contains a fairly
+ad-hoc set of methods to hook in to the process of processing command line
+arguments and driving the compiler. For details, see the comments in
+[librustc_driver/lib.rs](https://github.com/rust-lang/rust/tree/master/src/librustc_driver/lib.rs).
+I'll summarise the methods here.
+
+`early_callback` and `late_callback` let you call arbitrary code at different
+points - early is after command line arguments have been parsed, but before
+anything is done with them; late is pretty much the last thing before
+compilation starts, i.e., after all processing of command line arguments, etc. is
+done. Currently, you get to choose whether compilation stops or continues at
+each point, but you don't get to change anything the driver has done. You can
+record some info for later, or perform other actions of your own.
+
+`some_input` and `no_input` give you an opportunity to modify the primary input
+to the compiler (usually the input is a file containing the top module for a
+crate, but it could also be a string). You could record the input or perform
+other actions of your own.
+
+Ignore `parse_pretty`; it is unfortunate and hopefully will get improved. There
+is a default implementation, so you can pretend it doesn't exist.
+
+`build_controller` returns a `CompileController` object for more fine-grained
+control of compilation; it is described next.
+
+We might add more options in the future.
+
+
+### `CompileController`
+
+`CompileController` is a struct consisting of `PhaseController`s and flags.
+Currently, there is only one flag, `make_glob_map`, which signals whether to
+produce a map of glob imports (used by save-analysis and potentially other
+tools). There are probably flags in the session that should be moved here.
+
+There is a `PhaseController` for each of the phases described in the above
+summary of compilation (and we could add more in the future for finer-grained
+control). They are all `after_` a phase because they are checked at the end of a
+phase (again, that might change), e.g., `CompileController::after_parse`
+controls what happens immediately after parsing (and before macro expansion).
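
As a rough illustration of the idea that each phase has an `after_*` hook checked when the phase ends, here is a toy model. Every name in it (`ToyPhaseController`, `ToyCompileController`, `ToyState`, `toy_compile`) is made up for this sketch; it mirrors the shape described in this appendix (a stop flag plus a callback per phase) and is not the real `rustc_driver` API.

```rust
// Toy model only: none of these types are the real `rustc_driver` API.

/// Whether the pretend driver should keep going after a phase.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Compilation {
    Stop,
    Continue,
}

/// Read-only snapshot handed to callbacks; later phases fill in more fields.
#[derive(Default)]
struct ToyState {
    ast: Option<String>,
    analysis: Option<String>,
}

/// One hook, checked at the end of a phase.
struct ToyPhaseController<'a> {
    stop: Compilation,
    callback: Box<dyn Fn(&ToyState) + 'a>,
}

impl<'a> ToyPhaseController<'a> {
    fn basic() -> Self {
        ToyPhaseController { stop: Compilation::Continue, callback: Box::new(|_| {}) }
    }
}

/// One hook per phase, all named `after_*`.
struct ToyCompileController<'a> {
    after_parse: ToyPhaseController<'a>,
    after_analysis: ToyPhaseController<'a>,
}

impl<'a> ToyCompileController<'a> {
    fn basic() -> Self {
        ToyCompileController {
            after_parse: ToyPhaseController::basic(),
            after_analysis: ToyPhaseController::basic(),
        }
    }
}

/// Runs the pretend phases in order and returns the names of those that ran.
fn toy_compile(source: &str, control: &ToyCompileController) -> Vec<&'static str> {
    let mut ran = Vec::new();
    let mut state = ToyState::default();

    state.ast = Some(format!("ast({})", source));
    ran.push("parse");
    // The callback runs whether or not compilation continues afterwards.
    (control.after_parse.callback)(&state);
    if control.after_parse.stop == Compilation::Stop {
        return ran;
    }

    state.analysis = Some(format!("types({})", source));
    ran.push("analysis");
    (control.after_analysis.callback)(&state);
    ran
}
```

Setting `after_parse.stop` to `Compilation::Stop` makes `toy_compile` return after the first phase, and the callback only ever sees the state that exists at that point (an AST but no analysis results).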
+ +Each `PhaseController` contains a flag called `stop` which indicates whether +compilation should stop or continue, and a callback to be executed at the point +indicated by the phase. The callback is called whether or not compilation +continues. + +Information about the state of compilation is passed to these callbacks in a +`CompileState` object. This contains all the information the compiler has. Note +that this state information is immutable - your callback can only execute code +using the compiler state, it can't modify the state. (If there is demand, we +could change that). The state available to a callback depends on where during +compilation the callback is called. For example, after parsing there is an AST +but no semantic analysis (because the AST has not been analysed yet). After +translation, there is translation info, but no AST or analysis info (since these +have been consumed/forgotten). + + +## An example - stupid-stats + +Our example tool is very simple, it simply collects some simple and not very +useful statistics about a program; it is called stupid-stats. You can find +the (more heavily commented) complete source for the example on [Github](https://github.com/nick29581/stupid-stats/blob/master/src). +To build, just do `cargo build`. To run on a file `foo.rs`, do `cargo run +foo.rs` (assuming you have a Rust program called `foo.rs`. You can also pass any +command line arguments that you would normally pass to rustc). When you run it +you'll see output similar to + +``` +In crate: foo, + +Found 12 uses of `println!`; +The most common number of arguments is 1 (67% of all functions); +25% of functions have four or more arguments. +``` + +To make things easier, when we talk about functions, we're excluding methods and +closures. + +You can also use the executable as a drop-in replacement for rustc, because +after all, that is the whole point of this exercise. 
So, however you use rustc
+in your makefile setup, you can use `target/stupid` (or whatever executable you
+end up with) instead. That might mean setting an environment variable or it
+might mean renaming your executable to `rustc` and setting your PATH. Similarly,
+if you're using Cargo, you'll need to rename the executable to rustc and set the
+PATH. Alternatively, you should be able to use
+[multirust](https://github.com/brson/multirust) to get around all the PATH stuff
+(although I haven't actually tried that).
+
+(Note that this example prints to stdout. I'm not entirely sure what Cargo does
+with stdout from rustc under different circumstances. If you don't see any
+output, try inserting a `panic!` after the `println!`s to error out, then Cargo
+should dump stupid-stats' stdout to Cargo's stdout).
+
+Let's start with the `main` function for our tool; it is pretty simple:
+
+```
+fn main() {
+    let args: Vec<_> = std::env::args().collect();
+    rustc_driver::run_compiler(&args, &mut StupidCalls::new());
+    std::env::set_exit_status(0);
+}
+```
+
+The first line grabs any command line arguments. The second line calls the
+compiler driver with those arguments. The final line sets the exit code for the
+program.
+
+The only interesting thing is the `StupidCalls` object we pass to the driver.
+This is our implementation of the `CompilerCalls` trait and is what will make
+this tool different from rustc.
+
+`StupidCalls` is a mostly empty struct:
+
+```
+struct StupidCalls {
+    default_calls: RustcDefaultCalls,
+}
+```
+
+This tool is so simple that it doesn't need to store any data here, but usually
+you would. We embed a `RustcDefaultCalls` object to delegate to in our impl when
+we want exactly the same behaviour as the Rust compiler. Mostly you don't want
+to do that (or at least don't need to) in a tool. However, Cargo calls rustc
+with `--print file-names`, so we delegate in `late_callback` and `no_input`
+to keep Cargo happy.
+
+Most of the rest of the impl of `CompilerCalls` is trivial:
+
+```
+impl<'a> CompilerCalls<'a> for StupidCalls {
+    fn early_callback(&mut self,
+                      _: &getopts::Matches,
+                      _: &config::Options,
+                      _: &diagnostics::registry::Registry,
+                      _: ErrorOutputType)
+                      -> Compilation {
+        Compilation::Continue
+    }
+
+    fn late_callback(&mut self,
+                     t: &TransCrate,
+                     m: &getopts::Matches,
+                     s: &Session,
+                     c: &CrateStore,
+                     i: &Input,
+                     odir: &Option<PathBuf>,
+                     ofile: &Option<PathBuf>)
+                     -> Compilation {
+        self.default_calls.late_callback(t, m, s, c, i, odir, ofile);
+        Compilation::Continue
+    }
+
+    fn some_input(&mut self,
+                  input: Input,
+                  input_path: Option<PathBuf>)
+                  -> (Input, Option<PathBuf>) {
+        (input, input_path)
+    }
+
+    fn no_input(&mut self,
+                m: &getopts::Matches,
+                o: &config::Options,
+                odir: &Option<PathBuf>,
+                ofile: &Option<PathBuf>,
+                r: &diagnostics::registry::Registry)
+                -> Option<(Input, Option<PathBuf>)> {
+        self.default_calls.no_input(m, o, odir, ofile, r);
+
+        // This is not optimal error handling.
+        panic!("No input supplied to stupid-stats");
+    }
+
+    fn build_controller(&mut self, _: &Session) -> driver::CompileController<'a> {
+        ...
+    }
+}
+```
+
+We don't do anything for either of the callbacks, nor do we change the input if
+the user supplies it. If they don't, we just `panic!`. This is the simplest way
+to handle the error, but it is not very user-friendly; a real tool would give a
+constructive message or perform a default action.
+
+In `build_controller` we construct our `CompileController`. We only want to
+parse, and we want to inspect macros before expansion, so we make compilation
+stop after the first phase (parsing). The callback after that phase is where the
+tool does its actual work by walking the AST. We do that by creating an AST
+visitor and making it walk the AST from the top (the crate root).
Once we've
+walked the crate, we print the stats we've collected:
+
+```
+fn build_controller(&mut self, _: &Session) -> driver::CompileController<'a> {
+    // We mostly want to do what rustc does, which is what basic() will return.
+    let mut control = driver::CompileController::basic();
+    // But we only need the AST, so we can stop compilation after parsing.
+    control.after_parse.stop = Compilation::Stop;
+
+    // And when we stop after parsing we'll call this closure.
+    // Note that this will give us an AST before macro expansions, which is
+    // not usually what you want.
+    control.after_parse.callback = box |state| {
+        // Which extracts information about the compiled crate...
+        let krate = state.krate.unwrap();
+
+        // ...and walks the AST, collecting stats.
+        let mut visitor = StupidVisitor::new();
+        visit::walk_crate(&mut visitor, krate);
+
+        // And finally prints out the stupid stats that we collected.
+        let cratename = match attr::find_crate_name(&krate.attrs[]) {
+            Some(name) => name.to_string(),
+            None => String::from_str("unknown_crate"),
+        };
+        println!("In crate: {},\n", cratename);
+        println!("Found {} uses of `println!`;", visitor.println_count);
+
+        let (common, common_percent, four_percent) = visitor.compute_arg_stats();
+        println!("The most common number of arguments is {} ({:.0}% of all functions);",
+                 common, common_percent);
+        println!("{:.0}% of functions have four or more arguments.", four_percent);
+    };
+
+    control
+}
+```
+
+That is all it takes to create your own drop-in compiler replacement or custom
+compiler! For the sake of completeness I'll go over the rest of the stupid-stats
+tool.
+
+```
+struct StupidVisitor {
+    println_count: usize,
+    arg_counts: Vec<usize>,
+}
+```
+
+The `StupidVisitor` struct just keeps track of the number of `println!`s it has
+seen and the count for each number of arguments. It implements
+`syntax::visit::Visitor` to walk the AST. Mostly we just use the default
+methods, which walk the AST taking no action.
We override `visit_item` and
+`visit_mac` to implement custom behaviour when we walk into items (items include
+functions, modules, traits, structs, and so forth, we're only interested in
+functions) and macros:
+
+```
+impl<'v> visit::Visitor<'v> for StupidVisitor {
+    fn visit_item(&mut self, i: &'v ast::Item) {
+        match i.node {
+            ast::Item_::ItemFn(ref decl, _, _, _, _) => {
+                // Record the number of args.
+                self.increment_args(decl.inputs.len());
+            }
+            _ => {}
+        }
+
+        // Keep walking.
+        visit::walk_item(self, i)
+    }
+
+    fn visit_mac(&mut self, mac: &'v ast::Mac) {
+        // Find its name and check if it is "println".
+        let ast::Mac_::MacInvocTT(ref path, _, _) = mac.node;
+        if path_to_string(path) == "println" {
+            self.println_count += 1;
+        }
+
+        // Keep walking.
+        visit::walk_mac(self, mac)
+    }
+}
+```
+
+The `increment_args` method increments the correct count in
+`StupidVisitor::arg_counts`. After we're done walking, `compute_arg_stats` does
+some pretty basic maths to come up with the stats we want about arguments.
+
+
+## What next?
+
+These APIs are pretty new and have a long way to go until they're really good.
+If there are improvements you'd like to see or things you'd like to be able to
+do, let me know in a comment or [GitHub issue](https://github.com/rust-lang/rust/issues).
+In particular, it's not clear to me exactly what extra flexibility is required.
+If you have an existing tool that would be suited to this setup, please try it
+out and let me know if you have problems.
+
+It'd be great to see Rustdoc converted to using these APIs, if that is possible
+(although long term, I'd prefer to see Rustdoc run on the output from
+save-analysis, rather than doing its own analysis). Other parts of the compiler
+(e.g., pretty printing, testing) could be refactored to use these APIs
+internally (I already changed save-analysis to use `CompileController`). I've
+been experimenting with a prototype rustfmt which also uses these APIs.
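
For readers curious about the bookkeeping behind `increment_args` and `compute_arg_stats` mentioned earlier, the following is a guess at what such helpers could look like. The field names come from the `StupidVisitor` struct shown in this appendix; the method bodies, the return type and the tie-breaking rule are assumptions made for this sketch, not the original stupid-stats code.

```rust
// Hypothetical reconstruction: the field names follow the struct shown in the
// text, but the method bodies are guesses rather than the original code.
struct StupidVisitor {
    println_count: usize,
    // arg_counts[n] is the number of functions seen with exactly n arguments.
    arg_counts: Vec<usize>,
}

impl StupidVisitor {
    fn new() -> StupidVisitor {
        StupidVisitor { println_count: 0, arg_counts: Vec::new() }
    }

    /// Records one more function taking `args` arguments.
    fn increment_args(&mut self, args: usize) {
        if self.arg_counts.len() <= args {
            self.arg_counts.resize(args + 1, 0);
        }
        self.arg_counts[args] += 1;
    }

    /// Returns (most common argument count, its share of all functions in
    /// percent, share of functions with four or more arguments in percent).
    fn compute_arg_stats(&self) -> (usize, f64, f64) {
        let total: usize = self.arg_counts.iter().sum();
        if total == 0 {
            return (0, 0.0, 0.0);
        }
        let mut common = 0;
        for (n, &count) in self.arg_counts.iter().enumerate() {
            // Strict comparison: ties go to the smaller argument count.
            if count > self.arg_counts[common] {
                common = n;
            }
        }
        let four_plus: usize = self.arg_counts.iter().skip(4).sum();
        let percent = |part: usize| part as f64 * 100.0 / total as f64;
        (common, percent(self.arg_counts[common]), percent(four_plus))
    }
}
```

Indexing a vector by argument count keeps `increment_args` cheap; a map keyed by count would work equally well if functions with very many arguments were expected.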
+ +[stupid-stats]: https://github.com/nrc/stupid-stats \ No newline at end of file diff --git a/src/code-index.md b/src/code-index.md deleted file mode 100644 index 6a500abba..000000000 --- a/src/code-index.md +++ /dev/null @@ -1,13 +0,0 @@ -# Code Index - -rustc has a lot of important data structures. This is an attempt to give some -guidance on where to learn more about some of the key data structures of the -compiler. - -Item | Kind | Short description | Chapter | Declaration -----------------|----------|-----------------------------|--------------------|------------------- -`CodeMap` | struct | The CodeMap maps the AST nodes to their source code | [The parser](the-parser.html) | [src/libsyntax/codemap.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/codemap.rs) -`ParseSess` | struct | This struct contains information about a parsing session | [The parser](the-parser.html) | [src/libsyntax/parse/mod.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/parse/mod.rs) -`StringReader` | struct | This is the lexer used during parsing. It consumes characters from the raw source code being compiled and produces a series of tokens for use by the rest of the parser | [The parser](the-parser.html) | [src/libsyntax/parse/lexer/mod.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/parse/lexer/mod.rs) -`TraitDef` | struct | This struct contains a trait's definition with type information | [The `ty` modules](ty.html) | [src/librustc/ty/trait_def.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/ty/trait_def.rs) -`TyCtxt<'cx, 'tcx, 'tcx>` | type | The "typing context". This is the central data structure in the compiler. It is the context that you use to perform all manner of queries. 
| [The `ty` modules](ty.html) | [src/librustc/ty/context.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/ty/context.rs) diff --git a/src/mir-borrowck.md b/src/mir-borrowck.md index 3c10191d4..6c4c99d61 100644 --- a/src/mir-borrowck.md +++ b/src/mir-borrowck.md @@ -42,9 +42,9 @@ The overall flow of the borrow checker is as follows: include references to the new regions that we are computing. - We then invoke `nll::replace_regions_in_mir` to modify this copy C. Among other things, this function will replace all of the regions in - the MIR with fresh [inference variables](glossary.html). + the MIR with fresh [inference variables](./appendix-glossary.html). - (More details can be found in [the regionck section](./mir-regionck.html).) -- Next, we perform a number of [dataflow analyses](./background.html#dataflow) +- Next, we perform a number of [dataflow analyses](./appendix-background.html#dataflow) that compute what data is moved and when. The results of these analyses are needed to do both borrow checking and region inference. - Using the move data, we can then compute the values of all the regions in the MIR. diff --git a/src/mir-regionck.md b/src/mir-regionck.md index e7b12405a..dbf740ea8 100644 --- a/src/mir-regionck.md +++ b/src/mir-regionck.md @@ -35,7 +35,7 @@ The MIR-based region analysis consists of two major functions: - More details to come, though the [NLL RFC] also includes fairly thorough (and hopefully readable) coverage. 
-[fvb]: background.html#free-vs-bound +[fvb]: appendix-background.html#free-vs-bound [NLL RFC]: http://rust-lang.github.io/rfcs/2094-nll.html ## Universal regions @@ -129,7 +129,7 @@ are going to wind up with a subtyping relationship like this one: We handle this sort of subtyping by taking the variables that are bound in the supertype and **skolemizing** them: this means that we replace them with -[universally quantified](background.html#quantified) +[universally quantified](appendix-background.html#quantified) representatives, written like `!1`. We call these regions "skolemized regions" -- they represent, basically, "some unknown region". @@ -144,7 +144,7 @@ what we wanted. So let's work through what happens next. To check if two functions are subtypes, we check if their arguments have the desired relationship -(fn arguments are [contravariant](./background.html#variance), so +(fn arguments are [contravariant](./appendix-background.html#variance), so we swap the left and right here): &'!1 u32 <: &'static u32 @@ -181,7 +181,7 @@ Here, the root universe would consist of the lifetimes `'static` and the same concept to types, in which case the types `Foo` and `T` would be in the root universe (along with other global types, like `i32`). Basically, the root universe contains all the names that -[appear free](./background.html#free-vs-bound) in the body of `bar`. +[appear free](./appendix-background.html#free-vs-bound) in the body of `bar`. Now let's extend `bar` a bit by adding a variable `x`: diff --git a/src/mir.md b/src/mir.md index 6e7ac0691..688a8750c 100644 --- a/src/mir.md +++ b/src/mir.md @@ -26,7 +26,7 @@ Some of the key characteristics of MIR are: - It does not have nested expressions. - All types in MIR are fully explicit. -[cfg]: ./background.html#cfg +[cfg]: ./appendix-background.html#cfg ## Key MIR vocabulary @@ -239,4 +239,4 @@ but [you can read about those below](#promoted)). 
[mir]: https://github.com/rust-lang/rust/tree/master/src/librustc/mir [mirmanip]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir [mir]: https://github.com/rust-lang/rust/tree/master/src/librustc/mir -[newtype'd]: glossary.html +[newtype'd]: appendix-glossary.html diff --git a/src/rustc-driver.md b/src/rustc-driver.md new file mode 100644 index 000000000..23a036e73 --- /dev/null +++ b/src/rustc-driver.md @@ -0,0 +1,76 @@ +# The Rustc Driver + +The [`rustc_driver`] is essentially `rustc`'s `main()` function. It acts as +the glue for running the various phases of the compiler in the correct order, +managing state such as the [`CodeMap`] \(maps AST nodes to source code), +[`Session`] \(general build context and error messaging) and the [`TyCtxt`] +\(the "typing context", allowing you to query the type system and other cool +stuff). The `rustc_driver` crate also provides external users with a method +for running code at particular times during the compilation process, allowing +third parties to effectively use `rustc`'s internals as a library for +analysing a crate or emulating the compiler in-process (e.g. the RLS). + +For those using `rustc` as a library, the `run_compiler()` function is the main +entrypoint to the compiler. Its main parameters are a list of command-line +arguments and a reference to something which implements the `CompilerCalls` +trait. A `CompilerCalls` creates the overall `CompileController`, letting it +govern which compiler passes are run and attach callbacks to be fired at the end +of each phase. + +From `rustc_driver`'s perspective, the main phases of the compiler are: + +1. *Parse Input:* Initial crate parsing +2. *Configure and Expand:* Resolve `#[cfg]` attributes, perform name resolution, and + expand macros +3. *Run Analysis Passes:* Run trait resolution, typechecking, region checking + and other miscellaneous analysis passes on the crate +4.
*Translate to LLVM:* Translate to the in-memory form of LLVM IR and turn it + into an executable or object files + +The `CompileController` then gives users the ability to inspect the ongoing +compilation process: + +- after parsing +- after AST expansion +- after HIR lowering +- after analysis, and +- when compilation is done + +The `CompileState`'s various `state_after_*()` constructors can be inspected to +determine what bits of information are available to which callback. + +For a more detailed explanation of how to use `rustc_driver`, check out the +[stupid-stats] guide by `@nrc` (attached as [Appendix A]). + +> **Warning:** By its very nature, the internal compiler APIs are always going +> to be unstable. That said, we do try not to break things unnecessarily. + +## A Note On Lifetimes + +The Rust compiler is a fairly large program containing lots of big data +structures (e.g. the AST, HIR, and the type system) and as such, arenas and +references are heavily relied upon to minimize unnecessary memory use. This +manifests itself in the way people can plug into the compiler, preferring a +"push"-style API (callbacks) instead of the more Rust-ic "pull" style (think +the `Iterator` trait). + +For example, the [`CompileState`], the state passed to callbacks after each +phase, is essentially just a box of optional references to pieces inside the +compiler. The lifetime bound on the `CompilerCalls` trait then helps to ensure +compiler internals don't "escape" the compiler (e.g. if you tried to keep a +reference to the AST after the compiler is finished), while still letting users +record *some* state for use after the `run_compiler()` function finishes. + +Thread-local storage and interning are used a lot throughout the compiler to reduce +duplication while also preventing a lot of the ergonomic issues due to many +pervasive lifetimes. The `rustc::ty::tls` module is used to access these +thread-locals, although you should rarely need to touch it.
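The "push"-style design described in "A Note On Lifetimes" can be illustrated with a small standalone sketch. This is not the `rustc_driver` API: `Ast`, `Hir`, `PhaseState`, `Callbacks`, `run_driver`, `Stats`, and `collect_stats` are all hypothetical names invented for this example, and the "parsing" and "lowering" steps are toy stand-ins. It only shows the general shape: the driver owns the data, hands callbacks a struct of optional references, and the callback copies out the small amount of state it wants to keep.

```rust
// Hypothetical stand-ins for compiler data; these are NOT real rustc types.
struct Ast {
    items: Vec<String>,
}

struct Hir {
    fn_count: usize,
}

/// A "box of optional references": a field is only `Some` once the
/// corresponding (toy) phase has run.
struct PhaseState<'a> {
    ast: Option<&'a Ast>,
    hir: Option<&'a Hir>,
}

/// Push-style hooks: the driver calls us at fixed points. The borrowed data
/// is only valid for the duration of each call.
trait Callbacks {
    fn after_parse(&mut self, _state: &PhaseState<'_>) {}
    fn after_lowering(&mut self, _state: &PhaseState<'_>) {}
}

/// Toy driver: it owns `ast` and `hir`, so neither can outlive this function.
fn run_driver(source: &str, callbacks: &mut dyn Callbacks) {
    // "Parsing" here is just splitting on whitespace.
    let ast = Ast {
        items: source.split_whitespace().map(String::from).collect(),
    };
    callbacks.after_parse(&PhaseState { ast: Some(&ast), hir: None });

    // "Lowering" here is just counting `fn` tokens.
    let hir = Hir {
        fn_count: ast.items.iter().filter(|item| item.as_str() == "fn").count(),
    };
    callbacks.after_lowering(&PhaseState { ast: Some(&ast), hir: Some(&hir) });
    // `ast` and `hir` are dropped here; callbacks cannot have kept references.
}

/// State the user wants to keep after the driver finishes. It stores owned
/// values (plain counts), not references into the driver's data.
#[derive(Default)]
struct Stats {
    tokens: usize,
    fns: usize,
}

impl Callbacks for Stats {
    fn after_parse(&mut self, state: &PhaseState<'_>) {
        if let Some(ast) = state.ast {
            self.tokens = ast.items.len();
        }
    }

    fn after_lowering(&mut self, state: &PhaseState<'_>) {
        if let Some(hir) = state.hir {
            self.fns = hir.fn_count;
        }
    }
}

/// Runs the toy driver and returns the recorded (token count, fn count).
fn collect_stats(source: &str) -> (usize, usize) {
    let mut stats = Stats::default();
    run_driver(source, &mut stats);
    (stats.tokens, stats.fns)
}
```

Calling `collect_stats("fn main fn helper")` from a `main` function runs the sketch end to end. Because each callback receives `&PhaseState<'_>` with a fresh lifetime, an implementation that tried to store `state.ast` in `Stats` would be rejected by the borrow checker. That is the same kind of guarantee the text attributes to the lifetime bound on `CompilerCalls`.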
+ + +[`rustc_driver`]: https://github.com/rust-lang/rust/tree/master/src/librustc_driver +[`CompileState`]: https://github.com/rust-lang/rust/blob/master/src/librustc_driver/driver.rs +[`Session`]: https://github.com/rust-lang/rust/blob/master/src/librustc/session/mod.rs +[`TyCtxt`]: https://github.com/rust-lang/rust/blob/master/src/librustc/ty/context.rs +[`CodeMap`]: https://github.com/rust-lang/rust/blob/master/src/libsyntax/codemap.rs +[stupid-stats]: https://github.com/nrc/stupid-stats +[Appendix A]: appendix-stupid-stats.html \ No newline at end of file diff --git a/src/traits-canonicalization.md b/src/traits-canonicalization.md index fc55fac0d..6ff61fdda 100644 --- a/src/traits-canonicalization.md +++ b/src/traits-canonicalization.md @@ -41,7 +41,7 @@ trait query: `?A: Foo<'static, ?B>`, where `?A` and `?B` are unbound. This query contains two unbound variables, but it also contains the lifetime `'static`. The trait system generally ignores all lifetimes and treats them equally, so when canonicalizing, we will *also* -replace any [free lifetime](./background.html#free-vs-bound) with a +replace any [free lifetime](./appendix-background.html#free-vs-bound) with a canonical variable. Therefore, we get the following result: ?0: Foo<'?1, ?2>