From 83d5f64c01958e88117366cae9303505a90e81d9 Mon Sep 17 00:00:00 2001 From: Michael Bryan Date: Wed, 7 Mar 2018 22:18:05 +0800 Subject: [PATCH 1/8] Added a very rough rustc-driver chapter --- src/SUMMARY.md | 1 + src/rustc-driver.md | 66 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 67 insertions(+) create mode 100644 src/rustc-driver.md diff --git a/src/SUMMARY.md b/src/SUMMARY.md index d4ee59abf..841a9b491 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -9,6 +9,7 @@ - [Using `compiletest` + commands to control test execution](./compiletest.md) - [Walkthrough: a typical contribution](./walkthrough.md) - [High-level overview of the compiler source](./high-level-overview.md) +- [The Rustc Driver](./rustc-driver.md) - [Queries: demand-driven compilation](./query.md) - [Incremental compilation](./incremental-compilation.md) - [The parser](./the-parser.md) diff --git a/src/rustc-driver.md b/src/rustc-driver.md new file mode 100644 index 000000000..b21cc3662 --- /dev/null +++ b/src/rustc-driver.md @@ -0,0 +1,66 @@ +# The Rustc Driver + +The [`rustc_driver`] is essentially `rustc`'s `main()` function. It acts as +the glue for running the various phases of the compiler in the correct order, +managing state such as the [`CodeMap`] \(maps AST nodes to source code), +[`Session`] \(general build context and error messaging) and the [`TyCtxt`] +\(the "typing context", allowing you to query the type system and other cool +stuff). The `rustc_driver` crate also provides external users with a method +for running code at particular times during the compilation process, allowing +third parties to effectively use `rustc`'s internals as a library for +analysing a crate. + +For those using `rustc` as a library, the `run_compiler()` function is the main +entrypoint to the compiler. Its main parameters are a list of command-line +arguments and a reference to something which implements the `CompilerCalls` +trait. 
A `CompilerCalls` creates the overall `CompileController`, letting it +govern which compiler passes are run and attach callbacks to be fired at the end +of each phase. + +From `rustc_driver`'s perspective, the main phases of the compiler are: + +1. *Parse Input:* Initial crate parsing +2. *Configure and Expand:* Resolve `#[cfg]` attributes and expand macros +3. *Run Analysis Passes:* Run the resolution, typechecking, region checking + and other miscellaneous analysis passes on the crate +4. *Translate to LLVM:* Turn the analysed program into executable code + +The `CompileController` then gives users the ability to inspect the ongoing +compilation process + +- after parsing +- after AST expansion +- after HIR lowering +- after analysis, and +- when compilation is done + +The `CompileState`'s various `state_after_*()` constructors can be inspected to +determine what bits of information are available to which callback. + +## A Note On Lifetimes + +The Rust compiler is a fairly large program containing lots of big data +structures (e.g. the AST, HIR, and the type system) and as such, arenas and +references are heavily relied upon to minimize unnecessary memory use. This +manifests itself in the way people can plug into the compiler, preferring a +"push"-style API (callbacks) instead of the more Rust-ic "pull" style (think +the `Iterator` trait). + +For example the [`CompileState`], the state passed to callbacks after each +phase, is essentially just a box of optional references to pieces inside the +compiler. The lifetime bound on the `CompilerCalls` trait then helps to ensure +compiler internals don't "escape" the compiler (e.g. if you tried to keep a +reference to the AST after the compiler is finished), while still letting users +record *some* state for use after the `run_compiler()` function finishes. 
+ +Thread-local storage and interning are used a lot through the compiler to reduce +duplication while also preventing a lot of the ergonomic issues due to many +pervasive lifetimes. The `rustc::ty::tls` module is used to access these +thread-locals, although you should rarely need to touch it. + + +[`rustc_driver`]: https://github.com/rust-lang/rust/tree/master/src/librustc_driver +[`CompileState`]: https://github.com/rust-lang/rust/blob/master/src/librustc_driver/driver.rs +[`Session`]: https://github.com/rust-lang/rust/blob/master/src/librustc/session/mod.rs +[`TyCtxt`]: https://github.com/rust-lang/rust/blob/master/src/librustc/ty/context.rs +[`CodeMap`]: https://github.com/rust-lang/rust/blob/master/src/libsyntax/codemap.rs \ No newline at end of file From 3bb1b1ab13d4d7b8c28a2e7cb3fde67de222e4db Mon Sep 17 00:00:00 2001 From: Michael Bryan Date: Wed, 7 Mar 2018 22:31:22 +0800 Subject: [PATCH 2/8] Renamed appendices and added @nrc's guide --- src/SUMMARY.md | 10 +- src/{background.md => appendix-background.md} | 2 +- src/{code-index.md => appendix-code-index.md} | 2 +- src/{glossary.md => appendix-glossary.md} | 17 +- src/appendix-stupid-stats.md | 396 ++++++++++++++++++ src/mir-borrowck.md | 2 +- src/mir-regionck.md | 8 +- src/mir.md | 2 +- 8 files changed, 419 insertions(+), 20 deletions(-) rename src/{background.md => appendix-background.md} (99%) rename src/{code-index.md => appendix-code-index.md} (98%) rename src/{glossary.md => appendix-glossary.md} (94%) create mode 100644 src/appendix-stupid-stats.md diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 841a9b491..eb889d36b 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -46,6 +46,10 @@ - [miri const evaluator](./miri.md) - [Parameter Environments](./param_env.md) - [Generating LLVM IR](./trans.md) -- [Background material](./background.md) -- [Glossary](./glossary.md) -- [Code Index](./code-index.md) + +--- + +- [Appendix A: Stupid Stats](./appendix-stupid-stats.md) +- [Appendix B: Background 
material](./appendix-background.md) +- [Appendix C: Glossary](./appendix-glossary.md) +- [Appendix D: Code Index](./appendix-code-index.md) diff --git a/src/background.md b/src/appendix-background.md similarity index 99% rename from src/background.md rename to src/appendix-background.md index 50c247774..c69e7d93d 100644 --- a/src/background.md +++ b/src/appendix-background.md @@ -1,4 +1,4 @@ -# Background topics +# Appendix B: Background topics This section covers a numbers of common compiler terms that arise in this guide. We try to give the general definition while providing some diff --git a/src/code-index.md b/src/appendix-code-index.md similarity index 98% rename from src/code-index.md rename to src/appendix-code-index.md index 6a500abba..64a40f74d 100644 --- a/src/code-index.md +++ b/src/appendix-code-index.md @@ -1,4 +1,4 @@ -# Code Index +# Appendix D: Code Index rustc has a lot of important data structures. This is an attempt to give some guidance on where to learn more about some of the key data structures of the diff --git a/src/glossary.md b/src/appendix-glossary.md similarity index 94% rename from src/glossary.md rename to src/appendix-glossary.md index e542d4e35..1914adec6 100644 --- a/src/glossary.md +++ b/src/appendix-glossary.md @@ -1,21 +1,20 @@ -Glossary --------- +# Appendix C: Glossary The compiler uses a number of...idiosyncratic abbreviations and things. This glossary attempts to list them and give you a few pointers for understanding them better. Term | Meaning ------------------------|-------- AST | the abstract syntax tree produced by the syntax crate; reflects user syntax very closely. -binder | a "binder" is a place where a variable or type is declared; for example, the `` is a binder for the generic type parameter `T` in `fn foo(..)`, and \|`a`\|` ...` is a binder for the parameter `a`. 
See [the background chapter for more](./background.html#free-vs-bound) -bound variable | a "bound variable" is one that is declared within an expression/term. For example, the variable `a` is bound within the closure expession \|`a`\|` a * 2`. See [the background chapter for more](./background.html#free-vs-bound) +binder | a "binder" is a place where a variable or type is declared; for example, the `` is a binder for the generic type parameter `T` in `fn foo(..)`, and \|`a`\|` ...` is a binder for the parameter `a`. See [the background chapter for more](./appendix-background.html#free-vs-bound) +bound variable | a "bound variable" is one that is declared within an expression/term. For example, the variable `a` is bound within the closure expession \|`a`\|` a * 2`. See [the background chapter for more](./appendix-background.html#free-vs-bound) codegen unit | when we produce LLVM IR, we group the Rust code into a number of codegen units. Each of these units is processed by LLVM independently from one another, enabling parallelism. They are also the unit of incremental re-use. completeness | completeness is a technical term in type theory. Completeness means that every type-safe program also type-checks. Having both soundness and completeness is very hard, and usually soundness is more important. (see "soundness"). -control-flow graph | a representation of the control-flow of a program; see [the background chapter for more](./background.html#cfg) +control-flow graph | a representation of the control-flow of a program; see [the background chapter for more](./appendix-background.html#cfg) cx | we tend to use "cx" as an abbrevation for context. See also `tcx`, `infcx`, etc. DAG | a directed acyclic graph is used during compilation to keep track of dependencies between queries. 
([see more](incremental-compilation.html)) -data-flow analysis | a static analysis that figures out what properties are true at each point in the control-flow of a program; see [the background chapter for more](./background.html#dataflow) +data-flow analysis | a static analysis that figures out what properties are true at each point in the control-flow of a program; see [the background chapter for more](./appendix-background.html#dataflow) DefId | an index identifying a definition (see `librustc/hir/def_id.rs`). Uniquely identifies a `DefPath`. -free variable | a "free variable" is one that is not bound within an expression or term; see [the background chapter for more](./background.html#free-vs-bound) +free variable | a "free variable" is one that is not bound within an expression or term; see [the background chapter for more](./appendix-background.html#free-vs-bound) 'gcx | the lifetime of the global arena ([see more](ty.html)) generics | the set of generic type parameters defined on a type or item HIR | the High-level IR, created by lowering and desugaring the AST ([see more](hir.html)) @@ -39,7 +38,7 @@ obligation | something that must be proven by the trait system ([s projection | a general term for a "relative path", e.g. `x.f` is a "field projection", and `T::Item` is an ["associated type projection"](./traits-goals-and-clauses.html#trait-ref) promoted constants | constants extracted from a function and lifted to static scope; see [this section](./mir.html#promoted) for more details. provider | the function that executes a query ([see more](query.html)) -quantified | in math or logic, existential and universal quantification are used to ask questions like "is there any type T for which is true?" or "is this true for all types T?"; see [the background chapter for more](./background.html#quantified) +quantified | in math or logic, existential and universal quantification are used to ask questions like "is there any type T for which is true?" 
or "is this true for all types T?"; see [the background chapter for more](./appendix-background.html#quantified) query | perhaps some sub-computation during compilation ([see more](query.html)) region | another term for "lifetime" often used in the literature and in the borrow checker. sess | the compiler session, which stores global data used throughout compilation @@ -57,7 +56,7 @@ token | the smallest unit of parsing. Tokens are produced aft trans | the code to translate MIR into LLVM IR. trait reference | a trait and values for its type parameters ([see more](ty.html)). ty | the internal representation of a type ([see more](ty.html)). -variance | variance determines how changes to a generic type/lifetime parameter affect subtyping; for example, if `T` is a subtype of `U`, then `Vec` is a subtype `Vec` because `Vec` is *covariant* in its generic parameter. See [the background chapter for more](./background.html#variance). +variance | variance determines how changes to a generic type/lifetime parameter affect subtyping; for example, if `T` is a subtype of `U`, then `Vec` is a subtype `Vec` because `Vec` is *covariant* in its generic parameter. See [the background chapter for more](./appendix-background.html#variance). [LLVM]: https://llvm.org/ [lto]: https://llvm.org/docs/LinkTimeOptimization.html diff --git a/src/appendix-stupid-stats.md b/src/appendix-stupid-stats.md new file mode 100644 index 000000000..405577e3c --- /dev/null +++ b/src/appendix-stupid-stats.md @@ -0,0 +1,396 @@ +# Appendix A: A tutorial on creating a drop-in replacement for rustc + +Many tools benefit from being a drop-in replacement for a compiler. By this, I +mean that any user of the tool can use `mytool` in all the ways they would +normally use `rustc` - whether manually compiling a single file or as part of a +complex make project or Cargo build, etc. 
That could be a lot of work; +rustc, like most compilers, takes a large number of command line arguments which +can affect compilation in complex and interacting ways. Emulating all of this +behaviour in your tool is annoying at best, especially if you are making many +of the same calls into librustc that the compiler is. + +The kind of things I have in mind are tools like rustdoc or a future rustfmt. +These want to operate as closely as possible to real compilation, but have +totally different outputs (documentation and formatted source code, +respectively). Another use case is a customised compiler. Say you want to add a +custom code generation phase after macro expansion, then creating a new tool +should be easier than forking the compiler (and keeping it up to date as the +compiler evolves). + +I have gradually been trying to improve the API of librustc to make a +drop-in tool easier to produce (many others have also helped improve these +interfaces over the same time frame). It is now pretty simple to make a tool +which is as close to rustc as you want it to be. In this tutorial I'll show +how. + +Note/warning, everything I talk about in this tutorial is internal API for +rustc. It is all extremely unstable and likely to change often and in +unpredictable ways. Maintaining a tool which uses these APIs will be +non-trivial, although hopefully easier than maintaining one that does similar things +without using them. + +This tutorial starts with a very high level view of the rustc compilation +process and of some of the code that drives compilation. Then I'll describe how +that process can be customised. In the final section of the tutorial, I'll go +through an example - stupid-stats - which shows how to build a drop-in tool. + + +## Overview of the compilation process + +Compilation using rustc happens in several phases. We start with parsing, which +includes lexing. The output of this phase is an AST (abstract syntax tree). 
+There is a single AST for each crate (indeed, the entire compilation process +operates over a single crate). Parsing abstracts away details about individual +files which will all have been read in to the AST in this phase. At this stage +the AST includes all macro uses, attributes will still be present, and nothing +will have been eliminated due to `cfg`s. + +The next phase is configuration and macro expansion. This can be thought of as a +function over the AST. The unexpanded AST goes in and an expanded AST comes out. +Macros and syntax extensions are expanded, and `cfg` attributes will cause some +code to disappear. The resulting AST won't have any macros or macro uses left +in. + +The code for these first two phases is in [libsyntax](https://github.com/rust-lang/rust/tree/master/src/libsyntax). + +After this phase, the compiler allocates ids to each node in the AST +(technically not every node, but most of them). If we are writing out +dependencies, that happens now. + +The next big phase is analysis. This is the most complex phase and +uses the bulk of the code in rustc. This includes name resolution, type +checking, borrow checking, type and lifetime inference, trait selection, method +selection, linting, and so forth. Most error detection is done in this phase +(although parse errors are found during parsing). The 'output' of this phase is +a bunch of side tables containing semantic information about the source program. +The analysis code is in [librustc](https://github.com/rust-lang/rust/tree/master/src/librustc) +and a bunch of other crates with the 'librustc_' prefix. + +Next is translation, this translates the AST (and all those side tables) into +LLVM IR (intermediate representation). We do this by calling into the LLVM +libraries, rather than actually writing IR directly to a file. The code for this is in +[librustc_trans](https://github.com/rust-lang/rust/tree/master/src/librustc_trans). + +The next phase is running the LLVM backend. 
This runs LLVM's optimisation passes +on the generated IR and then generates machine code. The result is object files. +This phase is all done by LLVM, it is not really part of the Rust compiler. The +interface between LLVM and rustc is in [librustc_llvm](https://github.com/rust-lang/rust/tree/master/src/librustc_llvm). + +Finally, we link the object files into an executable. Again we outsource this to +other programs and it's not really part of the Rust compiler. The interface is +in [librustc_back](https://github.com/rust-lang/rust/tree/master/src/librustc_back) +(which also contains some things used primarily during translation). + +All these phases are coordinated by the driver. To see the exact sequence, look +at the `compile_input` function in [librustc_driver/driver.rs](https://github.com/rust-lang/rust/tree/master/src/librustc_driver/driver.rs). +The driver (which is found in [librustc_driver](https://github.com/rust-lang/rust/tree/master/src/librustc_driver)) +handles all the highest level coordination of compilation - handling command +line arguments, maintaining compilation state (primarily in the `Session`), and +calling the appropriate code to run each phase of compilation. It also handles +high level coordination of pretty printing and testing. To create a drop-in +compiler replacement or a custom compiler, we leave most of compilation +alone and customise the driver using its APIs. + + +## The driver customisation APIs + +There are two primary ways to customise compilation - high level control of the +driver using `CompilerCalls` and controlling each phase of compilation using a +`CompileController`. The former lets you customise handling of command line +arguments etc., the latter lets you stop compilation early or execute code +between phases. + + +### `CompilerCalls` + +`CompilerCalls` is a trait that you implement in your tool. 
It contains a fairly +ad-hoc set of methods to hook in to the process of processing command line +arguments and driving the compiler. For details, see the comments in +[librustc_driver/lib.rs](https://github.com/rust-lang/rust/tree/master/src/librustc_driver/lib.rs). +I'll summarise the methods here. + +`early_callback` and `late_callback` let you call arbitrary code at different +points - early is after command line arguments have been parsed, but before +anything is done with them; late is pretty much the last thing before +compilation starts, i.e., after all processing of command line arguments, etc. is +done. Currently, you get to choose whether compilation stops or continues at +each point, but you don't get to change anything the driver has done. You can +record some info for later, or perform other actions of your own. + +`some_input` and `no_input` give you an opportunity to modify the primary input +to the compiler (usually the input is a file containing the top module for a +crate, but it could also be a string). You could record the input or perform +other actions of your own. + +Ignore `parse_pretty`, it is unfortunate and hopefully will get improved. There +is a default implementation, so you can pretend it doesn't exist. + +`build_controller` returns a `CompileController` object for more fine-grained +control of compilation, it is described next. + +We might add more options in the future. + + +### `CompileController` + +`CompileController` is a struct consisting of `PhaseController`s and flags. +Currently, there is only one flag, `make_glob_map`, which signals whether to produce +a map of glob imports (used by save-analysis and potentially other tools). There +are probably flags in the session that should be moved here. + +There is a `PhaseController` for each of the phases described in the above +summary of compilation (and we could add more in the future for finer-grained +control). 
They are all `after_` a phase because they are checked at the end of a +phase (again, that might change), e.g., `CompileController::after_parse` +controls what happens immediately after parsing (and before macro expansion). + +Each `PhaseController` contains a flag called `stop` which indicates whether +compilation should stop or continue, and a callback to be executed at the point +indicated by the phase. The callback is called whether or not compilation +continues. + +Information about the state of compilation is passed to these callbacks in a +`CompileState` object. This contains all the information the compiler has. Note +that this state information is immutable - your callback can only execute code +using the compiler state, it can't modify the state. (If there is demand, we +could change that). The state available to a callback depends on where during +compilation the callback is called. For example, after parsing there is an AST +but no semantic analysis (because the AST has not been analysed yet). After +translation, there is translation info, but no AST or analysis info (since these +have been consumed/forgotten). + + +## An example - stupid-stats + +Our example tool is very simple, it simply collects some simple and not very +useful statistics about a program; it is called stupid-stats. You can find +the (more heavily commented) complete source for the example on [GitHub](https://github.com/nick29581/stupid-stats/blob/master/src). +To build, just do `cargo build`. To run on a file `foo.rs`, do `cargo run +foo.rs` (assuming you have a Rust program called `foo.rs`). You can also pass any +command line arguments that you would normally pass to rustc. When you run it +you'll see output similar to + +``` +In crate: foo, + +Found 12 uses of `println!`; +The most common number of arguments is 1 (67% of all functions); +25% of functions have four or more arguments. 
+``` + +To make things easier, when we talk about functions, we're excluding methods and +closures. + +You can also use the executable as a drop-in replacement for rustc, because +after all, that is the whole point of this exercise. So, however you use rustc +in your makefile setup, you can use `target/stupid` (or whatever executable you +end up with) instead. That might mean setting an environment variable or it +might mean renaming your executable to `rustc` and setting your PATH. Similarly, +if you're using Cargo, you'll need to rename the executable to rustc and set the +PATH. Alternatively, you should be able to use +[multirust](https://github.com/brson/multirust) to get around all the PATH stuff +(although I haven't actually tried that). + +(Note that this example prints to stdout. I'm not entirely sure what Cargo does +with stdout from rustc under different circumstances. If you don't see any +output, try inserting a `panic!` after the `println!`s to error out, then Cargo +should dump stupid-stats' stdout to Cargo's stdout). + +Let's start with the `main` function for our tool, it is pretty simple: + +``` +fn main() { + let args: Vec<_> = std::env::args().collect(); + rustc_driver::run_compiler(&args, &mut StupidCalls::new()); + std::env::set_exit_status(0); +} +``` + +The first line grabs any command line arguments. The second line calls the +compiler driver with those arguments. The final line sets the exit code for the +program. + +The only interesting thing is the `StupidCalls` object we pass to the driver. +This is our implementation of the `CompilerCalls` trait and is what will make +this tool different from rustc. + +`StupidCalls` is a mostly empty struct: + +``` +struct StupidCalls { + default_calls: RustcDefaultCalls, +} +``` + +This tool is so simple that it doesn't need to store any data here, but usually +you would. We embed a `RustcDefaultCalls` object to delegate to in our impl when +we want exactly the same behaviour as the Rust compiler. 
Mostly you don't want +to do that (or at least don't need to) in a tool. However, Cargo calls rustc +with `--print file-names`, so we delegate in `late_callback` and `no_input` +to keep Cargo happy. + +Most of the rest of the impl of `CompilerCalls` is trivial: + +``` +impl<'a> CompilerCalls<'a> for StupidCalls { + fn early_callback(&mut self, + _: &getopts::Matches, + _: &config::Options, + _: &diagnostics::registry::Registry, + _: ErrorOutputType) + -> Compilation { + Compilation::Continue + } + + fn late_callback(&mut self, + m: &getopts::Matches, + s: &Session, + i: &Input, + odir: &Option<PathBuf>, + ofile: &Option<PathBuf>) + -> Compilation { + self.default_calls.late_callback(m, s, i, odir, ofile); + Compilation::Continue + } + + fn some_input(&mut self, + input: Input, + input_path: Option<PathBuf>) + -> (Input, Option<PathBuf>) { + (input, input_path) + } + + fn no_input(&mut self, + m: &getopts::Matches, + o: &config::Options, + odir: &Option<PathBuf>, + ofile: &Option<PathBuf>, + r: &diagnostics::registry::Registry) + -> Option<(Input, Option<PathBuf>)> { + self.default_calls.no_input(m, o, odir, ofile, r); + + // This is not optimal error handling. + panic!("No input supplied to stupid-stats"); + } + + fn build_controller(&mut self, _: &Session) -> driver::CompileController<'a> { + ... + } +} +``` + +We don't do anything for either of the callbacks, nor do we change the input if +the user supplies it. If they don't, we just `panic!`. This is the simplest way +to handle the error, but not very user-friendly; a real tool would give a +constructive message or perform a default action. + +In `build_controller` we construct our `CompileController`. We only want to +parse, and we want to inspect macros before expansion, so we make compilation +stop after the first phase (parsing). The callback after that phase is where the +tool does its actual work by walking the AST. We do that by creating an AST +visitor and making it walk the AST from the top (the crate root). 
Once we've +walked the crate, we print the stats we've collected: + +``` +fn build_controller(&mut self, _: &Session) -> driver::CompileController<'a> { + // We mostly want to do what rustc does, which is what basic() will return. + let mut control = driver::CompileController::basic(); + // But we only need the AST, so we can stop compilation after parsing. + control.after_parse.stop = Compilation::Stop; + + // And when we stop after parsing we'll call this closure. + // Note that this will give us an AST before macro expansions, which is + // not usually what you want. + control.after_parse.callback = box |state| { + // Which extracts information about the compiled crate... + let krate = state.krate.unwrap(); + + // ...and walks the AST, collecting stats. + let mut visitor = StupidVisitor::new(); + visit::walk_crate(&mut visitor, krate); + + // And finally prints out the stupid stats that we collected. + let cratename = match attr::find_crate_name(&krate.attrs[]) { + Some(name) => name.to_string(), + None => String::from_str("unknown_crate"), + }; + println!("In crate: {},\n", cratename); + println!("Found {} uses of `println!`;", visitor.println_count); + + let (common, common_percent, four_percent) = visitor.compute_arg_stats(); + println!("The most common number of arguments is {} ({:.0}% of all functions);", + common, common_percent); + println!("{:.0}% of functions have four or more arguments.", four_percent); + }; + + control +} +``` + +That is all it takes to create your own drop-in compiler replacement or custom +compiler! For the sake of completeness I'll go over the rest of the stupid-stats +tool. + +``` +struct StupidVisitor { + println_count: usize, + arg_counts: Vec<usize>, +} +``` + +The `StupidVisitor` struct just keeps track of the number of `println!`s it has +seen and the count for each number of arguments. It implements +`syntax::visit::Visitor` to walk the AST. Mostly we just use the default +methods, these walk the AST taking no action. 
We override `visit_item` and +`visit_mac` to implement custom behaviour when we walk into items (items include +functions, modules, traits, structs, and so forth, we're only interested in +functions) and macros: + +``` +impl<'v> visit::Visitor<'v> for StupidVisitor { + fn visit_item(&mut self, i: &'v ast::Item) { + match i.node { + ast::Item_::ItemFn(ref decl, _, _, _, _) => { + // Record the number of args. + self.increment_args(decl.inputs.len()); + } + _ => {} + } + + // Keep walking. + visit::walk_item(self, i) + } + + fn visit_mac(&mut self, mac: &'v ast::Mac) { + // Find its name and check if it is "println". + let ast::Mac_::MacInvocTT(ref path, _, _) = mac.node; + if path_to_string(path) == "println" { + self.println_count += 1; + } + + // Keep walking. + visit::walk_mac(self, mac) + } +} +``` + +The `increment_args` method increments the correct count in +`StupidVisitor::arg_counts`. After we're done walking, `compute_arg_stats` does +some pretty basic maths to come up with the stats we want about arguments. + + +## What next? + +These APIs are pretty new and have a long way to go until they're really good. +If there are improvements you'd like to see or things you'd like to be able to +do, let me know in a comment or [GitHub issue](https://github.com/rust-lang/rust/issues). +In particular, it's not clear to me exactly what extra flexibility is required. +If you have an existing tool that would be suited to this setup, please try it +out and let me know if you have problems. + +It'd be great to see Rustdoc converted to using these APIs, if that is possible +(although long term, I'd prefer to see Rustdoc run on the output from +save-analysis, rather than doing its own analysis). Other parts of the compiler +(e.g., pretty printing, testing) could be refactored to use these APIs +internally (I already changed save-analysis to use `CompileController`). I've +been experimenting with a prototype rustfmt which also uses these APIs. 
diff --git a/src/mir-borrowck.md b/src/mir-borrowck.md index 3c10191d4..ab99ac9dc 100644 --- a/src/mir-borrowck.md +++ b/src/mir-borrowck.md @@ -44,7 +44,7 @@ The overall flow of the borrow checker is as follows: Among other things, this function will replace all of the regions in the MIR with fresh [inference variables](glossary.html). - (More details can be found in [the regionck section](./mir-regionck.html).) -- Next, we perform a number of [dataflow analyses](./background.html#dataflow) +- Next, we perform a number of [dataflow analyses](./appendix-background.html#dataflow) that compute what data is moved and when. The results of these analyses are needed to do both borrow checking and region inference. - Using the move data, we can then compute the values of all the regions in the MIR. diff --git a/src/mir-regionck.md b/src/mir-regionck.md index e7b12405a..dbf740ea8 100644 --- a/src/mir-regionck.md +++ b/src/mir-regionck.md @@ -35,7 +35,7 @@ The MIR-based region analysis consists of two major functions: - More details to come, though the [NLL RFC] also includes fairly thorough (and hopefully readable) coverage. -[fvb]: background.html#free-vs-bound +[fvb]: appendix-background.html#free-vs-bound [NLL RFC]: http://rust-lang.github.io/rfcs/2094-nll.html ## Universal regions @@ -129,7 +129,7 @@ are going to wind up with a subtyping relationship like this one: We handle this sort of subtyping by taking the variables that are bound in the supertype and **skolemizing** them: this means that we replace them with -[universally quantified](background.html#quantified) +[universally quantified](appendix-background.html#quantified) representatives, written like `!1`. We call these regions "skolemized regions" -- they represent, basically, "some unknown region". @@ -144,7 +144,7 @@ what we wanted. So let's work through what happens next. 
To check if two functions are subtypes, we check if their arguments have the desired relationship -(fn arguments are [contravariant](./background.html#variance), so +(fn arguments are [contravariant](./appendix-background.html#variance), so we swap the left and right here): &'!1 u32 <: &'static u32 @@ -181,7 +181,7 @@ Here, the root universe would consist of the lifetimes `'static` and the same concept to types, in which case the types `Foo` and `T` would be in the root universe (along with other global types, like `i32`). Basically, the root universe contains all the names that -[appear free](./background.html#free-vs-bound) in the body of `bar`. +[appear free](./appendix-background.html#free-vs-bound) in the body of `bar`. Now let's extend `bar` a bit by adding a variable `x`: diff --git a/src/mir.md b/src/mir.md index 6e7ac0691..5c4b16310 100644 --- a/src/mir.md +++ b/src/mir.md @@ -26,7 +26,7 @@ Some of the key characteristics of MIR are: - It does not have nested expressions. - All types in MIR are fully explicit. -[cfg]: ./background.html#cfg +[cfg]: ./appendix-background.html#cfg ## Key MIR vocabulary From 1d80615686713b3716880c7e7399d77ddff239de Mon Sep 17 00:00:00 2001 From: Michael Bryan Date: Wed, 7 Mar 2018 22:32:45 +0800 Subject: [PATCH 3/8] Thank you link checker! --- src/mir-borrowck.md | 2 +- src/mir.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mir-borrowck.md b/src/mir-borrowck.md index ab99ac9dc..6c4c99d61 100644 --- a/src/mir-borrowck.md +++ b/src/mir-borrowck.md @@ -42,7 +42,7 @@ The overall flow of the borrow checker is as follows: include references to the new regions that we are computing. - We then invoke `nll::replace_regions_in_mir` to modify this copy C. Among other things, this function will replace all of the regions in - the MIR with fresh [inference variables](glossary.html). + the MIR with fresh [inference variables](./appendix-glossary.html). 
- (More details can be found in [the regionck section](./mir-regionck.html).) - Next, we perform a number of [dataflow analyses](./appendix-background.html#dataflow) that compute what data is moved and when. The results of these analyses diff --git a/src/mir.md b/src/mir.md index 5c4b16310..688a8750c 100644 --- a/src/mir.md +++ b/src/mir.md @@ -239,4 +239,4 @@ but [you can read about those below](#promoted)). [mir]: https://github.com/rust-lang/rust/tree/master/src/librustc/mir [mirmanip]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir [mir]: https://github.com/rust-lang/rust/tree/master/src/librustc/mir -[newtype'd]: glossary.html +[newtype'd]: appendix-glossary.html From 42e4eee299287944aa6fc5516954783b59ac572b Mon Sep 17 00:00:00 2001 From: Michael Bryan Date: Wed, 7 Mar 2018 22:42:27 +0800 Subject: [PATCH 4/8] Added a couple definitions to the code index --- src/appendix-code-index.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/appendix-code-index.md b/src/appendix-code-index.md index 64a40f74d..2097d61ae 100644 --- a/src/appendix-code-index.md +++ b/src/appendix-code-index.md @@ -7,7 +7,11 @@ compiler. 
Item | Kind | Short description | Chapter | Declaration ----------------|----------|-----------------------------|--------------------|------------------- `CodeMap` | struct | The CodeMap maps the AST nodes to their source code | [The parser](the-parser.html) | [src/libsyntax/codemap.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/codemap.rs) +`CompileState` | struct | State that is passed to a callback at each compiler pass | [The Rustc Driver](rustc-driver.html) | [src/librustc_driver/driver.rs](https://github.com/rust-lang/rust/blob/master/src/librustc_driver/driver.rs) +`ast::Crate` | struct | Syntax-level representation of a parsed crate | | [src/librustc/hir/mod.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/ast.rs) +`hir::Crate` | struct | Top-level data structure representing the crate being compiled | | [src/librustc/hir/mod.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/hir/mod.rs) `ParseSess` | struct | This struct contains information about a parsing session | [The parser](the-parser.html) | [src/libsyntax/parse/mod.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/parse/mod.rs) `StringReader` | struct | This is the lexer used during parsing. 
It consumes characters from the raw source code being compiled and produces a series of tokens for use by the rest of the parser | [The parser](the-parser.html) | [src/libsyntax/parse/lexer/mod.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/parse/lexer/mod.rs) +`Session` | struct | The data associated with a compilation session | [the Parser](the-parser.html), [The Rustc Driver](rustc-driver.html) | [src/librustc/session/mod.html](https://github.com/rust-lang/rust/blob/master/src/librustc/session/mod.rs) `TraitDef` | struct | This struct contains a trait's definition with type information | [The `ty` modules](ty.html) | [src/librustc/ty/trait_def.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/ty/trait_def.rs) `TyCtxt<'cx, 'tcx, 'tcx>` | type | The "typing context". This is the central data structure in the compiler. It is the context that you use to perform all manner of queries. | [The `ty` modules](ty.html) | [src/librustc/ty/context.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/ty/context.rs) From e3d005699a69990dac084dbde8ccf7c5ed82a4ad Mon Sep 17 00:00:00 2001 From: Michael Bryan Date: Mon, 12 Mar 2018 18:51:08 +0800 Subject: [PATCH 5/8] Addressed some of @nrc and @mark-i-m's comments --- src/appendix-code-index.md | 23 ++++++++++++++--------- src/rustc-driver.md | 13 +++++++++---- 2 files changed, 23 insertions(+), 13 deletions(-) diff --git a/src/appendix-code-index.md b/src/appendix-code-index.md index 2097d61ae..f6dcb9c37 100644 --- a/src/appendix-code-index.md +++ b/src/appendix-code-index.md @@ -6,12 +6,17 @@ compiler. 
Item | Kind | Short description | Chapter | Declaration ----------------|----------|-----------------------------|--------------------|------------------- -`CodeMap` | struct | The CodeMap maps the AST nodes to their source code | [The parser](the-parser.html) | [src/libsyntax/codemap.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/codemap.rs) -`CompileState` | struct | State that is passed to a callback at each compiler pass | [The Rustc Driver](rustc-driver.html) | [src/librustc_driver/driver.rs](https://github.com/rust-lang/rust/blob/master/src/librustc_driver/driver.rs) -`ast::Crate` | struct | Syntax-level representation of a parsed crate | | [src/librustc/hir/mod.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/ast.rs) -`hir::Crate` | struct | Top-level data structure representing the crate being compiled | | [src/librustc/hir/mod.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/hir/mod.rs) -`ParseSess` | struct | This struct contains information about a parsing session | [The parser](the-parser.html) | [src/libsyntax/parse/mod.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/parse/mod.rs) -`StringReader` | struct | This is the lexer used during parsing. 
It consumes characters from the raw source code being compiled and produces a series of tokens for use by the rest of the parser | [The parser](the-parser.html) | [src/libsyntax/parse/lexer/mod.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/parse/lexer/mod.rs) -`Session` | struct | The data associated with a compilation session | [the Parser](the-parser.html), [The Rustc Driver](rustc-driver.html) | [src/librustc/session/mod.html](https://github.com/rust-lang/rust/blob/master/src/librustc/session/mod.rs) -`TraitDef` | struct | This struct contains a trait's definition with type information | [The `ty` modules](ty.html) | [src/librustc/ty/trait_def.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/ty/trait_def.rs) -`TyCtxt<'cx, 'tcx, 'tcx>` | type | The "typing context". This is the central data structure in the compiler. It is the context that you use to perform all manner of queries. | [The `ty` modules](ty.html) | [src/librustc/ty/context.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/ty/context.rs) +`CodeMap` | struct | The CodeMap maps the AST nodes to their source code | [The parser] | [src/libsyntax/codemap.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/codemap.rs) +`CompileState` | struct | State that is passed to a callback at each compiler pass | [The Rustc Driver] | [src/librustc_driver/driver.rs](https://github.com/rust-lang/rust/blob/master/src/librustc_driver/driver.rs) +`ast::Crate` | struct | Syntax-level representation of a parsed crate | [The parser] | [src/libsyntax/ast.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/ast.rs) +`hir::Crate` | struct | More abstract, compiler-friendly form of a crate's AST | [The HIR] | [src/librustc/hir/mod.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/hir/mod.rs) +`ParseSess` | struct | This struct contains information about a parsing session | [The parser] |
[src/libsyntax/parse/mod.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/parse/mod.rs) +`Session` | struct | The data associated with a compilation session | [The parser], [The Rustc Driver] | [src/librustc/session/mod.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/session/mod.rs) +`StringReader` | struct | This is the lexer used during parsing. It consumes characters from the raw source code being compiled and produces a series of tokens for use by the rest of the parser | [The parser] | [src/libsyntax/parse/lexer/mod.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/parse/lexer/mod.rs) +`TraitDef` | struct | This struct contains a trait's definition with type information | [The `ty` modules] | [src/librustc/ty/trait_def.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/ty/trait_def.rs) +`TyCtxt<'cx, 'tcx, 'tcx>` | type | The "typing context". This is the central data structure in the compiler. It is the context that you use to perform all manner of queries. | [The `ty` modules] | [src/librustc/ty/context.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/ty/context.rs) + +[The HIR]: hir.html +[The parser]: the-parser.html +[The Rustc Driver]: rustc-driver.html +[The `ty` modules]: ty.html diff --git a/src/rustc-driver.md b/src/rustc-driver.md index b21cc3662..7e6eb7e2e 100644 --- a/src/rustc-driver.md +++ b/src/rustc-driver.md @@ -8,7 +8,7 @@ managing state such as the [`CodeMap`] \(maps AST nodes to source code), stuff). The `rustc_driver` crate also provides external users with a method for running code at particular times during the compilation process, allowing third parties to effectively use `rustc`'s internals as a library for -analysing a crate. +analysing a crate or emulating the compiler in-process (e.g. the RLS). For those using `rustc` as a library, the `run_compiler()` function is the main entrypoint to the compiler.
Its main parameters are a list of command-line @@ -20,10 +20,12 @@ of each phase. From `rustc_driver`'s perspective, the main phases of the compiler are: 1. *Parse Input:* Initial crate parsing -2. *Configure and Expand:* Resolve `#[cfg]` attributes and expand macros -3. *Run Analysis Passes:* Run the resolution, typechecking, region checking +2. *Configure and Expand:* Resolve `#[cfg]` attributes, name resolution, and + expand macros +3. *Run Analysis Passes:* Run trait resolution, typechecking, region checking and other miscellaneous analysis passes on the crate -4. *Translate to LLVM:* Turn the analysed program into executable code +4. *Translate to LLVM:* Translate to the in-memory form of LLVM IR and turn it + into an executable/object files The `CompileController` then gives users the ability to inspect the ongoing compilation process @@ -37,6 +39,9 @@ compilation process The `CompileState`'s various `state_after_*()` constructors can be inspected to determine what bits of information are available to which callback. +> **Warning:** By its very nature, the internal compiler APIs are always going +> to be unstable. That said, we do try not to break things unnecessarily. 
+ ## A Note On Lifetimes The Rust compiler is a fairly large program containing lots of big data From aecac1c0e6c5634a615906d489f8e04aa351ec4b Mon Sep 17 00:00:00 2001 From: Michael Bryan Date: Mon, 12 Mar 2018 19:04:18 +0800 Subject: [PATCH 6/8] Updated stupid-stats cc: nrc/stupid-stats#8 --- src/appendix-stupid-stats.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/src/appendix-stupid-stats.md b/src/appendix-stupid-stats.md index 405577e3c..b1c9d2141 100644 --- a/src/appendix-stupid-stats.md +++ b/src/appendix-stupid-stats.md @@ -243,13 +243,15 @@ impl<'a> CompilerCalls<'a> for StupidCalls { } fn late_callback(&mut self, + t: &TransCrate, m: &getopts::Matches, s: &Session, + c: &CrateStore, i: &Input, - odir: &Option<PathBuf>, - ofile: &Option<PathBuf>) + odir: &Option<PathBuf>, + ofile: &Option<PathBuf>) -> Compilation { - self.default_calls.late_callback(m, s, i, odir, ofile); + self.default_calls.late_callback(t, m, s, c, i, odir, ofile); Compilation::Continue } @@ -393,4 +395,4 @@ It'd be great to see Rustdoc converted to using these APIs, if that is possible analysis, rather than doing its own analysis). Other parts of the compiler (e.g., pretty printing, testing) could be refactored to use these APIs internally (I already changed save-analysis to use `CompilerController`). I've -been experimenting with a prototype rustfmt which also uses these APIs. +been experimenting with a prototype rustfmt which also uses these APIs.
\ No newline at end of file From 5686d8c85984c393b41654e5d44d334088dff391 Mon Sep 17 00:00:00 2001 From: Michael Bryan Date: Mon, 12 Mar 2018 19:15:06 +0800 Subject: [PATCH 7/8] Fixed a broken link --- src/traits-canonicalization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/traits-canonicalization.md b/src/traits-canonicalization.md index fc55fac0d..6ff61fdda 100644 --- a/src/traits-canonicalization.md +++ b/src/traits-canonicalization.md @@ -41,7 +41,7 @@ trait query: `?A: Foo<'static, ?B>`, where `?A` and `?B` are unbound. This query contains two unbound variables, but it also contains the lifetime `'static`. The trait system generally ignores all lifetimes and treats them equally, so when canonicalizing, we will *also* -replace any [free lifetime](./background.html#free-vs-bound) with a +replace any [free lifetime](./appendix-background.html#free-vs-bound) with a canonical variable. Therefore, we get the following result: ?0: Foo<'?1, ?2> From e6946bcd604bfea98631f14938a582cd94f20208 Mon Sep 17 00:00:00 2001 From: Michael Bryan Date: Mon, 12 Mar 2018 19:15:26 +0800 Subject: [PATCH 8/8] Added links back to nrc's stupid-stats --- src/appendix-stupid-stats.md | 9 ++++++++- src/rustc-driver.md | 7 ++++++- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/src/appendix-stupid-stats.md b/src/appendix-stupid-stats.md index b1c9d2141..20d5aaf9b 100644 --- a/src/appendix-stupid-stats.md +++ b/src/appendix-stupid-stats.md @@ -1,5 +1,10 @@ # Appendix A: A tutorial on creating a drop-in replacement for rustc +> **Note:** This is a copy of `@nrc`'s amazing [stupid-stats]. You should find +> a copy of the code on the GitHub repository although due to the compiler's +> constantly evolving nature, there is no guarantee it'll compile on the first +> go. + Many tools benefit from being a drop-in replacement for a compiler. 
By this, I mean that any user of the tool can use `mytool` in all the ways they would normally use `rustc` - whether manually compiling a single file or as part of a @@ -395,4 +400,6 @@ It'd be great to see Rustdoc converted to using these APIs, if that is possible analysis, rather than doing its own analysis). Other parts of the compiler (e.g., pretty printing, testing) could be refactored to use these APIs internally (I already changed save-analysis to use `CompilerController`). I've -been experimenting with a prototype rustfmt which also uses these APIs. \ No newline at end of file +been experimenting with a prototype rustfmt which also uses these APIs. + +[stupid-stats]: https://github.com/nrc/stupid-stats \ No newline at end of file diff --git a/src/rustc-driver.md b/src/rustc-driver.md index 7e6eb7e2e..23a036e73 100644 --- a/src/rustc-driver.md +++ b/src/rustc-driver.md @@ -39,6 +39,9 @@ compilation process The `CompileState`'s various `state_after_*()` constructors can be inspected to determine what bits of information are available to which callback. +For a more detailed explanation on using `rustc_driver`, check out the +[stupid-stats] guide by `@nrc` (attached as [Appendix A]). + > **Warning:** By its very nature, the internal compiler APIs are always going > to be unstable. That said, we do try not to break things unnecessarily. @@ -68,4 +71,6 @@ thread-locals, although you should rarely need to touch it. 
[`CompileState`]: https://github.com/rust-lang/rust/blob/master/src/librustc_driver/driver.rs [`Session`]: https://github.com/rust-lang/rust/blob/master/src/librustc/session/mod.rs [`TyCtxt`]: https://github.com/rust-lang/rust/blob/master/src/librustc/ty/context.rs -[`CodeMap`]: https://github.com/rust-lang/rust/blob/master/src/libsyntax/codemap.rs \ No newline at end of file +[`CodeMap`]: https://github.com/rust-lang/rust/blob/master/src/libsyntax/codemap.rs +[stupid-stats]: https://github.com/nrc/stupid-stats +[Appendix A]: appendix-stupid-stats.html \ No newline at end of file