diff --git a/src/SUMMARY.md b/src/SUMMARY.md index fe040d9e2..6e0c71735 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -51,10 +51,10 @@ - [Ex: Type checking through `rustc_interface`](./rustc-driver-interacting-with-the-ast.md) - [Syntax and the AST](./syntax-intro.md) - [Lexing and Parsing](./the-parser.md) - - [`#[test]` Implementation](./test-implementation.md) - - [Panic Implementation](./panic-implementation.md) - [Macro expansion](./macro-expansion.md) - [Name resolution](./name-resolution.md) + - [`#[test]` Implementation](./test-implementation.md) + - [Panic Implementation](./panic-implementation.md) - [AST Validation](./ast-validation.md) - [Feature Gate Checking](./feature-gate-ck.md) - [The HIR (High-level IR)](./hir.md) diff --git a/src/macro-expansion.md b/src/macro-expansion.md index 279598270..7961d0cf1 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -3,147 +3,179 @@ > `librustc_ast`, `librustc_expand`, and `librustc_builtin_macros` are all undergoing > refactoring, so some of the links in this chapter may be broken. -Macro expansion happens during parsing. `rustc` has two parsers, in fact: the -normal Rust parser, and the macro parser. During the parsing phase, the normal -Rust parser will set aside the contents of macros and their invocations. Later, -before name resolution, macros are expanded using these portions of the code. -The macro parser, in turn, may call the normal Rust parser when it needs to -bind a metavariable (e.g. `$my_expr`) while parsing the contents of a macro -invocation. The code for macro expansion is in -[`src/librustc_expand/mbe/`][code_dir]. This chapter aims to explain how macro -expansion works. - -### Example - -It's helpful to have an example to refer to. For the remainder of this chapter, -whenever we refer to the "example _definition_", we mean the following: +Rust has a very powerful macro system. 
In the previous chapter, we saw how the +parser sets aside macros to be expanded (it temporarily uses [placeholders]). +This chapter is about the process of expanding those macros iteratively until +we have a complete AST for our crate with no unexpanded macros (or a compile +error). + +[placeholders]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/placeholders/index.html + +First, we will discuss the algorithm that expands and integrates macro output +into ASTs. Next, we will take a look at how hygiene data is collected. Finally, +we will look at the specifics of expanding different types of macros. + +Many of the algorithms and data structures described below are in [`rustc_expand`], +with basic data structures in [`rustc_expand::base`][base]. + +Also of note, `cfg` and `cfg_attr` are treated differently from other macros, and are +handled in [`rustc_expand::config`][cfg]. + +[`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html +[base]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/index.html +[cfg]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/config/index.html + +## Expansion and AST Integration + +First of all, expansion happens at the crate level. Given the raw source code for +a crate, the compiler will produce a massive AST with all macros expanded, all +modules inlined, etc. The primary entry point for this process is the +[`MacroExpander::fully_expand_fragment`][fef] method. With few exceptions, we +use this method on the whole crate (see ["Eager Expansion"](#eager-expansion) +below for a more detailed discussion of edge case expansion issues). + +[`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html +[reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html + +At a high level, [`fully_expand_fragment`][fef] works in iterations. 
We keep a +queue of unresolved macro invocations (that is, macros we haven't found the +definition of yet). We repeatedly try to pick a macro from the queue, resolve +it, expand it, and integrate it back. If we can't make progress in an +iteration, this represents a compile error. Here is the [algorithm][original]: + +[fef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.MacroExpander.html#method.fully_expand_fragment +[original]: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049 + +0. Initialize a `queue` of unresolved macros. +1. Repeat until `queue` is empty (or we make no progress, which is an error): + 0. [Resolve](./name-resolution.md) imports in our partially built crate as + much as possible. + 1. Collect as many macro [`Invocation`s][inv] as possible from our + partially built crate (fn-like, attributes, derives) and add them to the + queue. + 2. Dequeue the first element, and attempt to resolve it. + 3. If it's resolved: + 0. Run the macro's expander function that consumes a [`TokenStream`] or + AST and produces a [`TokenStream`] or [`AstFragment`] (depending on + the macro kind). (A `TokenStream` is a collection of [`TokenTree`s][tt], + each of which is a token (punctuation, identifier, or literal) or a + delimited group (anything inside `()`/`[]`/`{}`)). + - At this point, we know everything about the macro itself and can + call `set_expn_data` to fill in its properties in the global data; + that is the hygiene data associated with `ExpnId`. (See [the + "Hygiene" section below][hybelow]). + 1. Integrate that piece of AST into the big existing partially built + AST. This is essentially where the "token-like mass" becomes a + proper set-in-stone AST with side-tables. It happens as follows: + - If the macro produces tokens (e.g. a proc macro), we parse them into + an AST, which may produce parse errors. + - During expansion, we create `SyntaxContext`s (hierarchy 2). 
(See + [the "Hygiene" section below][hybelow]) + - These three passes happen one after another on every AST fragment + freshly expanded from a macro: + - [`NodeId`]s are assigned by [`InvocationCollector`]. This + also collects new macro calls from this new AST piece and + adds them to the queue. + - ["Def paths"][defpath] are created and [`DefId`]s are + assigned to them by [`DefCollector`]. + - Names are put into modules (from the resolver's point of + view) by [`BuildReducedGraphVisitor`]. + 2. After expanding a single macro and integrating its output, continue + to the next iteration of [`fully_expand_fragment`][fef]. + 4. If it's not resolved: + 0. Put the macro back in the queue + 1. Continue to next iteration... + +[defpath]: https://rustc-dev-guide.rust-lang.org/hir.html?highlight=def,path#identifiers-in-the-hir +[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html +[`InvocationCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.InvocationCollector.html +[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html +[`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html +[`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html +[hybelow]: #hygiene-and-hierarchies +[tt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html +[`TokenStream`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html +[inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html +[`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html + +### Error Recovery + +If we make no progress in an iteration, then we have reached a compilation +error (e.g. 
an undefined macro). We attempt to recover from failures +(unresolved macros or imports) for the sake of diagnostics. This allows +compilation to continue past the first error, so that we can report more errors +at a time. Recovery can't cause compilation to succeed. We know that it will +fail at this point. The recovery happens by expanding unresolved macros into +[`ExprKind::Err`][err]. + +[err]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/enum.ExprKind.html#variant.Err + +### Name Resolution + +Notice that name resolution is involved here: we need to resolve imports and +macro names in the above algorithm. This is done in +[`rustc_resolve::macros`][mresolve], which resolves macro paths, validates +those resolutions, and reports various errors (e.g. "not found" or "found, but +it's unstable" or "expected x, found y"). However, we don't try to resolve +other names yet. This happens later, as we will see in the [next +chapter](./name-resolution.md). + +[mresolve]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/macros/index.html + +### Eager Expansion + +_Eager expansion_ means that we expand the arguments of a macro invocation +before the macro invocation itself. This is implemented only for a few special +built-in macros that expect literals; expanding arguments first for some of +these macros results in a smoother user experience. As an example, consider the +following: ```rust,ignore -macro_rules! printer { - (print $mvar:ident) => { - println!("{}", $mvar); - }; - (print twice $mvar:ident) => { - println!("{}", $mvar); - println!("{}", $mvar); - }; -} -``` - -`$mvar` is called a _metavariable_. Unlike normal variables, rather than -binding to a value in a computation, a metavariable binds _at compile time_ to -a tree of _tokens_. A _token_ is a single "unit" of the grammar, such as an -identifier (e.g. `foo`) or punctuation (e.g. `=>`). 
There are also other -special tokens, such as `EOF`, which indicates that there are no more tokens. -Token trees resulting from paired parentheses-like characters (`(`...`)`, -`[`...`]`, and `{`...`}`) – they include the open and close and all the tokens -in between (we do require that parentheses-like characters be balanced). Having -macro expansion operate on token streams rather than the raw bytes of a source -file abstracts away a lot of complexity. The macro expander (and much of the -rest of the compiler) doesn't really care that much about the exact line and -column of some syntactic construct in the code; it cares about what constructs -are used in the code. Using tokens allows us to care about _what_ without -worrying about _where_. For more information about tokens, see the -[Parsing][parsing] chapter of this book. - -Whenever we refer to the "example _invocation_", we mean the following snippet: - -```rust,ignore -printer!(print foo); // Assume `foo` is a variable defined somewhere else... -``` - -The process of expanding the macro invocation into the syntax tree -`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is -called _macro expansion_, and it is the topic of this chapter. - -### The macro parser - -There are two parts to macro expansion: parsing the definition and parsing the -invocations. Interestingly, both are done by the macro parser. - -Basically, the macro parser is like an NFA-based regex parser. It uses an -algorithm similar in spirit to the [Earley parsing -algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is -defined in [`src/librustc_expand/mbe/macro_parser.rs`][code_mp]. 
- -The interface of the macro parser is as follows (this is slightly simplified): +macro bar($i: ident) { $i } +macro foo($i: ident) { $i } -```rust,ignore -fn parse_tt( - parser: &mut Cow, - ms: &[TokenTree], -) -> NamedParseResult +foo!(bar!(baz)); ``` -We use these items in macro parser: - -- `sess` is a "parsing session", which keeps track of some metadata. Most - notably, this is used to keep track of errors that are generated so they can - be reported to the user. -- `tts` is a stream of tokens. The macro parser's job is to consume the raw - stream of tokens and output a binding of metavariables to corresponding token - trees. -- `ms` a _matcher_. This is a sequence of token trees that we want to match - `tts` against. - -In the analogy of a regex parser, `tts` is the input and we are matching it -against the pattern `ms`. Using our examples, `tts` could be the stream of -tokens containing the inside of the example invocation `print foo`, while `ms` -might be the sequence of token (trees) `print $mvar:ident`. - -The output of the parser is a `NamedParseResult`, which indicates which of -three cases has occurred: - -- Success: `tts` matches the given matcher `ms`, and we have produced a binding - from metavariables to the corresponding token trees. -- Failure: `tts` does not match `ms`. This results in an error message such as - "No rule expected token _blah_". -- Error: some fatal error has occurred _in the parser_. For example, this - happens if there are more than one pattern match, since that indicates - the macro is ambiguous. - -The full interface is defined [here][code_parse_int]. - -The macro parser does pretty much exactly the same as a normal regex parser with -one exception: in order to parse different types of metavariables, such as -`ident`, `block`, `expr`, etc., the macro parser must sometimes call back to the -normal Rust parser. - -As mentioned above, both definitions and invocations of macros are parsed using -the macro parser. 
This is extremely non-intuitive and self-referential. The code -to parse macro _definitions_ is in -[`src/librustc_expand/mbe/macro_rules.rs`][code_mr]. It defines the pattern for -matching for a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words, -a `macro_rules` definition should have in its body at least one occurrence of a -token tree followed by `=>` followed by another token tree. When the compiler -comes to a `macro_rules` definition, it uses this pattern to match the two token -trees per rule in the definition of the macro _using the macro parser itself_. -In our example definition, the metavariable `$lhs` would match the patterns of -both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs` -would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{ -println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this -knowledge around for when it needs to expand a macro invocation. - -When the compiler comes to a macro invocation, it parses that invocation using -the same NFA-based macro parser that is described above. However, the matcher -used is the first token tree (`$lhs`) extracted from the arms of the macro -_definition_. Using our example, we would try to match the token stream `print -foo` from the invocation against the matchers `print $mvar:ident` and `print -twice $mvar:ident` that we previously extracted from the definition. The -algorithm is exactly the same, but when the macro parser comes to a place in the -current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`), -it calls back to the normal Rust parser to get the contents of that -non-terminal. In this case, the Rust parser would look for an `ident` token, -which it finds (`foo`) and returns to the macro parser. Then, the macro parser -proceeds in parsing as normal. 
Also, note that exactly one of the matchers from -the various arms should match the invocation; if there is more than one match, -the parse is ambiguous, while if there are no matches at all, there is a syntax -error. - -For more information about the macro parser's implementation, see the comments -in [`src/librustc_expand/mbe/macro_parser.rs`][code_mp]. - -### Hygiene +A lazy expansion would expand `foo!` first. An eager expansion would expand +`bar!` first. + +Eager expansion is not a generally available feature of Rust. Implementing +eager expansion more generally would be challenging, but we implement it for a +few special built-in macros for the sake of user experience. The built-in +macros are implemented in [`rustc_builtin_macros`], along with some other early +code generation facilities like injection of standard library imports or +generation of the test harness. There are some additional helpers for building +their AST fragments in [`rustc_expand::build`][reb]. Eager expansion generally +performs a subset of the things that lazy (normal) expansion does. It is done by +invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed to +the whole crate, like we normally do). + +### Other Data Structures + +Here are some other notable data structures involved in expansion and integration: +- [`Resolver`] - a trait used to break crate dependencies. This allows the + resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and + pretty much everything else depending on [`rustc_ast`]. 
+- [`ExtCtxt`]/[`ExpansionData`] - various intermediate data kept and used by expansion + infrastructure in the process of its work +- [`Annotatable`] - a piece of AST that can be an attribute target, almost the same + thing as `AstFragment` except for types and patterns that can be produced by + macros but cannot be annotated with attributes +- [`MacResult`] - a "polymorphic" AST fragment, something that can turn into a + different `AstFragment` depending on its [`AstFragmentKind`] - item, + or expression, or pattern etc. + +[`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html +[`rustc_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/index.html +[`Resolver`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.Resolver.html +[`ExtCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.ExtCtxt.html +[`ExpansionData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.ExpansionData.html +[`Annotatable`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.Annotatable.html +[`MacResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.MacResult.html +[`AstFragmentKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragmentKind.html + +## Hygiene and Hierarchies If you have ever used C/C++ preprocessor macros, you know that there are some annoying and hard-to-debug gotchas! For example, consider the following C code: @@ -190,728 +222,394 @@ a macro author may want to introduce a new name to the context where the macro was called. Alternately, the macro author may be defining a variable for use only within the macro (i.e. it should not be visible outside the macro). -In rustc, this "context" is tracked via `Span`s. - -TODO: what is call-site hygiene? what is def-site hygiene? 
- -TODO - -### Procedural Macros - -TODO - -### Custom Derive - -TODO - -TODO: maybe something about macros 2.0? - - [code_dir]: https://github.com/rust-lang/rust/tree/master/src/librustc_expand/mbe [code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser [code_mr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_rules [code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/fn.parse_tt.html [parsing]: ./the-parser.html +The context is attached to AST nodes. All AST nodes generated by macros have +context attached. Additionally, there may be other nodes that have context +attached, such as some desugared syntax (non-macro-expanded nodes are +considered to just have the "root" context, as described below). +Throughout the compiler, we use [`librustc_span::Span`s][span] to refer to code locations. +This struct also has hygiene information attached to it, as we will see later. -# Discussion about hygiene - -The rest of this chapter is a dump of a discussion between `mark-i-m` and -`petrochenkov` about Macro Expansion and Hygiene. I am pasting it here so that -it never gets lost until we can make it into a proper chapter. - -```txt -mark-i-m: @Vadim Petrochenkov Hi :wave: -I was wondering if you would have a chance sometime in the next month or so to -just have a zulip discussion where you tell us (WG-learning) everything you -know about macros/expansion/hygiene. We were thinking this could be less formal -(and less work for you) than compiler lecture series lecture... thoughts? - -mark-i-m: The goal is to fill out that long-standing gap in the rustc-dev-guide - -Vadim Petrochenkov: Ok, I'm at UTC+03:00 and generally available in the -evenings (or weekends). - -mark-i-m: @Vadim Petrochenkov Either of those works for me (your evenings are -about lunch time for me :) ) Is there a particular date that would work best -for you? 
- -mark-i-m: @WG-learning Does anyone else have a preferred date? - - Vadim Petrochenkov: - - Is there a particular date that would work best for you? - -Nah, not much difference. (If something changes for a specific day, I'll -notify.) - -Santiago Pastorino: week days are better, but I'd say let's wait for @Vadim -Petrochenkov to say when they are ready for it and we can set a date - -Santiago Pastorino: also, we should record this so ... I guess it doesn't -matter that much when :) - - mark-i-m: - - also, we should record this so ... I guess it doesn't matter that much when - :) - -@Santiago Pastorino My thinking was to just use zulip, so we would have the log - -mark-i-m: @Vadim Petrochenkov @WG-learning How about 2 weeks from now: July 24 -at 5pm UTC time (if I did the math right, that should be evening for Vadim) - -Amanjeev Sethi: i can try and do this but I am starting a new job that week so -cannot promise. - - Santiago Pastorino: - - Vadim Petrochenkov @WG-learning How about 2 weeks from now: July 24 at 5pm - UTC time (if I did the math right, that should be evening for Vadim) - -works perfect for me - -Santiago Pastorino: @mark-i-m I have access to the compiler calendar so I can -add something there - -Santiago Pastorino: let me know if you want to add an event to the calendar, I -can do that - -Santiago Pastorino: how long it would be? - - mark-i-m: - - let me know if you want to add an event to the calendar, I can do that - -mark-i-m: That could be good :+1: - - mark-i-m: - - how long it would be? - -Let's start with 30 minutes, and if we need to schedule another we cna - - Vadim Petrochenkov: - - 5pm UTC - -1-2 hours later would be better, 5pm UTC is not evening enough. - -Vadim Petrochenkov: How exactly do you plan the meeting to go (aka how much do -I need to prepare)? - - Santiago Pastorino: - - 5pm UTC - - 1-2 hours later would be better, 5pm UTC is not evening enough. 
- -Scheduled for 7pm UTC then - - Santiago Pastorino: - - How exactly do you plan the meeting to go (aka how much do I need to - prepare)? - -/cc @mark-i-m - -mark-i-m: @Vadim Petrochenkov - - How exactly do you plan the meeting to go (aka how much do I need to - prepare)? - -My hope was that this could be less formal than for a compiler lecture series, -but it would be nice if you could have in your mind a tour of the design and -the code - -That is, imagine that a new person was joining the compiler team and needed to -get up to speed about macros/expansion/hygiene. What would you tell such a -person? - -mark-i-m: @Vadim Petrochenkov Are we still on for tomorrow at 7pm UTC? - -Vadim Petrochenkov: Yes. - -Santiago Pastorino: @Vadim Petrochenkov @mark-i-m I've added an event on rust -compiler team calendar - -mark-i-m: @WG-learning @Vadim Petrochenkov Hello! - -mark-i-m: We will be starting in ~7 minutes - -mark-i-m: :wave: - -Vadim Petrochenkov: I'm here. - -mark-i-m: Cool :) - -Santiago Pastorino: hello @Vadim Petrochenkov - -mark-i-m: Shall we start? - -mark-i-m: First off, @Vadim Petrochenkov Thanks for doing this! - -Vadim Petrochenkov: Here's some preliminary data I prepared. - -Vadim Petrochenkov: Below I'll assume #62771 and #62086 has landed. - -Vadim Petrochenkov: Where to find the code: librustc_span/hygiene.rs - -structures related to hygiene and expansion that are kept in global data (can -be accessed from any Ident without any context) librustc_span/lib.rs - some -secondary methods like macro backtrace using primary methods from hygiene.rs -librustc_builtin_macros - implementations of built-in macros (including macro attributes -and derives) and some other early code generation facilities like injection of -standard library imports or generation of test harness. librustc_ast/config.rs - -implementation of cfg/cfg_attr (they treated specially from other macros), -should probably be moved into librustc_ast/ext. 
librustc_ast/tokenstream.rs + -librustc_ast/parse/token.rs - structures for compiler-side tokens, token trees, -and token streams. librustc_ast/ext - various expansion-related stuff -librustc_ast/ext/base.rs - basic structures used by expansion -librustc_ast/ext/expand.rs - some expansion structures and the bulk of expansion -infrastructure code - collecting macro invocations, calling into resolve for -them, calling their expanding functions, and integrating the results back into -AST librustc_ast/ext/placeholder.rs - the part of expand.rs responsible for -"integrating the results back into AST" basicallly, "placeholder" is a -temporary AST node replaced with macro expansion result nodes -librustc_ast/ext/builer.rs - helper functions for building AST for built-in macros -in librustc_builtin_macros (and user-defined syntactic plugins previously), can probably -be moved into librustc_builtin_macros these days librustc_ast/ext/proc_macro.rs + -librustc_ast/ext/proc_macro_server.rs - interfaces between the compiler and the -stable proc_macro library, converting tokens and token streams between the two -representations and sending them through C ABI librustc_ast/ext/tt - -implementation of macro_rules, turns macro_rules DSL into something with -signature Fn(TokenStream) -> TokenStream that can eat and produce tokens, -@mark-i-m knows more about this librustc_resolve/macros.rs - resolving macro -paths, validating those resolutions, reporting various "not found"/"found, but -it's unstable"/"expected x, found y" errors librustc_middle/hir/map/def_collector.rs + -librustc_resolve/build_reduced_graph.rs - integrate an AST fragment freshly -expanded from a macro into various parent/child structures like module -hierarchy or "definition paths" - -Primary structures: HygieneData - global piece of data containing hygiene and -expansion info that can be accessed from any Ident without any context ExpnId - -ID of a macro call or desugaring (and also expansion of that 
call/desugaring, -depending on context) ExpnInfo/InternalExpnData - a subset of properties from -both macro definition and macro call available through global data -SyntaxContext - ID of a chain of nested macro definitions (identified by -ExpnIds) SyntaxContextData - data associated with the given SyntaxContext, -mostly a cache for results of filtering that chain in different ways Span - a -code location + SyntaxContext Ident - interned string (Symbol) + Span, i.e. a -string with attached hygiene data TokenStream - a collection of TokenTrees -TokenTree - a token (punctuation, identifier, or literal) or a delimited group -(anything inside ()/[]/{}) SyntaxExtension - a lowered macro representation, -contains its expander function transforming a tokenstream or AST into -tokenstream or AST + some additional data like stability, or a list of unstable -features allowed inside the macro. SyntaxExtensionKind - expander functions -may have several different signatures (take one token stream, or two, or a -piece of AST, etc), this is an enum that lists them -ProcMacro/TTMacroExpander/AttrProcMacro/MultiItemModifier - traits representing -the expander signatures (TODO: change and rename the signatures into something -more consistent) trait Resolver - a trait used to break crate dependencies (so -resolver services can be used in librustc_ast, despite librustc_resolve and pretty -much everything else depending on librustc_ast) ExtCtxt/ExpansionData - various -intermediate data kept and used by expansion infra in the process of its work -AstFragment - a piece of AST that can be produced by a macro (may include -multiple homogeneous AST nodes, like e.g. 
a list of items) Annotatable - a -piece of AST that can be an attribute target, almost same thing as AstFragment -except for types and patterns that can be produced by macros but cannot be -annotated with attributes (TODO: Merge into AstFragment) trait MacResult - a -"polymorphic" AST fragment, something that can turn into a different -AstFragment depending on its context (aka AstFragmentKind - item, or -expression, or pattern etc.) Invocation/InvocationKind - a structure describing -a macro call, these structures are collected by the expansion infra -(InvocationCollector), queued, resolved, expanded when resolved, etc. - -Primary algorithms / actions: TODO - -mark-i-m: Very useful :+1: - -mark-i-m: @Vadim Petrochenkov Zulip doesn't have an indication of typing, so -I'm not sure if you are waiting for me or not - -Vadim Petrochenkov: The TODO part should be about how a crate transitions from -the state "macros exist as written in source" to "all macros are expanded", but -I didn't write it yet. - -Vadim Petrochenkov: (That should probably better happen off-line.) - -Vadim Petrochenkov: Now, if you have any questions? - -mark-i-m: Thanks :) - -mark-i-m: /me is still reading :P - -mark-i-m: Ok - -mark-i-m: So I guess my first question is about hygiene, since that remains the -most mysterious to me... My understanding is that the parser outputs AST nodes, -where each node has a Span - -mark-i-m: In the absence of macros and desugaring, what does the syntax context -of an AST node look like? - -mark-i-m: @Vadim Petrochenkov - -Vadim Petrochenkov: Not each node, but many of them. When a node is not -macro-expanded, its context is 0. - -Vadim Petrochenkov: aka SyntaxContext::empty() - -Vadim Petrochenkov: it's a chain that consists of one expansion - expansion 0 -aka ExpnId::root. - -mark-i-m: Do all expansions start at root? - -Vadim Petrochenkov: Also, SyntaxContext:empty() is its own father. - -mark-i-m: Is this actually stored somewhere or is it a logical value? 
- -Vadim Petrochenkov: All expansion hyerarchies (there are several of them) start -at ExpnId::root. - -Vadim Petrochenkov: Vectors in HygieneData has entries for both ctxt == 0 and -expn_id == 0. - -Vadim Petrochenkov: I don't think anyone looks into them much though. - -mark-i-m: Ok - -Vadim Petrochenkov: Speaking of multiple hierarchies... +[span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html -mark-i-m: Go ahead :) +Because macro invocations and definitions can be nested, the syntax context of +a node must be a hierarchy. For example, if we expand a macro and there is +another macro invocation or definition in the generated output, then the syntax +context should reflect the nesting. -Vadim Petrochenkov: One is parent (expn_id1) -> parent(expn_id2) -> ... +However, it turns out that there are actually a few types of context we may +want to track for different purposes. Thus, there is not just one but _three_ +expansion hierarchies that together comprise the hygiene information for a +crate. -Vadim Petrochenkov: This is the order in which macros are expanded. +All of these hierarchies need some sort of "macro ID" to identify individual +elements in the chain of expansions. This ID is [`ExpnId`]. All macro calls receive +an integer ID, assigned continuously starting from 0 as we discover new macro +calls. All hierarchies start at [`ExpnId::root()`][rootid], which is its own +parent. -Vadim Petrochenkov: Well. +[`rustc_span::hygiene`][hy] contains all of the hygiene-related algorithms +(with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks]) +and structures related to hygiene and expansion that are kept in global data. -Vadim Petrochenkov: When we are expanding one macro another macro is revealed -in its output. +The actual hierarchies are stored in [`HygieneData`][hd]. This is a global +piece of data containing hygiene and expansion info that can be accessed from +any [`Ident`] without any context. 
-Vadim Petrochenkov: That's the parent-child relation in this hierarchy. -Vadim Petrochenkov: InternalExpnData::parent is the child->parent link. +[`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html +[rootid]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html#method.root +[hd]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.HygieneData.html +[hy]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/index.html +[hacks]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/struct.Resolver.html#method.resolve_crate_root +[`Ident`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Ident.html -mark-i-m: So in the above chain expn_id1 is the child? +### The Expansion Order Hierarchy -Vadim Petrochenkov: Yes. +The first hierarchy tracks the order of expansions, i.e., when a macro +invocation is in the output of another macro. -Vadim Petrochenkov: The second one is parent (SyntaxContext1) -> -parent(SyntaxContext2) -> ... +Here, the children in the hierarchy will be the "innermost" tokens. The +[`ExpnData`] struct itself contains a subset of properties from both macro +definition and macro call available through global data. +[`ExpnData::parent`][edp] tracks the child -> parent link in this hierarchy. -Vadim Petrochenkov: This is about nested macro definitions. When we are -expanding one macro another macro definition is revealed in its output. +[`ExpnData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html +[edp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.parent -Vadim Petrochenkov: SyntaxContextData::parent is the child->parent link here. - -Vadim Petrochenkov: So, SyntaxContext is the whole chain in this hierarchy, and -outer_expns are individual elements in the chain. 
- -mark-i-m: So for example, suppose I have the following: +For example, +```rust,ignore macro_rules! foo { () => { println!(); } } fn main() { foo!(); } +``` -Then AST nodes that are finally generated would have parent(expn_id_println) -> -parent(expn_id_foo), right? - -Vadim Petrochenkov: Pretty common construction (at least it was, before -refactorings) is SyntaxContext::empty().apply_mark(expn_id), which means... - - Vadim Petrochenkov: - - Then AST nodes that are finally generated would have - parent(expn_id_println) -> parent(expn_id_foo), right? - -Yes. - - mark-i-m: - - and outer_expns are individual elements in the chain. - -Sorry, what is outer_expns? - -Vadim Petrochenkov: SyntaxContextData::outer_expn - -mark-i-m: Thanks :) Please continue - -Vadim Petrochenkov: ...which means a token produced by a built-in macro (which -is defined in the root effectively). - -mark-i-m: Where does the expn_id come from? - -Vadim Petrochenkov: Or a stable proc macro, which are always considered to be -defined in the root because they are always cross-crate, and we don't have the -cross-crate hygiene implemented, ha-ha. - - Vadim Petrochenkov: - - Where does the expn_id come from? - -Vadim Petrochenkov: ID of the built-in macro call like line!(). - -Vadim Petrochenkov: Assigned continuously from 0 to N as soon as we discover -new macro calls. - -mark-i-m: Sorry, I didn't quite understand. Do you mean that only built-in -macros receive continuous IDs? - -Vadim Petrochenkov: So, the second hierarchy has a catch - the context -transplantation hack - -https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732. - - Vadim Petrochenkov: - - Do you mean that only built-in macros receive continuous IDs? - -Vadim Petrochenkov: No, all macro calls receive ID. - -Vadim Petrochenkov: Built-ins have the typical pattern -SyntaxContext::empty().apply_mark(expn_id) for syntax contexts produced by -them. - -mark-i-m: I see, but this pattern is only used for built-ins, right? 
- -Vadim Petrochenkov: And also all stable proc macros, see the comments above. - -mark-i-m: Got it - -Vadim Petrochenkov: The third hierarchy is call-site hierarchy. - -Vadim Petrochenkov: If foo!(bar!(ident)) expands into ident - -Vadim Petrochenkov: then hierarchy 1 is root -> foo -> bar -> ident - -Vadim Petrochenkov: but hierarchy 3 is root -> ident - -Vadim Petrochenkov: ExpnInfo::call_site is the child-parent link in this case. - -mark-i-m: When we expand, do we expand foo first or bar? Why is there a -hierarchy 1 here? Is that foo expands first and it expands to something that -contains bar!(ident)? - -Vadim Petrochenkov: Ah, yes, let's assume both foo and bar are identity macros. - -Vadim Petrochenkov: Then foo!(bar!(ident)) -> expand -> bar!(ident) -> expand --> ident - -Vadim Petrochenkov: If bar were expanded first, that would be eager expansion - -https://github.com/rust-lang/rfcs/pull/2320. - -mark-i-m: And after we expand only foo! presumably whatever intermediate state -has heirarchy 1 of root->foo->(bar_ident), right? - -Vadim Petrochenkov: (We have it hacked into some built-in macros, but not -generally.) - - Vadim Petrochenkov: - - And after we expand only foo! presumably whatever intermediate state has - heirarchy 1 of root->foo->(bar_ident), right? - -Vadim Petrochenkov: Yes. - -mark-i-m: Got it :) - -mark-i-m: It looks like we have ~5 minutes left. This has been very helpful -already, but I also have more questions. Shall we try to schedule another -meeting in the future? - -Vadim Petrochenkov: Sure, why not. - -Vadim Petrochenkov: A thread for offline questions-answers would be good too. - - mark-i-m: - - A thread for offline questions-answers would be good too. - -I don't mind using this thread, since it already has a lot of info in it. We -also plan to summarize the info from this thread into the rustc-dev-guide. - - Sure, why not. - -Unfortunately, I'm unavailable for a few weeks. Would August 21-ish work for -you (and @WG-learning )? 
- -mark-i-m: @Vadim Petrochenkov Thanks very much for your time and knowledge! - -mark-i-m: One last question: are there more hierarchies? - -Vadim Petrochenkov: Not that I know of. Three + the context transplantation -hack is already more complex than I'd like. - -mark-i-m: Yes, one wonders what it would be like if one also had to think about -eager expansion... - -Santiago Pastorino: sorry but I couldn't follow that much today, will read it -when I have some time later - -Santiago Pastorino: btw https://github.com/rust-lang/rustc-dev-guide/issues/398 - -mark-i-m: @Vadim Petrochenkov Would 7pm UTC on August 21 work for a followup? - -Vadim Petrochenkov: Tentatively yes. - -mark-i-m: @Vadim Petrochenkov @WG-learning Does this still work for everyone? - -Vadim Petrochenkov: August 21 is still ok. - -mark-i-m: @WG-learning @Vadim Petrochenkov We will start in ~30min - -Vadim Petrochenkov: Oh. Thanks for the reminder, I forgot about this entirely. - -mark-i-m: Hello! - -Vadim Petrochenkov: (I'll be here in a couple of minutes.) - -Vadim Petrochenkov: Ok, I'm here. - -mark-i-m: Hi :) - -Vadim Petrochenkov: Hi. - -mark-i-m: so last time, we talked about the 3 context heirarchies - -Vadim Petrochenkov: Right. - -mark-i-m: Was there anything you wanted to add to that? If not, I think it -would be good to get a big-picture... Given some piece of rust code, how do we -get to the point where things are expanded and hygiene context is computed? - -mark-i-m: (I'm assuming that hygiene info is computed as we expand stuff, since -I don't think you can discover it beforehand) - -Vadim Petrochenkov: Ok, let's move from hygiene to expansion. - -Vadim Petrochenkov: Especially given that I don't remember the specific hygiene -algorithms like adjust in detail. - - Vadim Petrochenkov: - - Given some piece of rust code, how do we get to the point where things are - expanded - -So, first of all, the "some piece of rust code" is the whole crate. 
- -mark-i-m: Just to confirm, the algorithms are well-encapsulated, right? Like a -function or a struct as opposed to a bunch of conventions distributed across -the codebase? - -Vadim Petrochenkov: We run fully_expand_fragment in it. - - Vadim Petrochenkov: - - Just to confirm, the algorithms are well-encapsulated, right? - -Yes, the algorithmic parts are entirely inside hygiene.rs. - -Vadim Petrochenkov: Ok, some are in fn resolve_crate_root, but those are hacks. - -Vadim Petrochenkov: (Continuing about expansion.) If fully_expand_fragment is -run not on a whole crate, it means that we are performing eager expansion. - -Vadim Petrochenkov: Eager expansion is done for arguments of some built-in -macros that expect literals. - -Vadim Petrochenkov: It generally performs a subset of actions performed by the -non-eager expansion. - -Vadim Petrochenkov: So, I'll talk about non-eager expansion for now. - -mark-i-m: Eager expansion is not exposed as a language feature, right? i.e. it -is not possible for me to write an eager macro? +In this code, the AST nodes that are finally generated would have hierarchy: -Vadim Petrochenkov: -https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049 (vvv The -link is explained below vvv ) +``` +root + expn_id_foo + expn_id_println +``` - Vadim Petrochenkov: +### The Macro Definition Hierarchy - Eager expansion is not exposed as a language feature, right? i.e. it is not - possible for me to write an eager macro? +The second hierarchy tracks the order of macro definitions, i.e., when we are +expanding one macro another macro definition is revealed in its output. This +one is a bit tricky and more complex than the other two hierarchies. -Yes, it's entirely an ability of some built-in macros. +[`SyntaxContext`][sc] represents a whole chain in this hierarchy via an ID. +[`SyntaxContextData`][scd] contains data associated with the given +`SyntaxContext`; mostly it is a cache for results of filtering that chain in +different ways. 
[`SyntaxContextData::parent`][scdp] is the child -> parent +link here, and [`SyntaxContextData::outer_expns`][scdoe] are individual +elements in the chain. The "chaining operator" is +[`SyntaxContext::apply_mark`][am] in compiler code. -Vadim Petrochenkov: Not exposed for general use. +A [`Span`][span], mentioned above, is actually just a compact representation of +a code location and `SyntaxContext`. Likewise, an [`Ident`] is just an interned +[`Symbol`] + `Span` (i.e. an interned string + hygiene data). -Vadim Petrochenkov: fully_expand_fragment works in iterations. +[`Symbol`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Symbol.html +[scd]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html +[scdp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.parent +[sc]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html +[scdoe]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.outer_expn +[am]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark -Vadim Petrochenkov: Iterations looks roughly like this: -- Resolve imports in our partially built crate as much as possible. -- Collect as many macro invocations as possible from our partially built crate - (fn-like, attributes, derives) from the crate and add them to the queue. +For built-in macros, we use the context: +`SyntaxContext::empty().apply_mark(expn_id)`, and such macros are considered to +be defined at the hierarchy root. We do the same for proc-macros because we +haven't implemented cross-crate hygiene yet. - Vadim Petrochenkov: Take a macro from the queue, and attempt to resolve it. +If the token had context `X` before being produced by a macro then after being +produced by the macro it has context `X -> macro_id`. 
Here are some examples: - Vadim Petrochenkov: If it's resolved - run its expander function that - consumes tokens or AST and produces tokens or AST (depending on the macro - kind). +Example 0: - Vadim Petrochenkov: (If it's not resolved, then put it back into the - queue.) +```rust,ignore +macro m() { ident } -Vadim Petrochenkov: ^^^ That's where we fill in the hygiene data associated -with ExpnIds. +m!(); +``` -mark-i-m: When we put it back in the queue? +Here `ident` originally has context [`SyntaxContext::root()`][scr]. `ident` has +context `ROOT -> id(m)` after it's produced by `m`. -mark-i-m: or do you mean the collect step in general? +[scr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.root -Vadim Petrochenkov: Once we resolved the macro call to the macro definition we -know everything about the macro and can call set_expn_data to fill in its -properties in the global data. -Vadim Petrochenkov: I mean, immediately after successful resolution. +Example 1: -Vadim Petrochenkov: That's the first part of hygiene data, the second one is -associated with SyntaxContext rather than with ExpnId, it's filled in later -during expansion. +```rust,ignore +macro m() { macro n() { ident } } -Vadim Petrochenkov: So, after we run the macro's expander function and got a -piece of AST (or got tokens and parsed them into a piece of AST) we need to -integrate that piece of AST into the big existing partially built AST. +m!(); +n!(); +``` +In this example the `ident` has context `ROOT` originally, then `ROOT -> id(m)` +after the first expansion, then `ROOT -> id(m) -> id(n)`. -Vadim Petrochenkov: This integration is a really important step where the next -things happen: -- NodeIds are assigned. +Example 2: - Vadim Petrochenkov: "def paths"s and their IDs (DefIds) are created +Note that these chains are not entirely determined by their last element, in +other words `ExpnId` is not isomorphic to `SyntaxContext`. 
- Vadim Petrochenkov: Names are put into modules from the resolver point of - view. +```rust,ignore +macro m($i: ident) { macro n() { ($i, bar) } } -Vadim Petrochenkov: So, we are basically turning some vague token-like mass -into proper set in stone hierarhical AST and side tables. +m!(foo); +``` -Vadim Petrochenkov: Where exactly this happens - NodeIds are assigned by -InvocationCollector (which also collects new macro calls from this new AST -piece and adds them to the queue), DefIds are created by DefCollector, and -modules are filled by BuildReducedGraphVisitor. +After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context +`ROOT -> id(m) -> id(n)`. -Vadim Petrochenkov: These three passes run one after another on every AST -fragment freshly expanded from a macro. +Finally, one last thing to mention is that currently, this hierarchy is subject +to the ["context transplantation hack"][hack]. Basically, the more modern (and +experimental) `macro` macros have stronger hygiene than the older MBE system, +but this can result in weird interactions between the two. The hack is intended +to make things "just work" for now. -Vadim Petrochenkov: After expanding a single macro and integrating its output -we again try to resolve all imports in the crate, and then return to the big -queue processing loop and pick up the next macro. +[hack]: https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732 -Vadim Petrochenkov: Repeat until there's no more macros. Vadim Petrochenkov: +### The Call-site Hierarchy -mark-i-m: The integration step is where we would get parser errors too right? +The third and final hierarchy tracks the location of macro invocations. -mark-i-m: Also, when do we know definitively that resolution has failed for -particular ident? +In this hierarchy [`ExpnData::call_site`][callsite] is the child -> parent link. 
- Vadim Petrochenkov: +[callsite]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.call_site - The integration step is where we would get parser errors too right? +Here is an example: -Yes, if the macro produced tokens (rather than AST directly) and we had to -parse them. +```rust,ignore +macro bar($i: ident) { $i } +macro foo($i: ident) { $i } - Vadim Petrochenkov: +foo!(bar!(baz)); +``` - when do we know definitively that resolution has failed for particular - ident? +For the `baz` AST node in the final output, the first hierarchy is `ROOT -> +id(foo) -> id(bar) -> baz`, while the third hierarchy is `ROOT -> baz`. -So, ident is looked up in a number of scopes during resolution. From closest -like the current block or module, to far away like preludes or built-in types. +### Macro Backtraces -Vadim Petrochenkov: If lookup is certainly failed in all of the scopes, then -it's certainly failed. +Macro backtraces are implemented in [`rustc_span`] using the hygiene machinery +in [`rustc_span::hygiene`][hy]. -mark-i-m: This is after all expansions and integrations are done, right? +[`rustc_span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/index.html -Vadim Petrochenkov: "Certainly" is determined differently for different scopes, -e.g. for a module scope it means no unexpanded macros and no unresolved glob -imports in that module. +## Producing Macro Output - Vadim Petrochenkov: +Above, we saw how the output of a macro is integrated into the AST for a crate, +and we also saw how the hygiene data for a crate is generated. But how do we +actually produce the output of a macro? It depends on the type of macro. - This is after all expansions and integrations are done, right? +There are two types of macros in Rust: +`macro_rules!` macros (a.k.a. "Macros By Example" (MBE)) and procedural macros +(or "proc macros"; including custom derives). 
During the parsing phase, the normal +Rust parser will set aside the contents of macros and their invocations. Later, +macros are expanded using these portions of the code. + +Some important data structures/interfaces here: +- [`SyntaxExtension`] - a lowered macro representation, contains its expander + function, which transforms a `TokenStream` or AST into another `TokenStream` + or AST + some additional data like stability, or a list of unstable features + allowed inside the macro. +- [`SyntaxExtensionKind`] - expander functions may have several different + signatures (take one token stream, or two, or a piece of AST, etc). This is + an enum that lists them. +- [`ProcMacro`]/[`TTMacroExpander`]/[`AttrProcMacro`]/[`MultiItemModifier`] - + traits representing the expander function signatures. + +[`SyntaxExtension`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.SyntaxExtension.html +[`SyntaxExtensionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.SyntaxExtensionKind.html +[`ProcMacro`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.ProcMacro.html +[`TTMacroExpander`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.TTMacroExpander.html +[`AttrProcMacro`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.AttrProcMacro.html +[`MultiItemModifier`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.MultiItemModifier.html + +## Macros By Example + +MBEs have their own parser distinct from the normal Rust parser. When macros +are expanded, we may invoke the MBE parser to parse and expand a macro. The +MBE parser, in turn, may call the normal Rust parser when it needs to bind a +metavariable (e.g. `$my_expr`) while parsing the contents of a macro +invocation. The code for macro expansion is in +[`src/librustc_expand/mbe/`][code_dir]. -For macro and import names this happens during expansions and integrations. 
+### Example -mark-i-m: Makes sense +It's helpful to have an example to refer to. For the remainder of this chapter, +whenever we refer to the "example _definition_", we mean the following: -Vadim Petrochenkov: For all other names we certainly know whether a name is -resolved successfully or not on the first attempt, because no new names can -appear. +```rust,ignore +macro_rules! printer { + (print $mvar:ident) => { + println!("{}", $mvar); + }; + (print twice $mvar:ident) => { + println!("{}", $mvar); + println!("{}", $mvar); + }; +} +``` -Vadim Petrochenkov: (They are resolved in a later pass, see -librustc_resolve/late.rs.) +`$mvar` is called a _metavariable_. Unlike normal variables, rather than +binding to a value in a computation, a metavariable binds _at compile time_ to +a tree of _tokens_. A _token_ is a single "unit" of the grammar, such as an +identifier (e.g. `foo`) or punctuation (e.g. `=>`). There are also other +special tokens, such as `EOF`, which indicates that there are no more tokens. +Token trees result from paired parentheses-like characters (`(`...`)`, +`[`...`]`, and `{`...`}`); they include the opening and closing delimiters and all the tokens +in between (we do require that parentheses-like characters be balanced). Having +macro expansion operate on token streams rather than the raw bytes of a source +file abstracts away a lot of complexity. The macro expander (and much of the +rest of the compiler) doesn't really care that much about the exact line and +column of some syntactic construct in the code; it cares about what constructs +are used in the code. Using tokens allows us to care about _what_ without +worrying about _where_. For more information about tokens, see the +[Parsing][parsing] chapter of this book. -mark-i-m: And if at the end of the iteration, there are still things in the -queue that can't be resolve, this represents an error, right? +Whenever we refer to the "example _invocation_", we mean the following snippet: -mark-i-m: i.e.
an undefined macro? +```rust,ignore +printer!(print foo); // Assume `foo` is a variable defined somewhere else... +``` -Vadim Petrochenkov: Yes, if we make no progress during an iteration, then we -are stuck and that state represent an error. +The process of expanding the macro invocation into the syntax tree +`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is +called _macro expansion_, and it is the topic of this chapter. -Vadim Petrochenkov: We attempt to recover though, using dummies expanding into -nothing or ExprKind::Err or something like that for unresolved macros. +### The MBE parser -mark-i-m: This is for the purposes of diagnostics, though, right? +There are two parts to MBE expansion: parsing the definition and parsing the +invocations. Interestingly, both are done by the macro parser. -Vadim Petrochenkov: But if we are going through recovery, then compilation must -result in an error anyway. +Basically, the MBE parser is like an NFA-based regex parser. It uses an +algorithm similar in spirit to the [Earley parsing +algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is +defined in [`src/librustc_expand/mbe/macro_parser.rs`][code_mp]. -Vadim Petrochenkov: Yes, that's for diagnostics, without recovery we would -stuck at the first unresolved macro or import. Vadim Petrochenkov: +The interface of the macro parser is as follows (this is slightly simplified): -So, about the SyntaxContext hygiene... +```rust,ignore +fn parse_tt( + parser: &mut Cow, + ms: &[TokenTree], +) -> NamedParseResult +``` -Vadim Petrochenkov: New syntax contexts are created during macro expansion. +We use these items in macro parser: -Vadim Petrochenkov: If the token had context X before being produced by a -macro, e.g. here ident has context SyntaxContext::root(): Vadim Petrochenkov: +- `parser` is a reference to the state of a normal Rust parser, including the + token stream and parsing session. 
The token stream is what we are about to + ask the MBE parser to parse. We will consume the raw stream of tokens and + output a binding of metavariables to corresponding token trees. The parsing + session can be used to report parser errors. +- `ms` is a _matcher_. This is a sequence of token trees that we want to match + the token stream against. -macro m() { ident } +In the analogy of a regex parser, the token stream is the input and we are matching it +against the pattern `ms`. Using our examples, the token stream could be the stream of +tokens containing the inside of the example invocation `print foo`, while `ms` +might be the sequence of token (trees) `print $mvar:ident`. -Vadim Petrochenkov: , then after being produced by the macro it has context X --> macro_id. +The output of the parser is a `NamedParseResult`, which indicates which of +three cases has occurred: -Vadim Petrochenkov: I.e. our ident has context ROOT -> id(m) after it's -produced by m. +- Success: the token stream matches the given matcher `ms`, and we have produced a binding + from metavariables to the corresponding token trees. +- Failure: the token stream does not match `ms`. This results in an error message such as + "No rule expected token _blah_". +- Error: some fatal error has occurred _in the parser_. For example, this + happens if there is more than one pattern match, since that indicates + the macro is ambiguous. -Vadim Petrochenkov: The "chaining operator" -> is apply_mark in compiler code. -Vadim Petrochenkov: +The full interface is defined [here][code_parse_int]. -macro m() { macro n() { ident } } +The macro parser does pretty much exactly the same as a normal regex parser with +one exception: in order to parse different types of metavariables, such as +`ident`, `block`, `expr`, etc., the macro parser must sometimes call back to the +normal Rust parser. -Vadim Petrochenkov: In this example the ident has context ROOT originally, then -ROOT -> id(m), then ROOT -> id(m) -> id(n).
+As mentioned above, both definitions and invocations of macros are parsed using +the macro parser. This is extremely non-intuitive and self-referential. The code +to parse macro _definitions_ is in +[`src/librustc_expand/mbe/macro_rules.rs`][code_mr]. It defines the pattern for +matching for a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words, +a `macro_rules` definition should have in its body at least one occurrence of a +token tree followed by `=>` followed by another token tree. When the compiler +comes to a `macro_rules` definition, it uses this pattern to match the two token +trees per rule in the definition of the macro _using the macro parser itself_. +In our example definition, the metavariable `$lhs` would match the patterns of +both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs` +would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{ +println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this +knowledge around for when it needs to expand a macro invocation. -Vadim Petrochenkov: Note that these chains are not entirely determined by their -last element, in other words ExpnId is not isomorphic to SyntaxCtxt. +When the compiler comes to a macro invocation, it parses that invocation using +the same NFA-based macro parser that is described above. However, the matcher +used is the first token tree (`$lhs`) extracted from the arms of the macro +_definition_. Using our example, we would try to match the token stream `print +foo` from the invocation against the matchers `print $mvar:ident` and `print +twice $mvar:ident` that we previously extracted from the definition. The +algorithm is exactly the same, but when the macro parser comes to a place in the +current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`), +it calls back to the normal Rust parser to get the contents of that +non-terminal. 
In this case, the Rust parser would look for an `ident` token, +which it finds (`foo`) and returns to the macro parser. Then, the macro parser +proceeds in parsing as normal. Also, note that exactly one of the matchers from +the various arms should match the invocation; if there is more than one match, +the parse is ambiguous, while if there are no matches at all, there is a syntax +error. -Vadim Petrochenkov: Couterexample: Vadim Petrochenkov: +For more information about the macro parser's implementation, see the comments +in [`src/librustc_expand/mbe/macro_parser.rs`][code_mp]. -macro m($i: ident) { macro n() { ($i, bar) } } +### `macro`s and Macros 2.0 -m!(foo); +There is an old and mostly undocumented effort to improve the MBE system, give +it more hygiene-related features, better scoping and visibility rules, etc. There +hasn't been a lot of work on this recently, unfortunately. Internally, `macro` +macros use the same machinery as today's MBEs; they just have additional +syntactic sugar and are allowed to be in namespaces. -Vadim Petrochenkov: foo has context ROOT -> id(n) and bar has context ROOT -> -id(m) -> id(n) after all the expansions. +## Procedural Macros -mark-i-m: Cool :) +Procedural macros are also expanded during parsing, as mentioned above. +However, they use a rather different mechanism. Rather than having a parser in +the compiler, procedural macros are implemented as custom, third-party crates. +The compiler will compile the proc macro crate and specially annotated +functions in it (i.e. the proc macro itself), passing them a stream of tokens. -mark-i-m: It looks like we are out of time +The proc macro can then transform the token stream and output a new token +stream, which is synthesized into the AST. -mark-i-m: Is there anything you wanted to add? +It's worth noting that the token stream type used by proc macros is _stable_, +so `rustc` does not use it internally (since our internal data structures are +unstable).
The compiler's token stream is +[`rustc_ast::tokenstream::TokenStream`][rustcts], as previously. This is +converted into the stable [`proc_macro::TokenStream`][stablets] and back in +[`rustc_expand::proc_macro`][pm] and [`rustc_expand::proc_macro_server`][pms]. +Because the Rust ABI is unstable, we use the C ABI for this conversion. -mark-i-m: We can schedule another meeting if you would like +[tsmod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/index.html +[rustcts]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html +[stablets]: https://doc.rust-lang.org/proc_macro/struct.TokenStream.html +[pm]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro/index.html +[pms]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro_server/index.html -Vadim Petrochenkov: Yep, 23.06 already. No, I think this is an ok point to -stop. +TODO: more here. -mark-i-m: :+1: +### Custom Derive -mark-i-m: Thanks @Vadim Petrochenkov ! This was very helpful +Custom derives are a special type of proc macro. -Vadim Petrochenkov: Yeah, we can schedule another one. So far it's been like 1 -hour of meetings per month? Certainly not a big burden. -``` +TODO: more? diff --git a/src/name-resolution.md b/src/name-resolution.md index f3aacba00..d08fe43f3 100644 --- a/src/name-resolution.md +++ b/src/name-resolution.md @@ -1,5 +1,28 @@ # Name resolution +In the previous chapters, we saw how the AST is built with all macros expanded. +We saw how doing that requires doing some name resolution to resolve imports +and macro names. In this chapter, we show how this is actually done and more. + +In fact, we don't do full name resolution during macro expansion -- we only +resolve imports and macros at that time. This is required to know what to even +expand. Later, after we have the whole AST, we do full name resolution to +resolve all names in the crate. This happens in [`rustc_resolve::late`][late].
+Unlike during macro expansion, in this late resolution, we only need to try to +resolve a name once, since no new names can be added. If we fail to resolve a +name now, then it is a compiler error. + +Name resolution can be complex. There are a few different namespaces (e.g. +macros, values, types, lifetimes), and names may be valid at different (nested) +scopes. Also, different types of names can fail to be resolved differently, and +failures can happen differently at different scopes. For example, for a module +scope, failure means no unexpanded macros and no unresolved glob imports in +that module. On the other hand, in a function body, failure requires that a +name be absent from the block we are in, all outer scopes, and the global +scope. + +[late]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/late/index.html + ## Basics In our programs we can refer to variables, types, functions, etc, by giving them diff --git a/src/part-3-intro.md b/src/part-3-intro.md index a1ec3ca90..2af8cce23 100644 --- a/src/part-3-intro.md +++ b/src/part-3-intro.md @@ -2,7 +2,9 @@ This part describes the process of taking raw source code from the user and transforming it into various forms that the compiler can work with easily. -These are called intermediate representations. +These are called _intermediate representations (IRs)_. This process starts with compiler understanding what the user has asked for: parsing the command line arguments given and determining what it is to compile. +After that, the compiler transforms the user input into a series of IRs that +look progressively less like what the user wrote. diff --git a/src/syntax-intro.md b/src/syntax-intro.md index dd7e2d735..43ef44577 100644 --- a/src/syntax-intro.md +++ b/src/syntax-intro.md @@ -6,3 +6,8 @@ out that doing even this involves a lot of work, including lexing, parsing, macro expansion, name resolution, conditional compilation, feature-gate checking, and validation of the AST.
In this chapter, we take a look at all of these steps. + +Notably, there isn't always a clean ordering between these tasks. For example, +macro expansion relies on name resolution to resolve the names of macros and +imports. And parsing requires macro expansion, which in turn may require +parsing the output of the macro. diff --git a/src/the-parser.md b/src/the-parser.md index c84ac4ea2..da318c9ef 100644 --- a/src/the-parser.md +++ b/src/the-parser.md @@ -7,10 +7,11 @@ The very first thing the compiler does is take the program (in Unicode characters) and turn it into something the compiler can work with more conveniently than strings. This happens in two stages: Lexing and Parsing. -Lexing takes strings and turns them into streams of tokens. For example, +Lexing takes strings and turns them into streams of [tokens]. For example, `a.b + c` would be turned into the tokens `a`, `.`, `b`, `+`, and `c`. The lexer lives in [`librustc_lexer`][lexer]. +[tokens]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/token/index.html [lexer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html Parsing then takes streams of tokens and turns them into a structured @@ -38,6 +39,11 @@ To minimise the amount of copying that is done, both the `StringReader` and `Parser` have lifetimes which bind them to the parent `ParseSess`. This contains all the information needed while parsing, as well as the `SourceMap` itself. +Note that while parsing, we may encounter macro definitions or invocations. We +set these aside to be expanded (see [this chapter](./macro-expansion.md)). +Expansion may itself require parsing the output of the macro, which may reveal +more macros to be expanded, and so on. + ## More on Lexical Analysis Code for lexical analysis is split between two crates: