From 810777174a256c194f0c5bcb3ffb0a6351dbff8f Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 25 Apr 2020 13:09:19 -0500 Subject: [PATCH 01/22] some cleanup in the macros chapter --- src/macro-expansion.md | 222 +++++++++++------------------------------ 1 file changed, 58 insertions(+), 164 deletions(-) diff --git a/src/macro-expansion.md b/src/macro-expansion.md index 279598270..e7a09d31c 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -3,15 +3,22 @@ > `librustc_ast`, `librustc_expand`, and `librustc_builtin_macros` are all undergoing > refactoring, so some of the links in this chapter may be broken. -Macro expansion happens during parsing. `rustc` has two parsers, in fact: the -normal Rust parser, and the macro parser. During the parsing phase, the normal +Rust has a very powerful macro system. There are two major types of macros: +`macro_rules!` macros (a.k.a. "Macros By Example" (MBE)) and procedural macros +("proc macros"; including custom derives). During the parsing phase, the normal Rust parser will set aside the contents of macros and their invocations. Later, before name resolution, macros are expanded using these portions of the code. -The macro parser, in turn, may call the normal Rust parser when it needs to -bind a metavariable (e.g. `$my_expr`) while parsing the contents of a macro +In this chapter, we will discuss MBEs, proc macros, and hygiene. Both types of +macros are expanded during parsing, but they happen in different ways. + +## Macros By Example + +MBEs have their own parser distinct from the normal Rust parser. When macros +are expanded, we may invoke the MBE parser to parse and expand a macro. The +MBE parser, in turn, may call the normal Rust parser when it needs to bind a +metavariable (e.g. `$my_expr`) while parsing the contents of a macro invocation. The code for macro expansion is in -[`src/librustc_expand/mbe/`][code_dir]. This chapter aims to explain how macro -expansion works. 
+[`src/librustc_expand/mbe/`][code_dir]. ### Example @@ -56,7 +63,7 @@ The process of expanding the macro invocation into the syntax tree `println!("{}", foo)` and then expanding that into a call to `Display::fmt` is called _macro expansion_, and it is the topic of this chapter. -### The macro parser +### The MBE parser There are two parts to macro expansion: parsing the definition and parsing the invocations. Interestingly, both are done by the macro parser. @@ -70,33 +77,32 @@ The interface of the macro parser is as follows (this is slightly simplified): ```rust,ignore fn parse_tt( - parser: &mut Cow, + parser: &mut Cow, ms: &[TokenTree], ) -> NamedParseResult ``` We use these items in macro parser: -- `sess` is a "parsing session", which keeps track of some metadata. Most - notably, this is used to keep track of errors that are generated so they can - be reported to the user. -- `tts` is a stream of tokens. The macro parser's job is to consume the raw - stream of tokens and output a binding of metavariables to corresponding token - trees. +- `parser` is a reference to the state of a normal Rust parser, including the + token stream and parsing session. The token stream is what we are about to + ask the MBE parser to parse. We will consume the raw stream of tokens and + output a binding of metavariables to corresponding token trees. The parsing + session can be used to report parser errors. - `ms` a _matcher_. This is a sequence of token trees that we want to match - `tts` against. + the token stream against. -In the analogy of a regex parser, `tts` is the input and we are matching it -against the pattern `ms`. Using our examples, `tts` could be the stream of +In the analogy of a regex parser, the token stream is the input and we are matching it +against the pattern `ms`.
Using our examples, the token stream could be the stream of tokens containing the inside of the example invocation `print foo`, while `ms` might be the sequence of token (trees) `print $mvar:ident`. The output of the parser is a `NamedParseResult`, which indicates which of three cases has occurred: -- Success: `tts` matches the given matcher `ms`, and we have produced a binding +- Success: the token stream matches the given matcher `ms`, and we have produced a binding from metavariables to the corresponding token trees. -- Failure: `tts` does not match `ms`. This results in an error message such as +- Failure: the token stream does not match `ms`. This results in an error message such as "No rule expected token _blah_". - Error: some fatal error has occurred _in the parser_. For example, this happens if there are more than one pattern match, since that indicates @@ -143,7 +149,38 @@ error. For more information about the macro parser's implementation, see the comments in [`src/librustc_expand/mbe/macro_parser.rs`][code_mp]. -### Hygiene +### `macro`s and Macros 2.0 + +There is an old and mostly undocumented effort to improve the MBE system, give +it more hygiene-related features, better scoping and visibility rules, etc. There +hasn't been a lot of work on this recently, unfortunately. Internally, `macro` +macros use the same machinery as today's MBEs; they just have additional +syntactic sugar and are allowed to be in namespaces. + +## Procedural Macros + +Procedural macros are also expanded during parsing, as mentioned above. +However, they use a rather different mechanism. Rather than having a parser in +the compiler, procedural macros are implemented as custom, third-party crates. +The compiler will compile the proc macro crate and call the specially annotated +functions in it (i.e. the proc macro itself), passing them a stream of tokens. + +The proc macro can then transform the token stream and output a new token +stream, which is synthesized into the AST.
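The paragraph above describes a proc macro as a function that takes one token stream and returns another. The sketch below models only that shape in plain Rust so it can run without a proc-macro crate. The `Token` enum, the `TokenStream` alias, and `rename_idents` are hypothetical stand-ins, not the real `proc_macro` API (the real `proc_macro::TokenTree` has `Group`, `Ident`, `Punct`, and `Literal` variants).

```rust
/// Hypothetical, simplified token type. Unlike the real
/// `proc_macro::TokenTree`, it has no `Group` variant and no spans.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Punct(char),
    Literal(String),
}

/// A toy "token stream": just a flat list of tokens.
type TokenStream = Vec<Token>;

/// A toy "proc macro": consumes a token stream and produces a new one.
/// This one prefixes every identifier with `my_` and passes other tokens through.
fn rename_idents(input: TokenStream) -> TokenStream {
    input
        .into_iter()
        .map(|tok| match tok {
            Token::Ident(name) => Token::Ident(format!("my_{}", name)),
            other => other,
        })
        .collect()
}

fn main() {
    // Tokens for `foo = 1;`
    let input = vec![
        Token::Ident("foo".to_string()),
        Token::Punct('='),
        Token::Literal("1".to_string()),
        Token::Punct(';'),
    ];
    let output = rename_idents(input);
    // prints: [Ident("my_foo"), Punct('='), Literal("1"), Punct(';')]
    println!("{:?}", output);
}
```

In the real compiler, the output stream is then parsed and integrated into the AST; this sketch stops at the token level.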
+ +It's worth noting that the token stream type used by proc macros is _stable_, +so `rustc` does not use it internally (since our internal data structures are +unstable). + +TODO: more here. + +### Custom Derive + +Custom derives are a special type of proc macro. + +TODO: more? + +## Hygiene If you have ever used C/C++ preprocessor macros, you know that there are some annoying and hard-to-debug gotchas! For example, consider the following C code: @@ -190,21 +227,7 @@ a macro author may want to introduce a new name to the context where the macro was called. Alternately, the macro author may be defining a variable for use only within the macro (i.e. it should not be visible outside the macro). -In rustc, this "context" is tracked via `Span`s. - -TODO: what is call-site hygiene? what is def-site hygiene? - -TODO - -### Procedural Macros - -TODO - -### Custom Derive - -TODO - -TODO: maybe something about macros 2.0? +This section is about how that context is tracked. [code_dir]: https://github.com/rust-lang/rust/tree/master/src/librustc_expand/mbe @@ -221,135 +244,6 @@ The rest of this chapter is a dump of a discussion between `mark-i-m` and it never gets lost until we can make it into a proper chapter. ```txt -mark-i-m: @Vadim Petrochenkov Hi :wave: -I was wondering if you would have a chance sometime in the next month or so to -just have a zulip discussion where you tell us (WG-learning) everything you -know about macros/expansion/hygiene. We were thinking this could be less formal -(and less work for you) than compiler lecture series lecture... thoughts? - -mark-i-m: The goal is to fill out that long-standing gap in the rustc-dev-guide - -Vadim Petrochenkov: Ok, I'm at UTC+03:00 and generally available in the -evenings (or weekends). - -mark-i-m: @Vadim Petrochenkov Either of those works for me (your evenings are -about lunch time for me :) ) Is there a particular date that would work best -for you? - -mark-i-m: @WG-learning Does anyone else have a preferred date? 
- - Vadim Petrochenkov: - - Is there a particular date that would work best for you? - -Nah, not much difference. (If something changes for a specific day, I'll -notify.) - -Santiago Pastorino: week days are better, but I'd say let's wait for @Vadim -Petrochenkov to say when they are ready for it and we can set a date - -Santiago Pastorino: also, we should record this so ... I guess it doesn't -matter that much when :) - - mark-i-m: - - also, we should record this so ... I guess it doesn't matter that much when - :) - -@Santiago Pastorino My thinking was to just use zulip, so we would have the log - -mark-i-m: @Vadim Petrochenkov @WG-learning How about 2 weeks from now: July 24 -at 5pm UTC time (if I did the math right, that should be evening for Vadim) - -Amanjeev Sethi: i can try and do this but I am starting a new job that week so -cannot promise. - - Santiago Pastorino: - - Vadim Petrochenkov @WG-learning How about 2 weeks from now: July 24 at 5pm - UTC time (if I did the math right, that should be evening for Vadim) - -works perfect for me - -Santiago Pastorino: @mark-i-m I have access to the compiler calendar so I can -add something there - -Santiago Pastorino: let me know if you want to add an event to the calendar, I -can do that - -Santiago Pastorino: how long it would be? - - mark-i-m: - - let me know if you want to add an event to the calendar, I can do that - -mark-i-m: That could be good :+1: - - mark-i-m: - - how long it would be? - -Let's start with 30 minutes, and if we need to schedule another we cna - - Vadim Petrochenkov: - - 5pm UTC - -1-2 hours later would be better, 5pm UTC is not evening enough. - -Vadim Petrochenkov: How exactly do you plan the meeting to go (aka how much do -I need to prepare)? - - Santiago Pastorino: - - 5pm UTC - - 1-2 hours later would be better, 5pm UTC is not evening enough. - -Scheduled for 7pm UTC then - - Santiago Pastorino: - - How exactly do you plan the meeting to go (aka how much do I need to - prepare)? 
- -/cc @mark-i-m - -mark-i-m: @Vadim Petrochenkov - - How exactly do you plan the meeting to go (aka how much do I need to - prepare)? - -My hope was that this could be less formal than for a compiler lecture series, -but it would be nice if you could have in your mind a tour of the design and -the code - -That is, imagine that a new person was joining the compiler team and needed to -get up to speed about macros/expansion/hygiene. What would you tell such a -person? - -mark-i-m: @Vadim Petrochenkov Are we still on for tomorrow at 7pm UTC? - -Vadim Petrochenkov: Yes. - -Santiago Pastorino: @Vadim Petrochenkov @mark-i-m I've added an event on rust -compiler team calendar - -mark-i-m: @WG-learning @Vadim Petrochenkov Hello! - -mark-i-m: We will be starting in ~7 minutes - -mark-i-m: :wave: - -Vadim Petrochenkov: I'm here. - -mark-i-m: Cool :) - -Santiago Pastorino: hello @Vadim Petrochenkov - -mark-i-m: Shall we start? - -mark-i-m: First off, @Vadim Petrochenkov Thanks for doing this! Vadim Petrochenkov: Here's some preliminary data I prepared. From 342ca3cff64d62ef9ec86175285544cec0ac18df Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 25 Apr 2020 13:42:01 -0500 Subject: [PATCH 02/22] start working through discussion --- src/macro-expansion.md | 249 +++++++++-------------------------------- 1 file changed, 51 insertions(+), 198 deletions(-) diff --git a/src/macro-expansion.md b/src/macro-expansion.md index e7a09d31c..002e077d8 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -229,136 +229,69 @@ only within the macro (i.e. it should not be visible outside the macro). This section is about how that context is tracked. 
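Unlike the C preprocessor case described earlier, `macro_rules!` macros are hygienic for local variables. The small example below uses a made-up macro name, `with_local_x`, for illustration. It shows that a `let x` introduced inside the macro does not capture the `x` written at the call site, because the two identifiers carry different syntax contexts.

```rust
// A macro that introduces its own local `x` and then evaluates the caller's expression.
macro_rules! with_local_x {
    ($e:expr) => {{
        let x = 10; // defined inside the macro: hygienic, not visible to `$e`
        let _ = x; // silence the unused-variable lint
        $e // tokens from the call site still resolve in the caller's context
    }};
}

fn main() {
    let x = 1;
    // A textual (C-preprocessor-style) substitution would make `x + 1` see the
    // macro's `x = 10` and yield 11. With hygiene, it sees the caller's `x = 1`.
    let y = with_local_x!(x + 1);
    assert_eq!(y, 2);
    println!("{}", y); // prints: 2
}
```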
- [code_dir]: https://github.com/rust-lang/rust/tree/master/src/librustc_expand/mbe [code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser [code_mr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_rules [code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/fn.parse_tt.html [parsing]: ./the-parser.html +## Notes from petrochenkov discussion + +Where to find the code: +- librustc_span/hygiene.rs - structures related to hygiene and expansion that are kept in global data (can be accessed from any Ident without any context) +- librustc_span/lib.rs - some secondary methods like macro backtrace using primary methods from hygiene.rs +- librustc_builtin_macros - implementations of built-in macros (including macro attributes and derives) and some other early code generation facilities like injection of standard library imports or generation of the test harness. +- librustc_ast/config.rs - implementation of cfg/cfg_attr (they are treated differently from other macros), should probably be moved into librustc_ast/ext. +- librustc_ast/tokenstream.rs + librustc_ast/parse/token.rs - structures for compiler-side tokens, token trees, and token streams.
+- librustc_ast/ext - various expansion-related stuff +- librustc_ast/ext/base.rs - basic structures used by expansion +- librustc_ast/ext/expand.rs - some expansion structures and the bulk of expansion infrastructure code - collecting macro invocations, calling into resolve for them, calling their expanding functions, and integrating the results back into AST +- librustc_ast/ext/placeholder.rs - the part of expand.rs responsible for "integrating the results back into AST"; basically, a "placeholder" is a temporary AST node replaced with macro expansion result nodes +- librustc_ast/ext/build.rs - helper functions for building AST for built-in macros in librustc_builtin_macros (and user-defined syntactic plugins previously), can probably be moved into librustc_builtin_macros these days +- librustc_ast/ext/proc_macro.rs + librustc_ast/ext/proc_macro_server.rs - interfaces between the compiler and the stable proc_macro library, converting tokens and token streams between the two representations and sending them through the C ABI +- librustc_ast/ext/tt - implementation of macro_rules, turns macro_rules DSL into something with signature Fn(TokenStream) -> TokenStream that can eat and produce tokens, @mark-i-m knows more about this +- librustc_resolve/macros.rs - resolving macro paths, validating those resolutions, reporting various "not found"/"found, but it's unstable"/"expected x, found y" errors +- librustc_middle/hir/map/def_collector.rs + librustc_resolve/build_reduced_graph.rs - integrate an AST fragment freshly expanded from a macro into various parent/child structures like module hierarchy or "definition paths" + +Primary structures: +- HygieneData - global piece of data containing hygiene and expansion info that can be accessed from any Ident without any context +- ExpnId - ID of a macro call or desugaring (and also expansion of that call/desugaring, depending on context) +- ExpnInfo/InternalExpnData - a subset of properties from both macro definition and macro call
available through global data +- SyntaxContext - ID of a chain of nested macro definitions (identified by ExpnIds) +- SyntaxContextData - data associated with the given SyntaxContext, mostly a cache for results of filtering that chain in different ways +- Span - a code location + SyntaxContext +- Ident - interned string (Symbol) + Span, i.e. a string with attached hygiene data +- TokenStream - a collection of TokenTrees +- TokenTree - a token (punctuation, identifier, or literal) or a delimited group (anything inside ()/[]/{}) +- SyntaxExtension - a lowered macro representation, contains its expander function transforming a tokenstream or AST into tokenstream or AST + some additional data like stability, or a list of unstable features allowed inside the macro. +- SyntaxExtensionKind - expander functions may have several different signatures (take one token stream, or two, or a piece of AST, etc.); this is an enum that lists them +- ProcMacro/TTMacroExpander/AttrProcMacro/MultiItemModifier - traits representing the expander signatures (TODO: change and rename the signatures into something more consistent) +- Resolver - a trait used to break crate dependencies (so resolver services can be used in librustc_ast, despite librustc_resolve and pretty much everything else depending on librustc_ast) +- ExtCtxt/ExpansionData - various intermediate data kept and used by expansion infra in the process of its work +- AstFragment - a piece of AST that can be produced by a macro (may include multiple homogeneous AST nodes, e.g. a list of items) +- Annotatable - a piece of AST that can be an attribute target, almost the same thing as AstFragment except for types and patterns that can be produced by macros but cannot be annotated with attributes (TODO: Merge into AstFragment) +- MacResult - a "polymorphic" AST fragment, something that can turn into a different AstFragment depending on its context (aka AstFragmentKind - item, or expression, or pattern etc.)
+- Invocation/InvocationKind - a structure describing a macro call, these structures are collected by the expansion infra (InvocationCollector), queued, resolved, expanded when resolved, etc. + +TODO: how a crate transitions from the state "macros exist as written in source" to "all macros are expanded" + +Expansion Heirarchies and Syntax Context +- Many AST nodes have some sort of syntax context, especially nodes from macros. The context consists of a chain of expansions leading to `ExpnId::root`. A non-macro-expanded node has syntax context 0 (`SyntaxContext::empty()`) which represents just the root node. +- There are 3 expansion heirarchies + - They all start at ExpnId::root, which is its own parent + + + + + + # Discussion about hygiene -The rest of this chapter is a dump of a discussion between `mark-i-m` and -`petrochenkov` about Macro Expansion and Hygiene. I am pasting it here so that -it never gets lost until we can make it into a proper chapter. ```txt -Vadim Petrochenkov: Here's some preliminary data I prepared. - -Vadim Petrochenkov: Below I'll assume #62771 and #62086 has landed. - -Vadim Petrochenkov: Where to find the code: librustc_span/hygiene.rs - -structures related to hygiene and expansion that are kept in global data (can -be accessed from any Ident without any context) librustc_span/lib.rs - some -secondary methods like macro backtrace using primary methods from hygiene.rs -librustc_builtin_macros - implementations of built-in macros (including macro attributes -and derives) and some other early code generation facilities like injection of -standard library imports or generation of test harness. librustc_ast/config.rs - -implementation of cfg/cfg_attr (they treated specially from other macros), -should probably be moved into librustc_ast/ext. librustc_ast/tokenstream.rs + -librustc_ast/parse/token.rs - structures for compiler-side tokens, token trees, -and token streams. 
librustc_ast/ext - various expansion-related stuff -librustc_ast/ext/base.rs - basic structures used by expansion -librustc_ast/ext/expand.rs - some expansion structures and the bulk of expansion -infrastructure code - collecting macro invocations, calling into resolve for -them, calling their expanding functions, and integrating the results back into -AST librustc_ast/ext/placeholder.rs - the part of expand.rs responsible for -"integrating the results back into AST" basicallly, "placeholder" is a -temporary AST node replaced with macro expansion result nodes -librustc_ast/ext/builer.rs - helper functions for building AST for built-in macros -in librustc_builtin_macros (and user-defined syntactic plugins previously), can probably -be moved into librustc_builtin_macros these days librustc_ast/ext/proc_macro.rs + -librustc_ast/ext/proc_macro_server.rs - interfaces between the compiler and the -stable proc_macro library, converting tokens and token streams between the two -representations and sending them through C ABI librustc_ast/ext/tt - -implementation of macro_rules, turns macro_rules DSL into something with -signature Fn(TokenStream) -> TokenStream that can eat and produce tokens, -@mark-i-m knows more about this librustc_resolve/macros.rs - resolving macro -paths, validating those resolutions, reporting various "not found"/"found, but -it's unstable"/"expected x, found y" errors librustc_middle/hir/map/def_collector.rs + -librustc_resolve/build_reduced_graph.rs - integrate an AST fragment freshly -expanded from a macro into various parent/child structures like module -hierarchy or "definition paths" - -Primary structures: HygieneData - global piece of data containing hygiene and -expansion info that can be accessed from any Ident without any context ExpnId - -ID of a macro call or desugaring (and also expansion of that call/desugaring, -depending on context) ExpnInfo/InternalExpnData - a subset of properties from -both macro definition and macro call available 
through global data -SyntaxContext - ID of a chain of nested macro definitions (identified by -ExpnIds) SyntaxContextData - data associated with the given SyntaxContext, -mostly a cache for results of filtering that chain in different ways Span - a -code location + SyntaxContext Ident - interned string (Symbol) + Span, i.e. a -string with attached hygiene data TokenStream - a collection of TokenTrees -TokenTree - a token (punctuation, identifier, or literal) or a delimited group -(anything inside ()/[]/{}) SyntaxExtension - a lowered macro representation, -contains its expander function transforming a tokenstream or AST into -tokenstream or AST + some additional data like stability, or a list of unstable -features allowed inside the macro. SyntaxExtensionKind - expander functions -may have several different signatures (take one token stream, or two, or a -piece of AST, etc), this is an enum that lists them -ProcMacro/TTMacroExpander/AttrProcMacro/MultiItemModifier - traits representing -the expander signatures (TODO: change and rename the signatures into something -more consistent) trait Resolver - a trait used to break crate dependencies (so -resolver services can be used in librustc_ast, despite librustc_resolve and pretty -much everything else depending on librustc_ast) ExtCtxt/ExpansionData - various -intermediate data kept and used by expansion infra in the process of its work -AstFragment - a piece of AST that can be produced by a macro (may include -multiple homogeneous AST nodes, like e.g. a list of items) Annotatable - a -piece of AST that can be an attribute target, almost same thing as AstFragment -except for types and patterns that can be produced by macros but cannot be -annotated with attributes (TODO: Merge into AstFragment) trait MacResult - a -"polymorphic" AST fragment, something that can turn into a different -AstFragment depending on its context (aka AstFragmentKind - item, or -expression, or pattern etc.) 
Invocation/InvocationKind - a structure describing -a macro call, these structures are collected by the expansion infra -(InvocationCollector), queued, resolved, expanded when resolved, etc. - -Primary algorithms / actions: TODO - -mark-i-m: Very useful :+1: - -mark-i-m: @Vadim Petrochenkov Zulip doesn't have an indication of typing, so -I'm not sure if you are waiting for me or not - -Vadim Petrochenkov: The TODO part should be about how a crate transitions from -the state "macros exist as written in source" to "all macros are expanded", but -I didn't write it yet. - -Vadim Petrochenkov: (That should probably better happen off-line.) - -Vadim Petrochenkov: Now, if you have any questions? - -mark-i-m: Thanks :) - -mark-i-m: /me is still reading :P - -mark-i-m: Ok - -mark-i-m: So I guess my first question is about hygiene, since that remains the -most mysterious to me... My understanding is that the parser outputs AST nodes, -where each node has a Span - -mark-i-m: In the absence of macros and desugaring, what does the syntax context -of an AST node look like? - -mark-i-m: @Vadim Petrochenkov - -Vadim Petrochenkov: Not each node, but many of them. When a node is not -macro-expanded, its context is 0. - -Vadim Petrochenkov: aka SyntaxContext::empty() - -Vadim Petrochenkov: it's a chain that consists of one expansion - expansion 0 -aka ExpnId::root. - -mark-i-m: Do all expansions start at root? - -Vadim Petrochenkov: Also, SyntaxContext:empty() is its own father. - -mark-i-m: Is this actually stored somewhere or is it a logical value? + Vadim Petrochenkov: All expansion hyerarchies (there are several of them) start at ExpnId::root. @@ -368,12 +301,8 @@ expn_id == 0. Vadim Petrochenkov: I don't think anyone looks into them much though. -mark-i-m: Ok - Vadim Petrochenkov: Speaking of multiple hierarchies... -mark-i-m: Go ahead :) - Vadim Petrochenkov: One is parent (expn_id1) -> parent(expn_id2) -> ... Vadim Petrochenkov: This is the order in which macros are expanded. 
@@ -429,8 +358,6 @@ Sorry, what is outer_expns? Vadim Petrochenkov: SyntaxContextData::outer_expn -mark-i-m: Thanks :) Please continue - Vadim Petrochenkov: ...which means a token produced by a built-in macro (which is defined in the root effectively). @@ -470,8 +397,6 @@ mark-i-m: I see, but this pattern is only used for built-ins, right? Vadim Petrochenkov: And also all stable proc macros, see the comments above. -mark-i-m: Got it - Vadim Petrochenkov: The third hierarchy is call-site hierarchy. Vadim Petrochenkov: If foo!(bar!(ident)) expands into ident @@ -507,30 +432,6 @@ generally.) Vadim Petrochenkov: Yes. -mark-i-m: Got it :) - -mark-i-m: It looks like we have ~5 minutes left. This has been very helpful -already, but I also have more questions. Shall we try to schedule another -meeting in the future? - -Vadim Petrochenkov: Sure, why not. - -Vadim Petrochenkov: A thread for offline questions-answers would be good too. - - mark-i-m: - - A thread for offline questions-answers would be good too. - -I don't mind using this thread, since it already has a lot of info in it. We -also plan to summarize the info from this thread into the rustc-dev-guide. - - Sure, why not. - -Unfortunately, I'm unavailable for a few weeks. Would August 21-ish work for -you (and @WG-learning )? - -mark-i-m: @Vadim Petrochenkov Thanks very much for your time and knowledge! - mark-i-m: One last question: are there more hierarchies? Vadim Petrochenkov: Not that I know of. Three + the context transplantation @@ -539,37 +440,8 @@ hack is already more complex than I'd like. mark-i-m: Yes, one wonders what it would be like if one also had to think about eager expansion... -Santiago Pastorino: sorry but I couldn't follow that much today, will read it -when I have some time later - -Santiago Pastorino: btw https://github.com/rust-lang/rustc-dev-guide/issues/398 - -mark-i-m: @Vadim Petrochenkov Would 7pm UTC on August 21 work for a followup? - -Vadim Petrochenkov: Tentatively yes. 
- -mark-i-m: @Vadim Petrochenkov @WG-learning Does this still work for everyone? - -Vadim Petrochenkov: August 21 is still ok. - -mark-i-m: @WG-learning @Vadim Petrochenkov We will start in ~30min - -Vadim Petrochenkov: Oh. Thanks for the reminder, I forgot about this entirely. - -mark-i-m: Hello! - -Vadim Petrochenkov: (I'll be here in a couple of minutes.) - -Vadim Petrochenkov: Ok, I'm here. - -mark-i-m: Hi :) - -Vadim Petrochenkov: Hi. - mark-i-m: so last time, we talked about the 3 context heirarchies -Vadim Petrochenkov: Right. - mark-i-m: Was there anything you wanted to add to that? If not, I think it would be good to get a big-picture... Given some piece of rust code, how do we get to the point where things are expanded and hygiene context is computed? @@ -728,8 +600,6 @@ imports in that module. For macro and import names this happens during expansions and integrations. -mark-i-m: Makes sense - Vadim Petrochenkov: For all other names we certainly know whether a name is resolved successfully or not on the first attempt, because no new names can appear. @@ -791,21 +661,4 @@ m!(foo); Vadim Petrochenkov: foo has context ROOT -> id(n) and bar has context ROOT -> id(m) -> id(n) after all the expansions. -mark-i-m: Cool :) - -mark-i-m: It looks like we are out of time - -mark-i-m: Is there anything you wanted to add? - -mark-i-m: We can schedule another meeting if you would like - -Vadim Petrochenkov: Yep, 23.06 already. No, I think this is an ok point to -stop. - -mark-i-m: :+1: - -mark-i-m: Thanks @Vadim Petrochenkov ! This was very helpful - -Vadim Petrochenkov: Yeah, we can schedule another one. So far it's been like 1 -hour of meetings per month? Certainly not a big burden. 
``` From f652e0d9b4091ec787ce13b996fb8f25619616ae Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 25 Apr 2020 15:18:15 -0500 Subject: [PATCH 03/22] more notetaking --- src/macro-expansion.md | 148 ++++++++++------------------------------- 1 file changed, 34 insertions(+), 114 deletions(-) diff --git a/src/macro-expansion.md b/src/macro-expansion.md index 002e077d8..fc42ae864 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -276,112 +276,57 @@ Primary structures: TODO: how a crate transitions from the state "macros exist as written in source" to "all macros are expanded" Expansion Heirarchies and Syntax Context -- Many AST nodes have some sort of syntax context, especially nodes from macros. The context consists of a chain of expansions leading to `ExpnId::root`. A non-macro-expanded node has syntax context 0 (`SyntaxContext::empty()`) which represents just the root node. +- Many AST nodes have some sort of syntax context, especially nodes from macros. +- When we ask what the syntax context of a node is, the answer actually differs depending on what we are trying to do. Thus, we don't just keep track of a single context. There are in fact 3 different types of context used for different things. +- Each type of context is tracked by an "expansion hierarchy". As we expand macros, new macro calls or macro definitions may be generated, leading to some nesting. This nesting is where the hierarchies come from. Each hierarchy tracks some different aspect, though, as we will see. - There are 3 expansion heirarchies - They all start at ExpnId::root, which is its own parent + - The context of a node consists of a chain of expansions leading to `ExpnId::root`. A non-macro-expanded node has syntax context 0 (`SyntaxContext::empty()`), which represents just the root node. + - There are vectors in `HygieneData` that contain expansion info. + - There are entries here for both `SyntaxContext::empty()` and `ExpnId::root`, but they aren't used much. + 1.
Tracks expansion order: when a macro invocation is in the output of another macro. + ... + expn_id2 + expn_id1 + InternalExpnData::parent is the child->parent link. That is, expn_id1 points to expn_id2, which points to ... + Ex: + macro_rules! foo { () => { println!(); } } + fn main() { foo!(); } + // The AST nodes that are finally generated would have parent(expn_id_println) -> parent(expn_id_foo). + 2. Tracks macro definitions: when we are expanding one macro, another macro definition is revealed in its output. + ... + SyntaxContext2 + SyntaxContext1 + SyntaxContextData::parent is the child->parent link here. + SyntaxContext is the whole chain in this hierarchy, and SyntaxContextData::outer_expns are individual elements in the chain. -# Discussion about hygiene - - -```txt - - - -Vadim Petrochenkov: All expansion hyerarchies (there are several of them) start -at ExpnId::root. - -Vadim Petrochenkov: Vectors in HygieneData has entries for both ctxt == 0 and -expn_id == 0. - -Vadim Petrochenkov: I don't think anyone looks into them much though. - -Vadim Petrochenkov: Speaking of multiple hierarchies... - -Vadim Petrochenkov: One is parent (expn_id1) -> parent(expn_id2) -> ... - -Vadim Petrochenkov: This is the order in which macros are expanded. - -Vadim Petrochenkov: Well. - -Vadim Petrochenkov: When we are expanding one macro another macro is revealed -in its output. - -Vadim Petrochenkov: That's the parent-child relation in this hierarchy. - -Vadim Petrochenkov: InternalExpnData::parent is the child->parent link. - -mark-i-m: So in the above chain expn_id1 is the child? - -Vadim Petrochenkov: Yes. - -Vadim Petrochenkov: The second one is parent (SyntaxContext1) -> -parent(SyntaxContext2) -> ... - -Vadim Petrochenkov: This is about nested macro definitions. When we are -expanding one macro another macro definition is revealed in its output. -Vadim Petrochenkov: SyntaxContextData::parent is the child->parent link here.
-Vadim Petrochenkov: So, SyntaxContext is the whole chain in this hierarchy, and -outer_expns are individual elements in the chain. - -mark-i-m: So for example, suppose I have the following: - -macro_rules! foo { () => { println!(); } } - -fn main() { foo!(); } - -Then AST nodes that are finally generated would have parent(expn_id_println) -> -parent(expn_id_foo), right? - -Vadim Petrochenkov: Pretty common construction (at least it was, before -refactorings) is SyntaxContext::empty().apply_mark(expn_id), which means... - - Vadim Petrochenkov: - - Then AST nodes that are finally generated would have - parent(expn_id_println) -> parent(expn_id_foo), right? - -Yes. +# Discussion about hygiene - mark-i-m: - and outer_expns are individual elements in the chain. +```txt -Sorry, what is outer_expns? -Vadim Petrochenkov: SyntaxContextData::outer_expn +Vadim Petrochenkov: Pretty common construction (at least it was, before refactorings) is SyntaxContext::empty().apply_mark(expn_id), which means a token produced by a built-in macro (which is defined in the root effectively). -Vadim Petrochenkov: ...which means a token produced by a built-in macro (which -is defined in the root effectively). +Vadim Petrochenkov: Or a stable proc macro, which are always considered to be defined in the root because they are always cross-crate, and we don't have the cross-crate hygiene implemented, ha-ha. mark-i-m: Where does the expn_id come from? -Vadim Petrochenkov: Or a stable proc macro, which are always considered to be -defined in the root because they are always cross-crate, and we don't have the -cross-crate hygiene implemented, ha-ha. - - Vadim Petrochenkov: - - Where does the expn_id come from? - Vadim Petrochenkov: ID of the built-in macro call like line!(). -Vadim Petrochenkov: Assigned continuously from 0 to N as soon as we discover -new macro calls. +Vadim Petrochenkov: Assigned continuously from 0 to N as soon as we discover new macro calls. 
-mark-i-m: Sorry, I didn't quite understand. Do you mean that only built-in -macros receive continuous IDs? +mark-i-m: Sorry, I didn't quite understand. Do you mean that only built-in macros receive continuous IDs? -Vadim Petrochenkov: So, the second hierarchy has a catch - the context -transplantation hack - -https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732. +Vadim Petrochenkov: So, the second hierarchy has a catch - the context transplantation hack - https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732. Vadim Petrochenkov: @@ -389,9 +334,7 @@ https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732. Vadim Petrochenkov: No, all macro calls receive ID. -Vadim Petrochenkov: Built-ins have the typical pattern -SyntaxContext::empty().apply_mark(expn_id) for syntax contexts produced by -them. +Vadim Petrochenkov: Built-ins have the typical pattern SyntaxContext::empty().apply_mark(expn_id) for syntax contexts produced by them. mark-i-m: I see, but this pattern is only used for built-ins, right? @@ -407,23 +350,17 @@ Vadim Petrochenkov: but hierarchy 3 is root -> ident Vadim Petrochenkov: ExpnInfo::call_site is the child-parent link in this case. -mark-i-m: When we expand, do we expand foo first or bar? Why is there a -hierarchy 1 here? Is that foo expands first and it expands to something that -contains bar!(ident)? +mark-i-m: When we expand, do we expand foo first or bar? Why is there a hierarchy 1 here? Is that foo expands first and it expands to something that contains bar!(ident)? Vadim Petrochenkov: Ah, yes, let's assume both foo and bar are identity macros. -Vadim Petrochenkov: Then foo!(bar!(ident)) -> expand -> bar!(ident) -> expand --> ident +Vadim Petrochenkov: Then foo!(bar!(ident)) -> expand -> bar!(ident) -> expand -> ident -Vadim Petrochenkov: If bar were expanded first, that would be eager expansion - -https://github.com/rust-lang/rfcs/pull/2320. 
+Vadim Petrochenkov: If bar were expanded first, that would be eager expansion - https://github.com/rust-lang/rfcs/pull/2320. -mark-i-m: And after we expand only foo! presumably whatever intermediate state -has heirarchy 1 of root->foo->(bar_ident), right? +mark-i-m: And after we expand only foo! presumably whatever intermediate state has heirarchy 1 of root->foo->(bar_ident), right? -Vadim Petrochenkov: (We have it hacked into some built-in macros, but not -generally.) +Vadim Petrochenkov: (We have it hacked into some built-in macros, but not generally.) Vadim Petrochenkov: @@ -432,23 +369,6 @@ generally.) Vadim Petrochenkov: Yes. -mark-i-m: One last question: are there more hierarchies? - -Vadim Petrochenkov: Not that I know of. Three + the context transplantation -hack is already more complex than I'd like. - -mark-i-m: Yes, one wonders what it would be like if one also had to think about -eager expansion... - -mark-i-m: so last time, we talked about the 3 context heirarchies - -mark-i-m: Was there anything you wanted to add to that? If not, I think it -would be good to get a big-picture... Given some piece of rust code, how do we -get to the point where things are expanded and hygiene context is computed? - -mark-i-m: (I'm assuming that hygiene info is computed as we expand stuff, since -I don't think you can discover it beforehand) - Vadim Petrochenkov: Ok, let's move from hygiene to expansion. 
Vadim Petrochenkov: Especially given that I don't remember the specific hygiene From 76379c37340ed464db5985748b637f9de1467399 Mon Sep 17 00:00:00 2001 From: mark Date: Wed, 29 Apr 2020 21:07:21 -0500 Subject: [PATCH 04/22] finish going through discussion --- src/macro-expansion.md | 366 ++++++++++------------------------------- 1 file changed, 89 insertions(+), 277 deletions(-) diff --git a/src/macro-expansion.md b/src/macro-expansion.md index fc42ae864..fc04c6f8d 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -249,38 +249,46 @@ Where to find the code: - librustc_ast/ext/placeholder.rs - the part of expand.rs responsible for "integrating the results back into AST" basicallly, "placeholder" is a temporary AST node replaced with macro expansion result nodes - librustc_ast/ext/builer.rs - helper functions for building AST for built-in macros in librustc_builtin_macros (and user-defined syntactic plugins previously), can probably be moved into librustc_builtin_macros these days - librustc_ast/ext/proc_macro.rs + librustc_ast/ext/proc_macro_server.rs - interfaces between the compiler and the stable proc_macro library, converting tokens and token streams between the two representations and sending them through C ABI -- librustc_ast/ext/tt - implementation of macro_rules, turns macro_rules DSL into something with signature Fn(TokenStream) -> TokenStream that can eat and produce tokens, @mark-i-m knows more about this +- librustc_ast/ext/tt - implementation of macro_rules, turns macro_rules DSL into something with signature Fn(TokenStream) -> TokenStream that can eat and produce tokens, @mark-i-m knows more about this - librustc_resolve/macros.rs - resolving macro paths, validating those resolutions, reporting various "not found"/"found, but it's unstable"/"expected x, found y" errors - librustc_middle/hir/map/def_collector.rs + librustc_resolve/build_reduced_graph.rs - integrate an AST fragment freshly expanded from a macro into various parent/child 
structures like module hierarchy or "definition paths" Primary structures: -- HygieneData - global piece of data containing hygiene and expansion info that can be accessed from any Ident without any context -- ExpnId - ID of a macro call or desugaring (and also expansion of that call/desugaring, depending on context) +- HygieneData - global piece of data containing hygiene and expansion info that can be accessed from any Ident without any context +- ExpnId - ID of a macro call or desugaring (and also expansion of that call/desugaring, depending on context) - ExpnInfo/InternalExpnData - a subset of properties from both macro definition and macro call available through global data - SyntaxContext - ID of a chain of nested macro definitions (identified by ExpnIds) - SyntaxContextData - data associated with the given SyntaxContext, mostly a cache for results of filtering that chain in different ways -- Span - a code location + SyntaxContext +- Span - a code location + SyntaxContext - Ident - interned string (Symbol) + Span, i.e. a string with attached hygiene data -- TokenStream - a collection of TokenTrees +- TokenStream - a collection of TokenTrees - TokenTree - a token (punctuation, identifier, or literal) or a delimited group (anything inside ()/[]/{}) - SyntaxExtension - a lowered macro representation, contains its expander function transforming a tokenstream or AST into tokenstream or AST + some additional data like stability, or a list of unstable features allowed inside the macro. 
- SyntaxExtensionKind - expander functions may have several different signatures (take one token stream, or two, or a piece of AST, etc), this is an enum that lists them -- ProcMacro/TTMacroExpander/AttrProcMacro/MultiItemModifier - traits representing the expander signatures (TODO: change and rename the signatures into something more consistent) +- ProcMacro/TTMacroExpander/AttrProcMacro/MultiItemModifier - traits representing the expander signatures (TODO: change and rename the signatures into something more consistent) - Resolver - a trait used to break crate dependencies (so resolver services can be used in librustc_ast, despite librustc_resolve and pretty much everything else depending on librustc_ast) - ExtCtxt/ExpansionData - various intermediate data kept and used by expansion infra in the process of its work - AstFragment - a piece of AST that can be produced by a macro (may include multiple homogeneous AST nodes, like e.g. a list of items) -- Annotatable - a piece of AST that can be an attribute target, almost same thing as AstFragment except for types and patterns that can be produced by macros but cannot be annotated with attributes (TODO: Merge into AstFragment) +- Annotatable - a piece of AST that can be an attribute target, almost same thing as AstFragment except for types and patterns that can be produced by macros but cannot be annotated with attributes (TODO: Merge into AstFragment) - MacResult - a "polymorphic" AST fragment, something that can turn into a different AstFragment depending on its context (aka AstFragmentKind - item, or expression, or pattern etc.) -- Invocation/InvocationKind - a structure describing a macro call, these structures are collected by the expansion infra (InvocationCollector), queued, resolved, expanded when resolved, etc. +- Invocation/InvocationKind - a structure describing a macro call, these structures are collected by the expansion infra (InvocationCollector), queued, resolved, expanded when resolved, etc. 
TODO: how a crate transitions from the state "macros exist as written in source" to "all macros are expanded" -Expansion Heirarchies and Syntax Context +Hygiene and Expansion Heirarchies + +- Expansion is lazy. We work from the outside of a macro invocation inward. + - Ex: foo!(bar!(ident)) -> expand -> bar!(ident) -> expand -> ident + - Eager expansion: https://github.com/rust-lang/rfcs/pull/2320. + - Seems complicated to implement + - We have it hacked into some built-in macros, but not generally. - Many AST nodes have some sort of syntax context, especially nodes from macros. - When we ask what is the syntax context of a node, the answer actually differs by what we are trying to do. Thus, we don't just keep track of a single context. There are in fact 3 different types of context used for different things. - Each type of context is tracked by an "expansion heirarchy". As we expand macros, new macro calls or macro definitions may be generated, leading to some nesting. This nesting is where the heirarchies come from. Each heirarchy tracks some different aspect, though, as we will see. - There are 3 expansion heirarchies - - They all start at ExpnId::root, which is its own parent + - All macro calls receive an integer ID assigned continuously starting from 0 as we discover new macro calls + - This is used as the `expn_id` where needed. + - All heirarchies start at ExpnId::root, which is its own parent - The context of a node consists of a chain of expansions leading to `ExpnId::root`. A non-macro-expanded node has syntax context 0 (`SyntaxContext::empty()`) which represents just the root node. - There are vectors in `HygieneData` that contain expansion info. - There are entries here for both `SyntaxContext::empty()` and `ExpnId::root`, but they aren't used much. @@ -304,281 +312,85 @@ Expansion Heirarchies and Syntax Context SyntaxContextData::parent is the child->parent link here.
SyntaxContext is the whole chain in this hierarchy, and SyntaxContextData::outer_expns are individual elements in the chain. + - For built-in macros (e.g. `line!()`) or stable proc macros: tokens produced by the macro are given the context `SyntaxContext::empty().apply_mark(expn_id)` + - Such macros are considered to have been defined at the root. + - For proc macros this is because they are always cross-crate and we don't have cross-crate hygiene implemented. + The second hierarchy has the context transplantation hack. See https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732. + If the token had context X before being produced by a macro then after being produced by the macro it has context X -> macro_id. + Ex: + ```rust + macro m() { ident } + ``` -# Discussion about hygiene - - -```txt - - -Vadim Petrochenkov: Pretty common construction (at least it was, before refactorings) is SyntaxContext::empty().apply_mark(expn_id), which means a token produced by a built-in macro (which is defined in the root effectively). - -Vadim Petrochenkov: Or a stable proc macro, which are always considered to be defined in the root because they are always cross-crate, and we don't have the cross-crate hygiene implemented, ha-ha. - -mark-i-m: Where does the expn_id come from? - -Vadim Petrochenkov: ID of the built-in macro call like line!(). - -Vadim Petrochenkov: Assigned continuously from 0 to N as soon as we discover new macro calls. - -mark-i-m: Sorry, I didn't quite understand. Do you mean that only built-in macros receive continuous IDs? - -Vadim Petrochenkov: So, the second hierarchy has a catch - the context transplantation hack - https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732. - - Vadim Petrochenkov: - - Do you mean that only built-in macros receive continuous IDs? - -Vadim Petrochenkov: No, all macro calls receive ID. 
- -Vadim Petrochenkov: Built-ins have the typical pattern SyntaxContext::empty().apply_mark(expn_id) for syntax contexts produced by them. - -mark-i-m: I see, but this pattern is only used for built-ins, right? - -Vadim Petrochenkov: And also all stable proc macros, see the comments above. - -Vadim Petrochenkov: The third hierarchy is call-site hierarchy. - -Vadim Petrochenkov: If foo!(bar!(ident)) expands into ident - -Vadim Petrochenkov: then hierarchy 1 is root -> foo -> bar -> ident - -Vadim Petrochenkov: but hierarchy 3 is root -> ident - -Vadim Petrochenkov: ExpnInfo::call_site is the child-parent link in this case. - -mark-i-m: When we expand, do we expand foo first or bar? Why is there a hierarchy 1 here? Is that foo expands first and it expands to something that contains bar!(ident)? - -Vadim Petrochenkov: Ah, yes, let's assume both foo and bar are identity macros. - -Vadim Petrochenkov: Then foo!(bar!(ident)) -> expand -> bar!(ident) -> expand -> ident - -Vadim Petrochenkov: If bar were expanded first, that would be eager expansion - https://github.com/rust-lang/rfcs/pull/2320. - -mark-i-m: And after we expand only foo! presumably whatever intermediate state has heirarchy 1 of root->foo->(bar_ident), right? - -Vadim Petrochenkov: (We have it hacked into some built-in macros, but not generally.) - - Vadim Petrochenkov: - - And after we expand only foo! presumably whatever intermediate state has - heirarchy 1 of root->foo->(bar_ident), right? - -Vadim Petrochenkov: Yes. - -Vadim Petrochenkov: Ok, let's move from hygiene to expansion. - -Vadim Petrochenkov: Especially given that I don't remember the specific hygiene -algorithms like adjust in detail. - - Vadim Petrochenkov: - - Given some piece of rust code, how do we get to the point where things are - expanded - -So, first of all, the "some piece of rust code" is the whole crate. - -mark-i-m: Just to confirm, the algorithms are well-encapsulated, right? 
Like a -function or a struct as opposed to a bunch of conventions distributed across -the codebase? - -Vadim Petrochenkov: We run fully_expand_fragment in it. - - Vadim Petrochenkov: - - Just to confirm, the algorithms are well-encapsulated, right? - -Yes, the algorithmic parts are entirely inside hygiene.rs. - -Vadim Petrochenkov: Ok, some are in fn resolve_crate_root, but those are hacks. - -Vadim Petrochenkov: (Continuing about expansion.) If fully_expand_fragment is -run not on a whole crate, it means that we are performing eager expansion. - -Vadim Petrochenkov: Eager expansion is done for arguments of some built-in -macros that expect literals. - -Vadim Petrochenkov: It generally performs a subset of actions performed by the -non-eager expansion. - -Vadim Petrochenkov: So, I'll talk about non-eager expansion for now. - -mark-i-m: Eager expansion is not exposed as a language feature, right? i.e. it -is not possible for me to write an eager macro? - -Vadim Petrochenkov: -https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049 (vvv The -link is explained below vvv ) - - Vadim Petrochenkov: - - Eager expansion is not exposed as a language feature, right? i.e. it is not - possible for me to write an eager macro? - -Yes, it's entirely an ability of some built-in macros. - -Vadim Petrochenkov: Not exposed for general use. - -Vadim Petrochenkov: fully_expand_fragment works in iterations. - -Vadim Petrochenkov: Iterations looks roughly like this: -- Resolve imports in our partially built crate as much as possible. -- Collect as many macro invocations as possible from our partially built crate - (fn-like, attributes, derives) from the crate and add them to the queue. - - Vadim Petrochenkov: Take a macro from the queue, and attempt to resolve it. - - Vadim Petrochenkov: If it's resolved - run its expander function that - consumes tokens or AST and produces tokens or AST (depending on the macro - kind). 
- - Vadim Petrochenkov: (If it's not resolved, then put it back into the - queue.) - -Vadim Petrochenkov: ^^^ That's where we fill in the hygiene data associated -with ExpnIds. - -mark-i-m: When we put it back in the queue? - -mark-i-m: or do you mean the collect step in general? - -Vadim Petrochenkov: Once we resolved the macro call to the macro definition we -know everything about the macro and can call set_expn_data to fill in its -properties in the global data. - -Vadim Petrochenkov: I mean, immediately after successful resolution. - -Vadim Petrochenkov: That's the first part of hygiene data, the second one is -associated with SyntaxContext rather than with ExpnId, it's filled in later -during expansion. - -Vadim Petrochenkov: So, after we run the macro's expander function and got a -piece of AST (or got tokens and parsed them into a piece of AST) we need to -integrate that piece of AST into the big existing partially built AST. - -Vadim Petrochenkov: This integration is a really important step where the next -things happen: -- NodeIds are assigned. - - Vadim Petrochenkov: "def paths"s and their IDs (DefIds) are created - - Vadim Petrochenkov: Names are put into modules from the resolver point of - view. - -Vadim Petrochenkov: So, we are basically turning some vague token-like mass -into proper set in stone hierarhical AST and side tables. - -Vadim Petrochenkov: Where exactly this happens - NodeIds are assigned by -InvocationCollector (which also collects new macro calls from this new AST -piece and adds them to the queue), DefIds are created by DefCollector, and -modules are filled by BuildReducedGraphVisitor. - -Vadim Petrochenkov: These three passes run one after another on every AST -fragment freshly expanded from a macro. - -Vadim Petrochenkov: After expanding a single macro and integrating its output -we again try to resolve all imports in the crate, and then return to the big -queue processing loop and pick up the next macro. 
- -Vadim Petrochenkov: Repeat until there's no more macros. Vadim Petrochenkov: - -mark-i-m: The integration step is where we would get parser errors too right? - -mark-i-m: Also, when do we know definitively that resolution has failed for -particular ident? - - Vadim Petrochenkov: - - The integration step is where we would get parser errors too right? - -Yes, if the macro produced tokens (rather than AST directly) and we had to -parse them. - - Vadim Petrochenkov: - - when do we know definitively that resolution has failed for particular - ident? - -So, ident is looked up in a number of scopes during resolution. From closest -like the current block or module, to far away like preludes or built-in types. - -Vadim Petrochenkov: If lookup is certainly failed in all of the scopes, then -it's certainly failed. - -mark-i-m: This is after all expansions and integrations are done, right? - -Vadim Petrochenkov: "Certainly" is determined differently for different scopes, -e.g. for a module scope it means no unexpanded macros and no unresolved glob -imports in that module. - - Vadim Petrochenkov: - - This is after all expansions and integrations are done, right? - -For macro and import names this happens during expansions and integrations. - -Vadim Petrochenkov: For all other names we certainly know whether a name is -resolved successfully or not on the first attempt, because no new names can -appear. - -Vadim Petrochenkov: (They are resolved in a later pass, see -librustc_resolve/late.rs.) - -mark-i-m: And if at the end of the iteration, there are still things in the -queue that can't be resolve, this represents an error, right? - -mark-i-m: i.e. an undefined macro? - -Vadim Petrochenkov: Yes, if we make no progress during an iteration, then we -are stuck and that state represent an error. - -Vadim Petrochenkov: We attempt to recover though, using dummies expanding into -nothing or ExprKind::Err or something like that for unresolved macros. 
- -mark-i-m: This is for the purposes of diagnostics, though, right? - -Vadim Petrochenkov: But if we are going through recovery, then compilation must -result in an error anyway. - -Vadim Petrochenkov: Yes, that's for diagnostics, without recovery we would -stuck at the first unresolved macro or import. Vadim Petrochenkov: - -So, about the SyntaxContext hygiene... - -Vadim Petrochenkov: New syntax contexts are created during macro expansion. - -Vadim Petrochenkov: If the token had context X before being produced by a -macro, e.g. here ident has context SyntaxContext::root(): Vadim Petrochenkov: - -macro m() { ident } - -Vadim Petrochenkov: , then after being produced by the macro it has context X --> macro_id. - -Vadim Petrochenkov: I.e. our ident has context ROOT -> id(m) after it's -produced by m. - -Vadim Petrochenkov: The "chaining operator" -> is apply_mark in compiler code. -Vadim Petrochenkov: - -macro m() { macro n() { ident } } + Here `ident` originally has context SyntaxContext::root(). `ident` has context ROOT -> id(m) after it's produced by m. + The "chaining operator" is `apply_mark` in compiler code. -Vadim Petrochenkov: In this example the ident has context ROOT originally, then -ROOT -> id(m), then ROOT -> id(m) -> id(n). + Ex: -Vadim Petrochenkov: Note that these chains are not entirely determined by their -last element, in other words ExpnId is not isomorphic to SyntaxCtxt. + ```rust + macro m() { macro n() { ident } } + ``` + In this example the ident has context ROOT originally, then ROOT -> id(m), then ROOT -> id(m) -> id(n). -Vadim Petrochenkov: Couterexample: Vadim Petrochenkov: + Note that these chains are not entirely determined by their last element, in other words ExpnId is not isomorphic to SyntaxCtxt. 
-macro m($i: ident) { macro n() { ($i, bar) } } + Ex: + ```rust + macro m($i: ident) { macro n() { ($i, bar) } } -m!(foo); + m!(foo); + ``` -Vadim Petrochenkov: foo has context ROOT -> id(n) and bar has context ROOT -> -id(m) -> id(n) after all the expansions. + After all expansions, foo has context ROOT -> id(n) and bar has context ROOT -> id(m) -> id(n). -``` + 3. Call-site: tracks the location of the macro invocation. + Ex: + If foo!(bar!(ident)) expands into ident + then hierarchy 1 is root -> foo -> bar -> ident + but hierarchy 3 is root -> ident + + ExpnInfo::call_site is the child-parent link in this case. + +- Hygiene-related algorithms are entirely in hygiene.rs + - Some hacks in `resolve_crate_root`, though. + +Expansion +- Expansion happens over a whole crate at once. +- We run `fully_expand_fragment` on the crate + - If `fully_expand_fragment` is run on something other than a whole crate, it means that we are performing eager expansion. + - We do this for some built-ins that expect literals (not exposed to users). + - It performs a subset of actions performed by non-eager expansion, so the discussion below focuses on non-eager expansion. + - Original description here: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049 + - Algorithm: `fully_expand_fragment` works in iterations. We repeat until there are no unresolved macros left. + - Resolve imports in our partially built crate as much as possible. + - (link to name-resolution chapter) names are resolved from "closer" scopes (e.g. current block) to farther ones (e.g. prelude) + - Whether a resolution has definitely failed is determined differently for different scopes, e.g. for a module scope it means there are no unexpanded macros and no unresolved glob imports in that module. + - Collect as many macro invocations as possible from our partially built crate + (fn-like, attributes, derives) and add them to the queue. + - Take a macro from the queue, and attempt to resolve it.
+ - If it's resolved, run its expander function that consumes tokens or AST and produces tokens or AST (depending on the macro kind). (If it's not resolved, then put it back into the queue.) + - At this point, we know everything about the macro itself and can call `set_expn_data` to fill in its properties in the global data -- that is the hygiene data associated with `ExpnId`. + - The macro's expander function returns a piece of AST (or tokens). We need to integrate that piece of AST into the big existing partially built AST. + - If the macro produces tokens (e.g. a proc macro), we will have to parse them into an AST, which may produce parse errors. + - During expansion, we create `SyntaxContext`s (hierarchy 2). + - This is essentially where the "token-like mass" becomes a proper set-in-stone AST with side-tables + - These three passes happen one after another on every AST fragment freshly expanded from a macro + - `NodeId`s are assigned by `InvocationCollector` + - also collects new macro calls from this new AST piece and adds them to the queue + - def_paths are created and `DefId`s are assigned to them by `DefCollector` + - `Name`s are put into modules (from the resolver's point of view) by `BuildReducedGraphVisitor` + - After expanding a single macro and integrating its output, we continue to the next iteration of `fully_expand_fragment`. + - If we make no progress in an iteration, then we have reached a compilation error (e.g. an undefined macro). + + - We attempt to recover from failures (unresolved macros or imports) for the sake of diagnostics + - recovery can't cause compilation to succeed. We know that it will fail at this point.
+ - we expand errors into `ExprKind::Err` or something like that for unresolved macros + - this allows compilation to continue past the first error so that we can report more errors at a time + +Relationship to name resolution +- name resolution is done for macro and import names during expansion and integration into the AST, as discussed above +- For all other names we certainly know whether a name is resolved successfully or not on the first attempt, because no new names can appear, due to hygiene + - They are resolved in a later pass, see `librustc_resolve/late.rs` From 4543d0819b7d780d8044af3e63df329ee9d51c50 Mon Sep 17 00:00:00 2001 From: mark Date: Wed, 29 Apr 2020 21:13:35 -0500 Subject: [PATCH 05/22] get rid of old todo --- src/macro-expansion.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/src/macro-expansion.md b/src/macro-expansion.md index fc04c6f8d..b0f23096f 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -273,8 +273,6 @@ Primary structures: - MacResult - a "polymorphic" AST fragment, something that can turn into a different AstFragment depending on its context (aka AstFragmentKind - item, or expression, or pattern etc.) - Invocation/InvocationKind - a structure describing a macro call, these structures are collected by the expansion infra (InvocationCollector), queued, resolved, expanded when resolved, etc. -TODO: how a crate transitions from the state "macros exist as written in source" to "all macros are expanded" - Hygiene and Expansion Heirarchies - Expansion is lazy. We work from the outside of a macro invocation inward. 
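The hierarchy-2 counterexample in the notes above (after `m!(foo)`, `foo` ends with ROOT -> id(n) while `bar` ends with ROOT -> id(m) -> id(n)) can be checked with a minimal toy model. This is a sketch under stated assumptions: `SyntaxContext` here stores the whole chain of `ExpnId`s as a `Vec`, and `apply_mark` just appends one mark. rustc instead interns contexts in `HygieneData`, and its real `apply_mark` has a different signature, so only the names mirror the notes. The resulting contexts are taken from the notes rather than derived from a macro implementation.

```rust
// Hypothetical toy model of hierarchy 2; not rustc's actual API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ExpnId(u32);

impl ExpnId {
    // Stand-in for `ExpnId::root` in this model.
    const ROOT: ExpnId = ExpnId(0);
}

// A context is modeled as the full chain of marks applied since the root.
#[derive(Clone, Debug, PartialEq, Eq)]
struct SyntaxContext {
    marks: Vec<ExpnId>,
}

impl SyntaxContext {
    fn root() -> Self {
        SyntaxContext { marks: vec![ExpnId::ROOT] }
    }

    // The "chaining operator" X -> macro_id from the notes.
    fn apply_mark(&self, expn: ExpnId) -> Self {
        let mut marks = self.marks.clone();
        marks.push(expn);
        SyntaxContext { marks }
    }

    // The most recently applied mark (last element of the chain).
    fn outer_expn(&self) -> ExpnId {
        *self.marks.last().expect("chain always contains the root")
    }
}

fn main() {
    let id_m = ExpnId(1);
    let id_n = ExpnId(2);

    // Contexts after all expansions, as stated in the notes.
    let foo = SyntaxContext::root().apply_mark(id_n);
    let bar = SyntaxContext::root().apply_mark(id_m).apply_mark(id_n);

    // Same last element...
    assert_eq!(foo.outer_expn(), bar.outer_expn());
    // ...but different chains, so an ExpnId alone does not identify a context.
    assert_ne!(foo, bar);

    println!("foo: {:?}", foo.marks); // foo: [ExpnId(0), ExpnId(2)]
    println!("bar: {:?}", bar.marks); // bar: [ExpnId(0), ExpnId(1), ExpnId(2)]
}
```

Storing the full chain makes the non-isomorphism between `ExpnId` and `SyntaxContext` visible directly. Interning, as rustc does, gives the same equality semantics with cheaper comparisons.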
From ab746a7f93b688ad7207ed7d77d6546b9b16d9b0 Mon Sep 17 00:00:00 2001 From: mark Date: Thu, 30 Apr 2020 11:28:44 -0500 Subject: [PATCH 06/22] add a bit to part 3 intro --- src/part-3-intro.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/part-3-intro.md b/src/part-3-intro.md index a1ec3ca90..2af8cce23 100644 --- a/src/part-3-intro.md +++ b/src/part-3-intro.md @@ -2,7 +2,9 @@ This part describes the process of taking raw source code from the user and transforming it into various forms that the compiler can work with easily. -These are called intermediate representations. +These are called _intermediate representations (IRs)_. This process starts with compiler understanding what the user has asked for: parsing the command line arguments given and determining what it is to compile. +After that, the compiler transforms the user input into a series of IRs that +look progressively less like what the user wrote. From 1c1110239aab54d1adbd93e1a0860381162bc619 Mon Sep 17 00:00:00 2001 From: mark Date: Thu, 30 Apr 2020 11:33:05 -0500 Subject: [PATCH 07/22] add a bit to syntax intro --- src/syntax-intro.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/src/syntax-intro.md b/src/syntax-intro.md index dd7e2d735..43ef44577 100644 --- a/src/syntax-intro.md +++ b/src/syntax-intro.md @@ -6,3 +6,8 @@ out that doing even this involves a lot of work, including lexing, parsing, macro expansion, name resolution, conditional compilation, feature-gate checking, and validation of the AST. In this chapter, we take a look at all of these steps. + +Notably, there isn't always a clean ordering between these tasks. For example, +macro expansion relies on name resolution to resolve the names of macros and +imports. And parsing requires macro expansion, which in turn may require +parsing the output of the macro. 
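The expansion loop described in these notes resolves what it can, expands resolved macros, integrates their output (which may reveal new definitions and new calls), and stops with an error when an iteration makes no progress. It can be sketched as a toy fixed-point loop. This is a minimal sketch, assuming macros are identified by plain strings and that an expansion's "output" is just a list of newly revealed definitions and invocations. `fully_expand` and `Expansion` are hypothetical names. Import resolution, hygiene, parsing, and rustc's recursion limit are all omitted. rustc's `fully_expand_fragment` works on AST fragments and invocation kinds instead.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Hypothetical: what expanding one macro "produces" in this toy model.
#[derive(Clone, Debug, Default)]
struct Expansion {
    defines: Vec<&'static str>, // macro definitions revealed in the output
    invokes: Vec<&'static str>, // new macro calls found in the output
}

// Returns the order in which calls were expanded, or the stuck calls on failure.
fn fully_expand(
    initial: Vec<&'static str>,
    mut defined: HashSet<&'static str>,
    expanders: &HashMap<&'static str, Expansion>,
) -> Result<Vec<&'static str>, Vec<&'static str>> {
    let mut queue: VecDeque<&'static str> = initial.into_iter().collect();
    let mut expanded = Vec::new();
    while !queue.is_empty() {
        let mut progress = false;
        // One iteration: try every call that was queued when the iteration began.
        for _ in 0..queue.len() {
            let name = queue.pop_front().unwrap();
            if !defined.contains(&name) {
                queue.push_back(name); // not resolved yet: put it back
                continue;
            }
            progress = true;
            expanded.push(name);
            // "Integrate" the output: new definitions and new calls become visible.
            if let Some(out) = expanders.get(&name) {
                defined.extend(out.defines.iter().copied());
                queue.extend(out.invokes.iter().copied());
            }
        }
        if !progress {
            // No progress in a whole iteration: the remaining calls can never resolve.
            return Err(queue.into_iter().collect());
        }
    }
    Ok(expanded)
}

fn main() {
    let mut expanders = HashMap::new();
    // Expanding `outer!` reveals a definition of `inner!` plus a call to it.
    expanders.insert("outer", Expansion { defines: vec!["inner"], invokes: vec!["inner"] });
    expanders.insert("inner", Expansion::default());

    // `inner!` is called before it is defined, so it has to wait for `outer!`.
    let defined: HashSet<&'static str> = ["outer"].iter().copied().collect();
    let result = fully_expand(vec!["inner", "outer"], defined, &expanders);
    println!("{:?}", result); // Ok(["outer", "inner", "inner"])

    // A call to a macro that is never defined leaves the loop stuck.
    let stuck = fully_expand(vec!["missing"], HashSet::new(), &expanders);
    println!("{:?}", stuck); // Err(["missing"])
}
```

Putting unresolved calls back into the queue, rather than failing immediately, is what allows a macro to be used before the expansion that defines it has run. This is the ordering interplay between expansion and name resolution mentioned above.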
From 1f8d6e0009400b2f60902b8e8191405490a39682 Mon Sep 17 00:00:00 2001 From: mark Date: Thu, 30 Apr 2020 11:33:37 -0500 Subject: [PATCH 08/22] reorder some chapters --- src/SUMMARY.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/SUMMARY.md b/src/SUMMARY.md index fe040d9e2..6e0c71735 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -51,10 +51,10 @@ - [Ex: Type checking through `rustc_interface`](./rustc-driver-interacting-with-the-ast.md) - [Syntax and the AST](./syntax-intro.md) - [Lexing and Parsing](./the-parser.md) - - [`#[test]` Implementation](./test-implementation.md) - - [Panic Implementation](./panic-implementation.md) - [Macro expansion](./macro-expansion.md) - [Name resolution](./name-resolution.md) + - [`#[test]` Implementation](./test-implementation.md) + - [Panic Implementation](./panic-implementation.md) - [AST Validation](./ast-validation.md) - [Feature Gate Checking](./feature-gate-ck.md) - [The HIR (High-level IR)](./hir.md) From 99c44c553a89c6b83e30c3d73433c37f4495fca7 Mon Sep 17 00:00:00 2001 From: mark Date: Thu, 30 Apr 2020 11:36:47 -0500 Subject: [PATCH 09/22] add note about macros in parser chapter --- src/the-parser.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/src/the-parser.md b/src/the-parser.md index c84ac4ea2..c0f2a071b 100644 --- a/src/the-parser.md +++ b/src/the-parser.md @@ -38,6 +38,11 @@ To minimise the amount of copying that is done, both the `StringReader` and `Parser` have lifetimes which bind them to the parent `ParseSess`. This contains all the information needed while parsing, as well as the `SourceMap` itself. +Note that while parsing, we may encounter macro definitions or invocations. We +set these aside to be expanded (see [this chapter](./macro-expansion.md)). +Expansion may itself require parsing the output of the macro, which may reveal +more macros to be expanded, and so on. 
+
 ## More on Lexical Analysis
 
 Code for lexical analysis is split between two crates:

From 60ce2bf8cba225d8b33ba4c922e34e262cb53ef4 Mon Sep 17 00:00:00 2001
From: mark
Date: Thu, 30 Apr 2020 11:50:37 -0500
Subject: [PATCH 10/22] reorganize the macro expansion chapter

---
 src/macro-expansion.md | 383 +++++++++++++++++++++--------------------
 1 file changed, 201 insertions(+), 182 deletions(-)

diff --git a/src/macro-expansion.md b/src/macro-expansion.md
index b0f23096f..27d9d45d1 100644
--- a/src/macro-expansion.md
+++ b/src/macro-expansion.md
@@ -3,13 +3,205 @@
 > `librustc_ast`, `librustc_expand`, and `librustc_builtin_macros` are all undergoing
 > refactoring, so some of the links in this chapter may be broken.
 
-Rust has a very powerful macro system. There are two major types of macros:
+Rust has a very powerful macro system. In the previous chapter, we saw how the
+parser sets aside macros to be expanded. This chapter is about the process of
+expanding those macros iteratively until we have a complete AST for our crate
+with no unexpanded macros (or a compile error).
+
+First, we will discuss the algorithm that expands and integrates macro output
+into ASTs. Next, we will take a look at how hygiene data is collected. Finally,
+we will look at the specifics of expanding different types of macros.
+
+## Expansion and AST Integration
+
+TODO: expand these notes (har har)...
+
+- Expansion happens over a whole crate at once.
+- We run `fully_expand_fragment` on the crate
+  - If `fully_expand_fragment` is run not on a whole crate, it means that we are performing eager expansion.
+  - We do this for some built-ins that expect literals (not exposed to users).
+  - It performs a subset of actions performed by non-eager expansion, so the discussion below focuses on eager expansion.
+  - Original description here: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049
+  - Algorithm: `fully_expand_fragment` works in iterations. We repeat until there are no unresolved macros left.
+    - Resolve imports in our partially built crate as much as possible.
+      - (link to name-resolution chapter) names resolved from "closer" scopes (e.g. current block) to further ones (e.g. prelude)
+      - A resolution fails differently for different scopes, e.g. for a module scope it means no unexpanded macros and no unresolved glob imports in that module.
+    - Collect as many macro invocations as possible from our partially built crate
+      (fn-like, attributes, derives) from the crate and add them to the queue.
+    - Take a macro from the queue, and attempt to resolve it.
+    - If it's resolved - run its expander function that consumes tokens or AST and produces tokens or AST (depending on the macro kind). (If it's not resolved, then put it back into the queue.)
+      - At this point, we know everything about the macro itself and can call `set_expn_data` to fill in its properties in the global data -- that is the hygiene data associated with `ExpnId`.
+      - The macro's expander function returns a piece of AST (or tokens). We need to integrate that piece of AST into the big existing partially built AST.
+        - If the macro produces tokens (e.g. a proc macro), we will have to parse into an AST, which may produce parse errors.
+        - During expansion, we create `SyntaxContext`s (heirarchy 2).
+        - This is essentially where the "token-like mass" becomes a proper set-in-stone AST with side-tables
+        - These three passes happen one after another on every AST fragment freshly expanded from a macro
+          - `NodeId`s are assigned by `InvocationCollector`
+            - also collects new macro calls from this new AST piece and adds them to the queue
+          - def_paths are created and `DefId`s are assigned to them by `DefCollector`
+          - `Name`s are put into modules (from the resolver's point of view) by `BuildReducedGraphVisitor`
+      - After expanding a single macro and integrating its output continue to the next iteration of `fully_expand_fragment`.
+    - If we make no progress in an iteration, then we have reached a compilation error (e.g. an undefined macro).
+
+  - We attempt to recover from failures (unresolved macros or imports) for the sake of diagnostics
+    - recovery can't cause compilation to suceed. We know that it will fail at this point.
+    - we expand errors into `ExprKind::Err` or something like that for unresolved macros
+    - this allows compilation to continue past the first error so that we can report more errors at a time
+
+### Relationship to name resolution
+
+- name resolution is done for macro and import names during expansion and integration into the AST, as discussed above
+- For all other names we certainly know whether a name is resolved successfully or not on the first attempt, because no new names can appear, due to hygiene
+  - They are resolved in a later pass, see `librustc_resolve/late.rs`
+
+## Hygiene and Heirarchies
+
+If you have ever used C/C++ preprocessor macros, you know that there are some
+annoying and hard-to-debug gotchas! For example, consider the following C code:
+
+```c
+#define DEFINE_FOO struct Bar {int x;}; struct Foo {Bar bar;};
+
+// Then, somewhere else
+struct Bar {
+    ...
+};
+
+DEFINE_FOO
+```
+
+Most people avoid writing C like this – and for good reason: it doesn't
+compile. The `struct Bar` defined by the macro clashes names with the `struct
+Bar` defined in the code. Consider also the following example:
+
+```c
+#define DO_FOO(x) {\
+    int y = 0;\
+    foo(x, y);\
+    }
+
+// Then elsewhere
+int y = 22;
+DO_FOO(y);
+```
+
+Do you see the problem? We wanted to generate a call `foo(22, 0)`, but instead
+we got `foo(0, 0)` because the macro defined its own `y`!
+
+These are both examples of _macro hygiene_ issues. _Hygiene_ relates to how to
+handle names defined _within a macro_. In particular, a hygienic macro system
+prevents errors due to names introduced within a macro. Rust macros are hygienic
+in that they do not allow one to write the sorts of bugs above.
+
+At a high level, hygiene within the rust compiler is accomplished by keeping
+track of the context where a name is introduced and used. We can then
+disambiguate names based on that context. Future iterations of the macro system
+will allow greater control to the macro author to use that context. For example,
+a macro author may want to introduce a new name to the context where the macro
+was called. Alternately, the macro author may be defining a variable for use
+only within the macro (i.e. it should not be visible outside the macro).
+
+This section is about how that context is tracked.
+
+[code_dir]: https://github.com/rust-lang/rust/tree/master/src/librustc_expand/mbe
+[code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser
+[code_mr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_rules
+[code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/fn.parse_tt.html
+[parsing]: ./the-parser.html
+
+TODO: expand these notes
+
+- Expansion is lazy. We work from the outside of a macro invocation inward.
+  - Ex: foo!(bar!(ident)) -> expand -> bar!(ident) -> expand -> ident
+  - Eager expansion: https://github.com/rust-lang/rfcs/pull/2320.
+    - Seems complicated to implement
+    - We have it hacked into some built-in macros, but not generally.
+- Many AST nodes have some sort of syntax context, especially nodes from macros.
+- When we ask what is the syntax context of a node, the answer actually differs by what we are trying to do. Thus, we don't just keep track of a single context. There are in fact 3 different types of context used for different things.
+- Each type of context is tracked by an "expansion hierarchy". As we expand macros, new macro calls or macro definitions may be generated, leading to some nesting. This nesting is where the hierarchies come from. Each hierarchy tracks some different aspect, though, as we will see.
+- There are 3 expansion hierarchies
+  - All macros receive an integer ID assigned continuously starting from 0 as we discover new macro calls
+    - This is used as the `expn_id` where needed.
+    - All hierarchies start at ExpnId::root, which is its own parent
+    - The context of a node consists of a chain of expansions leading to `ExpnId::root`. A non-macro-expanded node has syntax context 0 (`SyntaxContext::empty()`) which represents just the root node.
+    - There are vectors in `HygieneData` that contain expansion info.
+      - There are entries here for both `SyntaxContext::empty()` and `ExpnId::root`, but they aren't used much.
+
+  1. Tracks expansion order: when a macro invocation is in the output of another macro.
+      ...
+      expn_id2
+      expn_id1
+      InternalExpnData::parent is the child->parent link. That is the expn_id1 points to expn_id2 points to ...
+
+      Ex:
+        macro_rules! foo { () => { println!(); } }
+        fn main() { foo!(); }
+
+        // Then AST nodes that are finally generated would have parent(expn_id_println) -> parent(expn_id_foo), right?
+
+  2. Tracks macro definitions: when we are expanding one macro another macro definition is revealed in its output.
+      ...
+      SyntaxContext2
+      SyntaxContext1
+      SyntaxContextData::parent is the child->parent link here.
+      SyntaxContext is the whole chain in this hierarchy, and SyntaxContextData::outer_expns are individual elements in the chain.
+
+    - For built-in macros (e.g. `line!()`) or stable proc macros: tokens produced by the macro are given the context `SyntaxContext::empty().apply_mark(expn_id)`
+      - Such macros are considered to have been defined at the root.
+      - For proc macros this is because they are always cross-crate and we don't have cross-crate hygiene implemented.
+
+    The second hierarchy has the context transplantation hack. See https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732.
+
+    If the token had context X before being produced by a macro then after being produced by the macro it has context X -> macro_id.
+
+    Ex:
+    ```rust
+    macro m() { ident }
+    ```
+
+    Here `ident` originally has context SyntaxContext::root(). `ident` has context ROOT -> id(m) after it's produced by m.
+    The "chaining operator" is `apply_mark` in compiler code.
+
+    Ex:
+
+    ```rust
+    macro m() { macro n() { ident } }
+    ```
+    In this example the ident has context ROOT originally, then ROOT -> id(m), then ROOT -> id(m) -> id(n).
+
+    Note that these chains are not entirely determined by their last element, in other words ExpnId is not isomorphic to SyntaxCtxt.
+
+    Ex:
+    ```rust
+    macro m($i: ident) { macro n() { ($i, bar) } }
+
+    m!(foo);
+    ```
+
+    After all expansions, foo has context ROOT -> id(n) and bar has context ROOT -> id(m) -> id(n)
+
+  3. Call-site: tracks the location of the macro invocation.
+    Ex:
+    If foo!(bar!(ident)) expands into ident
+    then hierarchy 1 is root -> foo -> bar -> ident
+    but hierarchy 3 is root -> ident
+
+    ExpnInfo::call_site is the child-parent link in this case.
+
+- Hygiene-related algorithms are entirely in hygiene.rs
+  - Some hacks in `resolve_crate_root`, though.
+
+## Producing Macro Output
+
+Above, we saw how the output of a macro is integrated into the AST for a crate,
+and we also saw how the hygiene data for a crate is generated. But how do we
+actually produce the output of a macro? It depends on the type of macro.
+
+There are two types of macros in Rust:
 `macro_rules!` macros (a.k.a. "Macros By Example" (MBE)) and procedural macros
-("proc macros"; including custom derives). During the parsing phase, the normal
+(or "proc macros"; including custom derives). During the parsing phase, the normal
 Rust parser will set aside the contents of macros and their invocations. Later,
-before name resolution, macros are expanded using these portions of the code.
-In this chapter, we will discuss MBEs, proc macros, and hygiene. Both types of
-macros are expanded during parsing, but they happen in different ways.
+macros are expanded using these portions of the code.
 
 ## Macros By Example
 
@@ -65,10 +257,10 @@ called _macro expansion_, and it is the topic of this chapter.
 
 ### The MBE parser
 
-There are two parts to macro expansion: parsing the definition and parsing the
+There are two parts to MBE expansion: parsing the definition and parsing the
 invocations. Interestingly, both are done by the macro parser.
 
-Basically, the macro parser is like an NFA-based regex parser. It uses an
+Basically, the MBE parser is like an NFA-based regex parser. It uses an
 algorithm similar in spirit to the [Earley parsing
 algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is
 defined in [`src/librustc_expand/mbe/macro_parser.rs`][code_mp].
@@ -180,63 +372,10 @@ Custom derives are a special type of proc macro.
 
 TODO: more?
 
-## Hygiene
-
-If you have ever used C/C++ preprocessor macros, you know that there are some
-annoying and hard-to-debug gotchas! For example, consider the following C code:
-
-```c
-#define DEFINE_FOO struct Bar {int x;}; struct Foo {Bar bar;};
-
-// Then, somewhere else
-struct Bar {
-    ...
-};
-
-DEFINE_FOO
-```
-
-Most people avoid writing C like this – and for good reason: it doesn't
-compile. The `struct Bar` defined by the macro clashes names with the `struct
-Bar` defined in the code. Consider also the following example:
-
-```c
-#define DO_FOO(x) {\
-    int y = 0;\
-    foo(x, y);\
-    }
-
-// Then elsewhere
-int y = 22;
-DO_FOO(y);
-```
-
-Do you see the problem? We wanted to generate a call `foo(22, 0)`, but instead
-we got `foo(0, 0)` because the macro defined its own `y`!
-
-These are both examples of _macro hygiene_ issues. _Hygiene_ relates to how to
-handle names defined _within a macro_. In particular, a hygienic macro system
-prevents errors due to names introduced within a macro. Rust macros are hygienic
-in that they do not allow one to write the sorts of bugs above.
-
-At a high level, hygiene within the rust compiler is accomplished by keeping
-track of the context where a name is introduced and used. We can then
-disambiguate names based on that context. Future iterations of the macro system
-will allow greater control to the macro author to use that context. For example,
-a macro author may want to introduce a new name to the context where the macro
-was called. Alternately, the macro author may be defining a variable for use
-only within the macro (i.e. it should not be visible outside the macro).
-
-This section is about how that context is tracked.
-
-[code_dir]: https://github.com/rust-lang/rust/tree/master/src/librustc_expand/mbe
-[code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser
-[code_mr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_rules
-[code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/fn.parse_tt.html
-[parsing]: ./the-parser.html
-
 ## Notes from petrochenkov discussion
 
+TODO: sprinkle these links around the chapter...
+
 Where to find the code:
 - librustc_span/hygiene.rs - structures related to hygiene and expansion that are kept in global data (can be accessed from any Ident without any context)
 - librustc_span/lib.rs - some secondary methods like macro backtrace using primary methods from hygiene.rs
@@ -272,123 +411,3 @@ Primary structures:
 - Annotatable - a piece of AST that can be an attribute target, almost same thing as AstFragment except for types and patterns that can be produced by macros but cannot be annotated with attributes (TODO: Merge into AstFragment)
 - MacResult - a "polymorphic" AST fragment, something that can turn into a different AstFragment depending on its context (aka AstFragmentKind - item, or expression, or pattern etc.)
 - Invocation/InvocationKind - a structure describing a macro call, these structures are collected by the expansion infra (InvocationCollector), queued, resolved, expanded when resolved, etc.
-
-Hygiene and Expansion Heirarchies
-
-- Expansion is lazy. We work from the outside of a macro invocation inward.
-  - Ex: foo!(bar!(ident)) -> expand -> bar!(ident) -> expand -> ident
-  - Eager expansion: https://github.com/rust-lang/rfcs/pull/2320.
-    - Seems complicated to implemented
-    - We have it hacked into some built-in macros, but not generally.
-- Many AST nodes have some sort of syntax context, especially nodes from macros.
-- When we ask what is the syntax context of a node, the answer actually differs by what we are trying to do. Thus, we don't just keep track of a single context. There are in fact 3 different types of context used for different things.
-- Each type of context is tracked by an "expansion heirarchy". As we expand macros, new macro calls or macro definitions may be generated, leading to some nesting. This nesting is where the heirarchies come from. Each heirarchy tracks some different aspect, though, as we will see.
-- There are 3 expansion heirarchies
-  - All macros receive an integer ID assigned continuously starting from 0 as we discover new macro calls
-    - This is used as the `expn_id` where needed.
-    - All heirarchies start at ExpnId::root, which is its own parent
-    - The context of a node consists of a chain of expansions leading to `ExpnId::root`. A non-macro-expanded node has syntax context 0 (`SyntaxContext::empty()`) which represents just the root node.
-    - There are vectors in `HygieneData` that contain expansion info.
-      - There are entries here for both `SyntaxContext::empty()` and `ExpnId::root`, but they aren't used much.
-
-  1. Tracks expansion order: when a macro invocation is in the output of another macro.
-      ...
-      expn_id2
-      expn_id1
-      InternalExpnData::parent is the child->parent link. That is the expn_id1 points to expn_id2 points to ...
-
-      Ex:
-        macro_rules! foo { () => { println!(); } }
-        fn main() { foo!(); }
-
-        // Then AST nodes that are finally generated would have parent(expn_id_println) -> parent(expn_id_foo), right?
-
-  2. Tracks macro definitions: when we are expanding one macro another macro definition is revealed in its output.
-      ...
-      SyntaxContext2
-      SyntaxContext1
-      SyntaxContextData::parent is the child->parent link here.
-      SyntaxContext is the whole chain in this hierarchy, and SyntaxContextData::outer_expns are individual elements in the chain.
-
-    - For built-in macros (e.g. `line!()`) or stable proc macros: tokens produced by the macro are given the context `SyntaxContext::empty().apply_mark(expn_id)`
-      - Such macros are considered to have been defined at the root.
-      - For proc macros this is because they are always cross-crate and we don't have cross-crate hygiene implemented.
-
-    The second hierarchy has the context transplantation hack. See https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732.
-
-    If the token had context X before being produced by a macro then after being produced by the macro it has context X -> macro_id.
-
-    Ex:
-    ```rust
-    macro m() { ident }
-    ```
-
-    Here `ident` originally has context SyntaxContext::root(). `ident` has context ROOT -> id(m) after it's produced by m.
-    The "chaining operator" is `apply_mark` in compiler code.
-
-    Ex:
-
-    ```rust
-    macro m() { macro n() { ident } }
-    ```
-    In this example the ident has context ROOT originally, then ROOT -> id(m), then ROOT -> id(m) -> id(n).
-
-    Note that these chains are not entirely determined by their last element, in other words ExpnId is not isomorphic to SyntaxCtxt.
-
-    Ex:
-    ```rust
-    macro m($i: ident) { macro n() { ($i, bar) } }
-
-    m!(foo);
-    ```
-
-    After all expansions, foo has context ROOT -> id(n) and bar has context ROOT -> id(m) -> id(n)
-
-  3. Call-site: tracks the location of the macro invocation.
-    Ex:
-    If foo!(bar!(ident)) expands into ident
-    then hierarchy 1 is root -> foo -> bar -> ident
-    but hierarchy 3 is root -> ident
-
-    ExpnInfo::call_site is the child-parent link in this case.
-
-- Hygiene-related algorithms are entirely in hygiene.rs
-  - Some hacks in `resolve_crate_root`, though.
-
-Expansion
-- Expansion happens over a whole crate at once.
-- We run `fully_expand_fragment` on the crate
-  - If `fully_expand_fragment` is run not on a whole crate, it means that we are performing eager expansion.
-  - We do this for some built-ins that expect literals (not exposed to users).
-  - It performs a subset of actions performed by non-eager expansion, so the discussion below focuses on eager expansion.
-  - Original description here: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049
-  - Algorithm: `fully_expand_fragment` works in iterations. We repeat until there are no unresolved macros left.
-    - Resolve imports in our partially built crate as much as possible.
-      - (link to name-resolution chapter) names resolved from "closer" scopes (e.g. current block) to further ones (e.g. prelude)
-      - A resolution fails differently for different scopes, e.g. for a module scope it means no unexpanded macros and no unresolved glob imports in that module.
-    - Collect as many macro invocations as possible from our partially built crate
-      (fn-like, attributes, derives) from the crate and add them to the queue.
-    - Take a macro from the queue, and attempt to resolve it.
-    - If it's resolved - run its expander function that consumes tokens or AST and produces tokens or AST (depending on the macro kind). (If it's not resolved, then put it back into the queue.)
-      - At this point, we know everything about the macro itself and can call `set_expn_data` to fill in its properties in the global data -- that is the hygiene data associated with `ExpnId`.
-      - The macro's expander function returns a piece of AST (or tokens). We need to integrate that piece of AST into the big existing partially built AST.
-        - If the macro produces tokens (e.g. a proc macro), we will have to parse into an AST, which may produce parse errors.
-        - During expansion, we create `SyntaxContext`s (heirarchy 2).
-        - This is essentially where the "token-like mass" becomes a proper set-in-stone AST with side-tables
-        - These three passes happen one after another on every AST fragment freshly expanded from a macro
-          - `NodeId`s are assigned by `InvocationCollector`
-            - also collects new macro calls from this new AST piece and adds them to the queue
-          - def_paths are created and `DefId`s are assigned to them by `DefCollector`
-          - `Name`s are put into modules (from the resolver's point of view) by `BuildReducedGraphVisitor`
-      - After expanding a single macro and integrating its output continue to the next iteration of `fully_expand_fragment`.
-    - If we make no progress in an iteration, then we have reached a compilation error (e.g. an undefined macro).
-
-  - We attempt to recover from failures (unresolved macros or imports) for the sake of diagnostics
-    - recovery can't cause compilation to suceed. We know that it will fail at this point.
-    - we expand errors into `ExprKind::Err` or something like that for unresolved macros
-    - this allows compilation to continue past the first error so that we can report more errors at a time
-
-Relationship to name resolution
-- name resolution is done for macro and import names during expansion and integration into the AST, as discussed above
-- For all other names we certainly know whether a name is resolved successfully or not on the first attempt, because no new names can appear, due to hygiene
-  - They are resolved in a later pass, see `librustc_resolve/late.rs`

From b8f2a23c26abc5b5c9ceaa5b6c77338c30cccf32 Mon Sep 17 00:00:00 2001
From: mark
Date: Thu, 30 Apr 2020 12:36:16 -0500
Subject: [PATCH 11/22] expand some notes about expansion :P

---
 src/macro-expansion.md | 120 +++++++++++++++++++++++++++--------------
 src/name-resolution.md |  23 ++++++++
 2 files changed, 104 insertions(+), 39 deletions(-)

diff --git a/src/macro-expansion.md b/src/macro-expansion.md
index 27d9d45d1..f076ed5fa 100644
--- a/src/macro-expansion.md
+++ b/src/macro-expansion.md
@@ -14,45 +14,87 @@ we will look at the specifics of expanding different types of macros.
 
 ## Expansion and AST Integration
 
-TODO: expand these notes (har har)...
-
-- Expansion happens over a whole crate at once.
-- We run `fully_expand_fragment` on the crate
-  - If `fully_expand_fragment` is run not on a whole crate, it means that we are performing eager expansion.
-  - We do this for some built-ins that expect literals (not exposed to users).
-  - It performs a subset of actions performed by non-eager expansion, so the discussion below focuses on eager expansion.
-  - Original description here: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049
-  - Algorithm: `fully_expand_fragment` works in iterations. We repeat until there are no unresolved macros left.
-    - Resolve imports in our partially built crate as much as possible.
-      - (link to name-resolution chapter) names resolved from "closer" scopes (e.g. current block) to further ones (e.g. prelude)
-      - A resolution fails differently for different scopes, e.g. for a module scope it means no unexpanded macros and no unresolved glob imports in that module.
-    - Collect as many macro invocations as possible from our partially built crate
-      (fn-like, attributes, derives) from the crate and add them to the queue.
-    - Take a macro from the queue, and attempt to resolve it.
-    - If it's resolved - run its expander function that consumes tokens or AST and produces tokens or AST (depending on the macro kind). (If it's not resolved, then put it back into the queue.)
-      - At this point, we know everything about the macro itself and can call `set_expn_data` to fill in its properties in the global data -- that is the hygiene data associated with `ExpnId`.
-      - The macro's expander function returns a piece of AST (or tokens). We need to integrate that piece of AST into the big existing partially built AST.
-        - If the macro produces tokens (e.g. a proc macro), we will have to parse into an AST, which may produce parse errors.
-        - During expansion, we create `SyntaxContext`s (heirarchy 2).
-        - This is essentially where the "token-like mass" becomes a proper set-in-stone AST with side-tables
-        - These three passes happen one after another on every AST fragment freshly expanded from a macro
-          - `NodeId`s are assigned by `InvocationCollector`
-            - also collects new macro calls from this new AST piece and adds them to the queue
-          - def_paths are created and `DefId`s are assigned to them by `DefCollector`
-          - `Name`s are put into modules (from the resolver's point of view) by `BuildReducedGraphVisitor`
-      - After expanding a single macro and integrating its output continue to the next iteration of `fully_expand_fragment`.
-    - If we make no progress in an iteration, then we have reached a compilation error (e.g. an undefined macro).
-
-  - We attempt to recover from failures (unresolved macros or imports) for the sake of diagnostics
-    - recovery can't cause compilation to suceed. We know that it will fail at this point.
-    - we expand errors into `ExprKind::Err` or something like that for unresolved macros
-    - this allows compilation to continue past the first error so that we can report more errors at a time
-
-### Relationship to name resolution
-
-- name resolution is done for macro and import names during expansion and integration into the AST, as discussed above
-- For all other names we certainly know whether a name is resolved successfully or not on the first attempt, because no new names can appear, due to hygiene
-  - They are resolved in a later pass, see `librustc_resolve/late.rs`
+First of all, expansion happens at the crate level. Given the raw source code for
+a crate, the compiler will produce a massive AST with all macros expanded, all
+modules inlined, etc.
+
+The primary entry point for this process is the
+[`MacroExpander::fully_expand_fragment`][fef] method. Usually, we run this
+method on a whole crate. If it is not run on a full crate, it means we are
+doing _eager macro expansion_. Eager expansion means that we expand the
+arguments of a macro invocation before the macro invocation itself. This is
+implemented only for a few special built-in macros that expect literals (it's
+not a generally available feature of Rust). Eager expansion generally performs
+a subset of the things that lazy (normal) expansion does, so we will focus on
+lazy expansion for the rest of this chapter.
+
+At a high level, [`fully_expand_fragment`][fef] works in iterations. We keep a
+queue of unresolved macro invocations (that is, macros we haven't found the
+definition of yet). We repeatedly try to pick a macro from the queue, resolve
+it, expand it, and integrate it back. If we can't make progress in an
+iteration, this represents a compile error. Here is the [algorithm][original]:
+
+[fef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.MacroExpander.html#method.fully_expand_fragment
+[original]: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049
+
+0. Initialize a `queue` of unresolved macros.
+1. Repeat until `queue` is empty (or we make no progress, which is an error):
+   0. [Resolve](./name-resolution.md) imports in our partially built crate as
+   much as possible.
+   1. Collect as many macro invocations as possible from our partially built
+   crate (fn-like, attributes, derives) and add them to the queue.
+   2. Dequeue the first element, and attempt to resolve it.
+   3. If it's resolved:
+      0. Run the macro's expander function that consumes tokens or AST and
+      produces tokens or AST (depending on the macro kind).
+         - At this point, we know everything about the macro itself and can
+           call `set_expn_data` to fill in its properties in the global data
+           -- that is the hygiene data associated with `ExpnId`. (See [the
+           "Hygiene" section below][hybelow]).
+      1. Integrate that piece of AST into the big existing partially built
+      AST. This is essentially where the "token-like mass" becomes a
+      proper set-in-stone AST with side-tables. It happens as follows:
+         - If the macro produces tokens (e.g. a proc macro), we parse into
+           an AST, which may produce parse errors.
+         - During expansion, we create `SyntaxContext`s (hierarchy 2). (See
+           [the "Hygiene" section below][hybelow])
+         - These three passes happen one after another on every AST fragment
+           freshly expanded from a macro:
+           - [`NodeId`]s are assigned by [`InvocationCollector`]. This
+             also collects new macro calls from this new AST piece and
+             adds them to the queue.
+           - ["Def paths"][defpath] are created and [`DefId`]s are
+             assigned to them by [`DefCollector`].
+           - Names are put into modules (from the resolver's point of
+             view) by [`BuildReducedGraphVisitor`].
+      2. After expanding a single macro and integrating its output, continue
+      to the next iteration of [`fully_expand_fragment`][fef].
+   4. If it's not resolved:
+      0. Put the macro back in the queue
+      1. Continue to next iteration...
+
+[defpath]: https://rustc-dev-guide.rust-lang.org/hir.html?highlight=def,path#identifiers-in-the-hir
+[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html
+[`InvocationCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.InvocationCollector.html
+[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html
+[`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html
+[`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html
+[hybelow]: #hygiene-and-heirarchies
+
+If we make no progress in an iteration, then we have reached a compilation
+error (e.g. an undefined macro). We attempt to recover from failures
+(unresolved macros or imports) for the sake of diagnostics. This allows
+compilation to continue past the first error, so that we can report more errors
+at a time. Recovery can't cause compilation to succeed. We know that it will
+fail at this point. The recovery happens by expanding unresolved macros into
+[`ExprKind::Err`][err].
+
+[err]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/enum.ExprKind.html#variant.Err
+
+Notice that name resolution is involved here: we need to resolve imports and
+macro names in the above algorithm. However, we don't try to resolve other
+names yet. This happens later, as we will see in the [next
+chapter](./name-resolution.md).
## Hygiene and Heirarchies diff --git a/src/name-resolution.md b/src/name-resolution.md index f3aacba00..d08fe43f3 100644 --- a/src/name-resolution.md +++ b/src/name-resolution.md @@ -1,5 +1,28 @@ # Name resolution +In the previous chapters, we saw how the AST is built with all macros expanded. +We saw how doing that requires doing some name resolution to resolve imports +and macro names. In this chapter, we show how this is actually done and more. + +In fact, we don't do full name resolution during macro expansion -- we only +resolve imports and macros at that time. This is required to know what to even +expand. Later, after we have the whole AST, we do full name resolution to +resolve all names in the crate. This happens in [`rustc_resolve::late`][late]. +Unlike during macro expansion, in this late resolution, we only need to try to +resolve a name once, since no new names can be added. If we fail to resolve a +name now, then it is a compiler error. + +Name resolution can be complex. There are a few different namespaces (e.g. +macros, values, types, lifetimes), and names may be valid at different (nested) +scopes. Also, different types of names can fail to be resolved differently, and +failures can happen differently at different scopes. For example, for a module +scope, failure means no unexpanded macros and no unresolved glob imports in +that module. On the other hand, in a function body, failure requires that a +name be absent from the block we are in, all outer scopes, and the global +scope.
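To make the function-body failure condition concrete, here is a minimal sketch of innermost-first lookup through nested scopes. It is a hypothetical model (`Scopes` is not a rustc type) with a single namespace, whereas rustc's late resolution in `rustc_resolve::late` tracks several namespaces and many more kinds of scopes. Resolution fails only when the name is absent from the current block, every enclosing scope, and the global scope.

```rust
use std::collections::HashMap;

/// Toy lexical environment (hypothetical): a stack of scopes, with the
/// global scope at index 0 and the innermost block at the end.
struct Scopes {
    stack: Vec<HashMap<&'static str, u32>>,
}

impl Scopes {
    /// Starts with only the global scope.
    fn new() -> Self {
        Scopes { stack: vec![HashMap::new()] }
    }

    /// Enters a nested block.
    fn push(&mut self) {
        self.stack.push(HashMap::new());
    }

    /// Leaves the innermost block (the global scope is never popped).
    fn pop(&mut self) {
        if self.stack.len() > 1 {
            self.stack.pop();
        }
    }

    /// Binds `name` to a toy definition ID in the innermost scope.
    fn define(&mut self, name: &'static str, id: u32) {
        self.stack.last_mut().unwrap().insert(name, id);
    }

    /// Looks the name up from the innermost scope outward; `None` means
    /// the name is absent from every scope, i.e. resolution failed.
    fn resolve(&self, name: &str) -> Option<u32> {
        self.stack.iter().rev().find_map(|scope| scope.get(name).copied())
    }
}
```

Because the search stops at the first scope that contains the name, an inner binding shadows an outer one until its block is left.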
+ +[late]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/late/index.html + ## Basics In our programs we can refer to variables, types, functions, etc, by giving them From be379bf2421402cca2d0006440635724f2a5df67 Mon Sep 17 00:00:00 2001 From: mark Date: Thu, 30 Apr 2020 19:13:18 -0500 Subject: [PATCH 12/22] add a bit more info about eager exp --- src/macro-expansion.md | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/src/macro-expansion.md b/src/macro-expansion.md index f076ed5fa..500b1a78c 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -28,6 +28,20 @@ not a generally available feature of Rust). Eager expansion generally performs a subset of the things that lazy (normal) expansion does, so we will focus on lazy expansion for the rest of this chapter. +As an example, consider the following: + +```rust,ignore +macro bar($i: ident) { $i } +macro foo($i: ident) { $i } + +foo!(bar!(baz)); +``` + +A lazy expansion would expand `foo!` first. An eager expansion would expand +`bar!` first. Implementing eager expansion more generally would be challenging, +but we implement it for a few special built-in macros for the sake of user +experience. + At a high level, [`fully_expand_fragment`][fef] works in iterations. We keep a queue of unresolved macro invocations (that is, macros we haven't found the definition of yet). We repeatedly try to pick a macro from the queue, resolve @@ -143,8 +157,6 @@ a macro author may want to introduce a new name to the context where the macro was called. Alternately, the macro author may be defining a variable for use only within the macro (i.e. it should not be visible outside the macro). -This section is about how that context is tracked. 
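The difference between lazy and eager expansion in the `foo!(bar!(baz))` example can be modeled by recording the order in which calls get expanded. This is a hypothetical toy (`Node`, `expand_lazy`, and `expand_eager` are not rustc names): both macros are treated as identity macros (`{ $i }`), so a call's output is just its argument. Real lazy expansion would put `bar!(baz)` back into the invocation queue rather than recursing directly.

```rust
/// Toy syntax: an identifier, or a macro call with a single argument.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Ident(&'static str),
    Call(&'static str, Box<Node>),
}

/// Lazy (normal) expansion: expand the outer call first, then whatever
/// appears in its output.
fn expand_lazy(node: Node, trace: &mut Vec<&'static str>) -> Node {
    match node {
        Node::Ident(name) => Node::Ident(name),
        Node::Call(name, arg) => {
            trace.push(name); // the outer macro is expanded first
            expand_lazy(*arg, trace) // then its (identity) output
        }
    }
}

/// Eager expansion: expand the argument first, then the call itself.
fn expand_eager(node: Node, trace: &mut Vec<&'static str>) -> Node {
    match node {
        Node::Ident(name) => Node::Ident(name),
        Node::Call(name, arg) => {
            let expanded_arg = expand_eager(*arg, trace); // argument first
            trace.push(name);
            expanded_arg // identity macro: output is the expanded argument
        }
    }
}
```

Both strategies produce `baz` here. Only the order differs: lazy records `foo` then `bar`, while eager records `bar` then `foo`.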
- [code_dir]: https://github.com/rust-lang/rust/tree/master/src/librustc_expand/mbe [code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser [code_mr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_rules @@ -153,11 +165,6 @@ This section is about how that context is tracked. TODO: expand these notes -- Expansion is lazy. We work from the outside of a macro invocation inward. - - Ex: foo!(bar!(ident)) -> expand -> bar!(ident) -> expand -> ident - - Eager expansion: https://github.com/rust-lang/rfcs/pull/2320. - - Seems complicated to implemented - - We have it hacked into some built-in macros, but not generally. - Many AST nodes have some sort of syntax context, especially nodes from macros. - When we ask what is the syntax context of a node, the answer actually differs by what we are trying to do. Thus, we don't just keep track of a single context. There are in fact 3 different types of context used for different things. - Each type of context is tracked by an "expansion heirarchy". As we expand macros, new macro calls or macro definitions may be generated, leading to some nesting. This nesting is where the heirarchies come from. Each heirarchy tracks some different aspect, though, as we will see. From 59732308e7a4eaa087d9faed54306ff3a06e4749 Mon Sep 17 00:00:00 2001 From: mark Date: Thu, 30 Apr 2020 20:08:36 -0500 Subject: [PATCH 13/22] expand notes on expansion heirarchies --- src/macro-expansion.md | 185 ++++++++++++++++++++++++++++------------- 1 file changed, 128 insertions(+), 57 deletions(-) diff --git a/src/macro-expansion.md b/src/macro-expansion.md index 500b1a78c..6bd809680 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -163,82 +163,153 @@ only within the macro (i.e. it should not be visible outside the macro). 
[code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/fn.parse_tt.html [parsing]: ./the-parser.html -TODO: expand these notes +The context is attached to AST nodes. All AST nodes generated by macros have +context attached. Additionally, there may be other nodes that have context +attached, such as some desugared syntax (non-macro-expanded nodes are +considered to just have the "root" context, as described below). -- Many AST nodes have some sort of syntax context, especially nodes from macros. -- When we ask what is the syntax context of a node, the answer actually differs by what we are trying to do. Thus, we don't just keep track of a single context. There are in fact 3 different types of context used for different things. -- Each type of context is tracked by an "expansion heirarchy". As we expand macros, new macro calls or macro definitions may be generated, leading to some nesting. This nesting is where the heirarchies come from. Each heirarchy tracks some different aspect, though, as we will see. -- There are 3 expansion heirarchies - - All macros receive an integer ID assigned continuously starting from 0 as we discover new macro calls - - This is used as the `expn_id` where needed. - - All heirarchies start at ExpnId::root, which is its own parent - - The context of a node consists of a chain of expansions leading to `ExpnId::root`. A non-macro-expanded node has syntax context 0 (`SyntaxContext::empty()`) which represents just the root node. - - There are vectors in `HygieneData` that contain expansion info. - - There are entries here for both `SyntaxContext::empty()` and `ExpnId::root`, but they aren't used much. +Because macro invocations and definitions can be nested, the syntax context of +a node must be a hierarchy. For example, if we expand a macro and there is +another macro invocation or definition in the generated output, then the syntax +context should reflect the nesting. - 1.
Tracks expansion order: when a macro invocation is in the output of another macro. - ... - expn_id2 - expn_id1 - InternalExpnData::parent is the child->parent link. That is the expn_id1 points to expn_id2 points to ... +However, it turns out that there are actually a few types of context we may +want to track for different purposes. Thus, there are not one but _three_ +expansion hierarchies that together comprise the hygiene information for a +crate. - Ex: - macro_rules! foo { () => { println!(); } } - fn main() { foo!(); } +All of these hierarchies need some sort of "macro ID" to identify individual +elements in the chain of expansions. This ID is [`ExpnId`]. All macros receive +an integer ID, assigned continuously starting from 0 as we discover new macro +calls. All hierarchies start at [`ExpnId::root()`][rootid], which is its own +parent. - // Then AST nodes that are finally generated would have parent(expn_id_println) -> parent(expn_id_foo), right? +The actual hierarchies are stored in [`HygieneData`][hd], and all of the +hygiene-related algorithms are implemented in [`rustc_span::hygiene`][hy], with +the exception of some hacks in [`Resolver::resolve_crate_root`][hacks]. - 2. Tracks macro definitions: when we are expanding one macro another macro definition is revealed in its output. - ... - SyntaxContext2 - SyntaxContext1 - SyntaxContextData::parent is the child->parent link here. - SyntaxContext is the whole chain in this hierarchy, and SyntaxContextData::outer_expns are individual elements in the chain.
+[`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html +[rootid]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html#method.root +[hd]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.HygieneData.html +[hy]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/index.html +[hacks]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/struct.Resolver.html#method.resolve_crate_root - - For built-in macros (e.g. `line!()`) or stable proc macros: tokens produced by the macro are given the context `SyntaxContext::empty().apply_mark(expn_id)` - - Such macros are considered to have been defined at the root. - - For proc macros this is because they are always cross-crate and we don't have cross-crate hygiene implemented. +### The Expansion Order Heirarchy - The second hierarchy has the context transplantation hack. See https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732. +The first heirarchy tracks the order of expansions, i.e., when a macro +invocation is in the output of another macro. - If the token had context X before being produced by a macro then after being produced by the macro it has context X -> macro_id. +Here, the children in the heirarchy will be the "innermost" tokens. +[`ExpnData::parent`][edp] tracks the child -> parent link in this heirarchy. - Ex: - ```rust - macro m() { ident } - ``` +[edp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.parent - Here `ident` originally has context SyntaxContext::root(). `ident` has context ROOT -> id(m) after it's produced by m. - The "chaining operator" is `apply_mark` in compiler code. +For example, - Ex: +```rust,ignore +macro_rules! 
foo { () => { println!(); } } + +fn main() { foo!(); } +``` + +In this code, the AST nodes that are finally generated would have the hierarchy: + +``` +root + expn_id_foo + expn_id_println +``` + +### The Macro Definition Heirarchy + +The second hierarchy tracks the order of macro definitions, i.e., when we are +expanding one macro, another macro definition is revealed in its output. This +one is a bit tricky and more complex than the other two hierarchies. + +Here, [`SyntaxContextData::parent`][scdp] is the child -> parent link. +[`SyntaxContext`][sc] is the whole chain in this hierarchy, and +[`SyntaxContextData::outer_expns`][scdoe] are individual elements in the chain. +The "chaining operator" is [`SyntaxContext::apply_mark`][am] in compiler code. + +[scdp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.parent +[sc]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html +[scdoe]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.outer_expn +[am]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark + +For built-in macros, we use the context: +`SyntaxContext::empty().apply_mark(expn_id)`, and such macros are considered to +be defined at the hierarchy root. We do the same for proc-macros because we +haven't implemented cross-crate hygiene yet. + +If the token had context `X` before being produced by a macro, then after being +produced by the macro it has context `X -> macro_id`. Here are some examples: + +Example 0: + +```rust,ignore +macro m() { ident } + +m!(); +``` + +Here `ident` originally has context [`SyntaxContext::root()`][scr]. `ident` has +context `ROOT -> id(m)` after it's produced by `m`.
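The chaining operator can be illustrated with a small interner in which a syntax context is just an ID naming a whole chain of marks, and applying a mark returns the ID of the extended chain. This is a hypothetical toy (`Contexts` and its methods are illustrative names only): contexts and expansion IDs are plain `u32`s, and the transparency handling that rustc's real `apply_mark` performs is omitted. It reproduces the chains from the examples in this section, including the fact that two different contexts can end in the same mark.

```rust
use std::collections::HashMap;

/// Toy expansion ID (hypothetical stand-in for `ExpnId`).
type ExpnId = u32;
/// Toy syntax context: an ID naming a whole chain of marks.
type SyntaxContext = u32;

/// The root context (the empty chain).
const ROOT: SyntaxContext = 0;

/// Interns chains of marks so that equal chains share one ID.
struct Contexts {
    // For each context ID: (parent context, outermost mark).
    // Entry 0 is a placeholder for the root; its fields are never read.
    data: Vec<(SyntaxContext, ExpnId)>,
    interned: HashMap<(SyntaxContext, ExpnId), SyntaxContext>,
}

impl Contexts {
    fn new() -> Self {
        Contexts { data: vec![(ROOT, 0)], interned: HashMap::new() }
    }

    /// The "chaining operator": extends `ctxt` with one more mark.
    fn apply_mark(&mut self, ctxt: SyntaxContext, expn: ExpnId) -> SyntaxContext {
        if let Some(&id) = self.interned.get(&(ctxt, expn)) {
            return id; // this exact chain already exists
        }
        let id = self.data.len() as SyntaxContext;
        self.data.push((ctxt, expn));
        self.interned.insert((ctxt, expn), id);
        id
    }

    /// The full chain of marks, listed from the root outward.
    fn marks(&self, mut ctxt: SyntaxContext) -> Vec<ExpnId> {
        let mut out = Vec::new();
        while ctxt != ROOT {
            let (parent, expn) = self.data[ctxt as usize];
            out.push(expn);
            ctxt = parent;
        }
        out.reverse();
        out
    }

    /// Only the last mark applied (the last element of the chain), if any.
    fn outer_expn(&self, ctxt: SyntaxContext) -> Option<ExpnId> {
        if ctxt == ROOT { None } else { Some(self.data[ctxt as usize].1) }
    }
}
```

With `m` and `n` as expansion IDs 1 and 2, an `ident` produced by `m` and then `n` ends up with the chain `[m, n]`. A token that receives only `n`'s mark gets a different context with the same last element, which is why `ExpnId` is not isomorphic to `SyntaxContext`.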
+ +[scr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.root - ```rust - macro m() { macro n() { ident } } - ``` - In this example the ident has context ROOT originally, then ROOT -> id(m), then ROOT -> id(m) -> id(n). - Note that these chains are not entirely determined by their last element, in other words ExpnId is not isomorphic to SyntaxCtxt. +Example 1: - Ex: - ```rust - macro m($i: ident) { macro n() { ($i, bar) } } +```rust,ignore +macro m() { macro n() { ident } } + +m!(); +n!(); +``` +In this example the `ident` has context `ROOT` originally, then `ROOT -> id(m)` +after the first expansion, then `ROOT -> id(m) -> id(n)`. + +Example 2: + +Note that these chains are not entirely determined by their last element, in +other words `ExpnId` is not isomorphic to `SyntaxContext`. + +```rust,ignore +macro m($i: ident) { macro n() { ($i, bar) } } + +m!(foo); +``` + +After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context +`ROOT -> id(m) -> id(n)`. - m!(foo); - ``` +Finally, one last thing to mention is that currently, this heirarchy is subject +to the ["context transplantation hack"][hack]. Basically, the more modern (and +experimental) `macro` macros have stronger hygiene than the older MBE system, +but this can result in weird interactions between the two. The hack is intended +to make things "just work" for now. - After all expansions, foo has context ROOT -> id(n) and bar has context ROOT -> id(m) -> id(n) +[hack]: https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732 - 3. Call-site: tracks the location of the macro invocation. - Ex: - If foo!(bar!(ident)) expands into ident - then hierarchy 1 is root -> foo -> bar -> ident - but hierarchy 3 is root -> ident +### The Call-site Heirarchy - ExpnInfo::call_site is the child-parent link in this case. +The third and final heirarchy tracks the location of macro invocations. 
+ +In this heirarchy [`ExpnData::call_site`][callsite] is the child -> parent link. + +[callsite]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.call_site + +Here is an example: + +```rust,ignore +macro bar($i: ident) { $i } +macro foo($i: ident) { $i } + +foo!(bar!(baz)); +``` -- Hygiene-related algorithms are entirely in hygiene.rs - - Some hacks in `resolve_crate_root`, though. +For the `baz` AST node in the final output, the first heirarchy is `ROOT -> +id(foo) -> id(bar) -> baz`, while the third heirarchy is `ROOT -> baz`. ## Producing Macro Output From 08d66638f7d983c9e1cda07ee133e7e85ab3e6c2 Mon Sep 17 00:00:00 2001 From: mark Date: Thu, 30 Apr 2020 20:49:46 -0500 Subject: [PATCH 14/22] sprinkle around a bunch of links --- src/macro-expansion.md | 122 ++++++++++++++++++++++++++++------------- 1 file changed, 85 insertions(+), 37 deletions(-) diff --git a/src/macro-expansion.md b/src/macro-expansion.md index 6bd809680..faf9bad45 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -55,15 +55,19 @@ iteration, this represents a compile error. Here is the [algorithm][original]: 1. Repeat until `queue` is empty (or we make no progress, which is an error): 0. [Resolve](./name-resolution.md) imports in our partially built crate as much as possible. - 1. Collect as many macro invocations as possible from our partially built - crate (fn-like, attributes, derives) and add them to the queue. + 1. Collect as many macro [`Invocation`s][inv] as possible from our + partially built crate (fn-like, attributes, derives) and add them to the + queue. 2. Dequeue the first element, and attempt to resolve it. 3. If it's resolved: - 0. Run the macro's expander function that consumes tokens or AST and - produces tokens or AST (depending on the macro kind). + 0. 
Run the macro's expander function that consumes a [`TokenStream`] or + AST and produces a [`TokenStream`] or [`AstFragment`] (depending on + the macro kind). (A `TokenStream` is a collection of [`TokenTree`]s, + each of which is a token (punctuation, identifier, or literal) or a + delimited group (anything inside `()`/`[]`/`{}`)). - At this point, we know everything about the macro itself and can - call `set_expn_data` to fill in its properties in the global data - -- that is the hygiene data associated with `ExpnId`. (See [the + call `set_expn_data` to fill in its properties in the global data; + that is the hygiene data associated with `ExpnId`. (See [the "Hygiene" section below][hybelow]). 1. Integrate that piece of AST into the big existing partially built AST. This is essentially where the "token-like mass" becomes a @@ -94,6 +98,10 @@ iteration, this represents a compile error. Here is the [algorithm][original]: [`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html [`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html [hybelow]: #hygiene-and-heirarchies +[`TokenTree`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html +[`TokenStream`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html +[inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html +[`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html If we make no progress in an iteration, then we have reached a compilation error (e.g. an undefined macro). We attempt to recover from failures @@ -110,6 +118,27 @@ macro names in the above algorithm. However, we don't try to resolve other names yet. This happens later, as we will see in the [next chapter](./name-resolution.md).
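The description of a `TokenStream` as a collection of token trees, each either a single token or a delimited group, corresponds to a recursive data type. The sketch below is a simplified, hypothetical model: these enums are not rustc's `rustc_ast::tokenstream` definitions, which carry spans, spacing, and more token kinds. It encodes `foo!(a, 1)` and counts the leaf tokens by recursing into delimited groups.

```rust
/// The three kinds of delimited group.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Delimiter {
    Paren,   // ( ... )
    Bracket, // [ ... ]
    Brace,   // { ... }
}

/// A toy token: punctuation, identifier, or literal.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Punct(char),
    Ident(String),
    Literal(String),
}

/// A toy token tree: a single token, or a delimited group of trees.
#[derive(Debug, Clone, PartialEq)]
enum TokenTree {
    Token(Token),
    Delimited(Delimiter, Vec<TokenTree>),
}

/// A toy token stream: a sequence of token trees.
type TokenStream = Vec<TokenTree>;

/// Counts the leaf tokens in a stream, descending into delimited groups
/// (the delimiters themselves are not counted).
fn count_tokens(stream: &[TokenTree]) -> usize {
    stream
        .iter()
        .map(|tt| match tt {
            TokenTree::Token(_) => 1,
            TokenTree::Delimited(_, inner) => count_tokens(inner),
        })
        .sum()
}
```

Encoded this way, `foo!(a, 1)` is a stream of three trees (`foo`, `!`, and one parenthesized group) that contains five leaf tokens.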
+Here are some other notable data structures involved in expansion and integration: +- [`Resolver`] - a trait used to break crate dependencies. This allows the resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and pretty much everything else depending on [`rustc_ast`]. +- [`ExtCtxt`]/[`ExpansionData`] - various intermediate data kept and used by expansion + infrastructure in the process of its work +- [`Annotatable`] - a piece of AST that can be an attribute target, almost same + thing as AstFragment except for types and patterns that can be produced by + macros but cannot be annotated with attributes +- [`MacResult`] - a "polymorphic" AST fragment, something that can turn into a + different `AstFragment` depending on its [`AstFragmentKind`] - item, + or expression, or pattern etc. + +[`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html +[`rustc_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/index.html +[`Resolver`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.Resolver.html +[`ExtCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.ExtCtxt.html +[`ExpansionData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.ExpansionData.html +[`Annotatable`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.Annotatable.html +[`MacResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.MacResult.html +[`AstFragmentKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragmentKind.html + + ## Hygiene and Heirarchies If you have ever used C/C++ preprocessor macros, you know that there are some @@ -167,6 +196,10 @@ The context is attached to AST nodes. All AST nodes generated by macros have context attached. 
Additionally, there may be other nodes that have context attached, such as some desugared syntax (non-macro-expanded nodes are considered to just have the "root" context, as described below). +Throughout the compiler, we use [`Span`s][span] to refer to code locations. +This struct also has hygiene information attached to it, as we will see later. + +[span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html Because macros invocations and definitions can be nested, the syntax context of a node must be a heirarchy. For example, if we expand a macro and there is @@ -184,24 +217,33 @@ an integer ID, assigned continuously starting from 0 as we discover new macro calls. All heirarchies start at [`ExpnId::root()`][rootid], which is its own parent. -The actual heirarchies are stored in [`HygieneData`][hd], and all of the -hygiene-related algorithms are implemented in [`rustc_span::hygiene`][hy], with -the exception of some hacks [`Resolver::resolve_crate_root`][hacks]. +All of the hygiene-related algorithms are implemented in +[`rustc_span::hygiene`][hy], with the exception of some hacks +[`Resolver::resolve_crate_root`][hacks]. + +The actual heirarchies are stored in [`HygieneData`][hd]. This is a global +piece of data containing hygiene and expansion info that can be accessed from +any [`Ident`] without any context. 
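The combination of a global table, sequential `ExpnId` allocation starting at 0, and a root that is its own parent can be sketched with a thread-local vector. This is a hypothetical simplification, not rustc's `HygieneData`: each toy `ExpnData` stores only the expansion-order parent, and the table is a plain `RefCell<Vec<_>>`. The sketch replays the `foo!`/`println!` example from the expansion order hierarchy.

```rust
use std::cell::RefCell;

/// Toy expansion ID (hypothetical): an index into the global table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ExpnId(u32);

impl ExpnId {
    /// The root expansion, which is its own parent.
    const ROOT: ExpnId = ExpnId(0);
}

/// Toy per-expansion data: only the expansion-order parent is stored.
struct ExpnData {
    parent: ExpnId,
}

thread_local! {
    // Global table indexed by ExpnId; entry 0 is the root.
    static HYGIENE_DATA: RefCell<Vec<ExpnData>> =
        RefCell::new(vec![ExpnData { parent: ExpnId::ROOT }]);
}

/// Allocates the next ID (1, 2, 3, ...) for a newly discovered macro call.
fn fresh_expn(parent: ExpnId) -> ExpnId {
    HYGIENE_DATA.with(|data| {
        let mut data = data.borrow_mut();
        data.push(ExpnData { parent });
        ExpnId((data.len() - 1) as u32)
    })
}

/// Follows child -> parent links from `id` up to and including the root.
fn ancestry(mut id: ExpnId) -> Vec<ExpnId> {
    HYGIENE_DATA.with(|data| {
        let data = data.borrow();
        let mut chain = vec![id];
        while id != ExpnId::ROOT {
            id = data[id.0 as usize].parent;
            chain.push(id);
        }
        chain
    })
}
```

Because the table is global, any code holding only an ID can walk its chain without extra context, which mirrors the role the text describes for `HygieneData`.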
+ [`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html [rootid]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html#method.root [hd]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.HygieneData.html [hy]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/index.html [hacks]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/struct.Resolver.html#method.resolve_crate_root +[`Ident`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Ident.html ### The Expansion Order Heirarchy The first heirarchy tracks the order of expansions, i.e., when a macro invocation is in the output of another macro. -Here, the children in the heirarchy will be the "innermost" tokens. +Here, the children in the heirarchy will be the "innermost" tokens. The +[`ExpnData`] struct itself contains a subset of properties from both macro +definition and macro call available through global data. [`ExpnData::parent`][edp] tracks the child -> parent link in this heirarchy. +[`ExpnData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html [edp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.parent For example, @@ -226,11 +268,20 @@ The second heirarchy tracks the order of macro definitions, i.e., when we are expanding one macro another macro definition is revealed in its output. This one is a bit tricky and more complex than the other two heirarchies. -Here, [`SyntaxContextData::parent`][scdp] is the child -> parent link here. -[`SyntaxContext`][sc] is the whole chain in this hierarchy, and -[`SyntaxContextData::outer_expns`][scdoe] are individual elements in the chain. -The "chaining operator" is [`SyntaxContext::apply_mark`][am] in compiler code. +[`SyntaxContext`][sc] represents a whole chain in this hierarchy via an ID. 
+[`SyntaxContextData`][scd] contains data associated with the given +`SyntaxContext`; mostly it is a cache for results of filtering that chain in +different ways. [`SyntaxContextData::parent`][scdp] is the child -> parent +link here, and [`SyntaxContextData::outer_expns`][scdoe] are individual +elements in the chain. The "chaining operator" is +[`SyntaxContext::apply_mark`][am] in compiler code. + +A [`Span`][span], mentioned above, is actually just a compact representation of +a code location and `SyntaxContext`. Likewise, an [`Ident`] is just an interned +[`Symbol`] + `Span` (i.e. an interned string + hygiene data). +[`Symbol`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Symbol.html +[scd]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html [scdp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.parent [sc]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html [scdoe]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.outer_expn @@ -323,6 +374,24 @@ There are two types of macros in Rust: Rust parser will set aside the contents of macros and their invocations. Later, macros are expanded using these portions of the code. +Some important data structures/interfaces here: +- [`SyntaxExtension`] - a lowered macro representation, contains its expander + function, which transforms a `TokenStream` or AST into another `TokenStream` + or AST + some additional data like stability, or a list of unstable features + allowed inside the macro. +- [`SyntaxExtensionKind`] - expander functions may have several different + signatures (take one token stream, or two, or a piece of AST, etc). This is + an enum that lists them. 
+- [`ProcMacro`]/[`TTMacroExpander`]/[`AttrProcMacro`]/[`MultiItemModifier`] - + traits representing the expander function signatures. + +[`SyntaxExtension`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.SyntaxExtension.html +[`SyntaxExtensionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.SyntaxExtensionKind.html +[`ProcMacro`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.ProcMacro.html +[`TTMacroExpander`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.TTMacroExpander.html +[`AttrProcMacro`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.AttrProcMacro.html +[`MultiItemModifier`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.MultiItemModifier.html + ## Macros By Example MBEs have their own parser distinct from the normal Rust parser. When macros @@ -492,11 +561,10 @@ Custom derives are a special type of proc macro. TODO: more? -## Notes from petrochenkov discussion +## Important Modules and Data Structures -TODO: sprinkle these links around the chapter... +TODO: sprinkle these throughout the chapter as much as possible... -Where to find the code: - librustc_span/hygiene.rs - structures related to hygiene and expansion that are kept in global data (can be accessed from any Ident without any context) - librustc_span/lib.rs - some secondary methods like macro backtrace using primary methods from hygiene.rs - librustc_builtin_macros - implementations of built-in macros (including macro attributes and derives) and some other early code generation facilities like injection of standard library imports or generation of test harness. 
@@ -511,23 +579,3 @@ Where to find the code: - librustc_ast/ext/tt - implementation of macro_rules, turns macro_rules DSL into something with signature Fn(TokenStream) -> TokenStream that can eat and produce tokens, @mark-i-m knows more about this - librustc_resolve/macros.rs - resolving macro paths, validating those resolutions, reporting various "not found"/"found, but it's unstable"/"expected x, found y" errors - librustc_middle/hir/map/def_collector.rs + librustc_resolve/build_reduced_graph.rs - integrate an AST fragment freshly expanded from a macro into various parent/child structures like module hierarchy or "definition paths" - -Primary structures: -- HygieneData - global piece of data containing hygiene and expansion info that can be accessed from any Ident without any context -- ExpnId - ID of a macro call or desugaring (and also expansion of that call/desugaring, depending on context) -- ExpnInfo/InternalExpnData - a subset of properties from both macro definition and macro call available through global data -- SyntaxContext - ID of a chain of nested macro definitions (identified by ExpnIds) -- SyntaxContextData - data associated with the given SyntaxContext, mostly a cache for results of filtering that chain in different ways -- Span - a code location + SyntaxContext -- Ident - interned string (Symbol) + Span, i.e. a string with attached hygiene data -- TokenStream - a collection of TokenTrees -- TokenTree - a token (punctuation, identifier, or literal) or a delimited group (anything inside ()/[]/{}) -- SyntaxExtension - a lowered macro representation, contains its expander function transforming a tokenstream or AST into tokenstream or AST + some additional data like stability, or a list of unstable features allowed inside the macro. 
-- SyntaxExtensionKind - expander functions may have several different signatures (take one token stream, or two, or a piece of AST, etc), this is an enum that lists them -- ProcMacro/TTMacroExpander/AttrProcMacro/MultiItemModifier - traits representing the expander signatures (TODO: change and rename the signatures into something more consistent) -- Resolver - a trait used to break crate dependencies (so resolver services can be used in librustc_ast, despite librustc_resolve and pretty much everything else depending on librustc_ast) -- ExtCtxt/ExpansionData - various intermediate data kept and used by expansion infra in the process of its work -- AstFragment - a piece of AST that can be produced by a macro (may include multiple homogeneous AST nodes, like e.g. a list of items) -- Annotatable - a piece of AST that can be an attribute target, almost same thing as AstFragment except for types and patterns that can be produced by macros but cannot be annotated with attributes (TODO: Merge into AstFragment) -- MacResult - a "polymorphic" AST fragment, something that can turn into a different AstFragment depending on its context (aka AstFragmentKind - item, or expression, or pattern etc.) -- Invocation/InvocationKind - a structure describing a macro call, these structures are collected by the expansion infra (InvocationCollector), queued, resolved, expanded when resolved, etc. From 14347dce02e277a9f880af8aedfc6db6657ed52a Mon Sep 17 00:00:00 2001 From: mark Date: Thu, 30 Apr 2020 21:22:25 -0500 Subject: [PATCH 15/22] SPRINKLE ALL THE THINGS --- src/macro-expansion.md | 88 ++++++++++++++++++++++++++---------------- src/the-parser.md | 3 +- 2 files changed, 57 insertions(+), 34 deletions(-) diff --git a/src/macro-expansion.md b/src/macro-expansion.md index faf9bad45..5cd3c067e 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -4,14 +4,27 @@ > refactoring, so some of the links in this chapter may be broken. Rust has a very powerful macro system. 
In the previous chapter, we saw how the -parser sets aside macros to be expanded. This chapter is about the process of -expanding those macros iteratively until we have a complete AST for our crate -with no unexpanded macros (or a compile error). +parser sets aside macros to be expanded (it temporarily uses [placeholders]). +This chapter is about the process of expanding those macros iteratively until +we have a complete AST for our crate with no unexpanded macros (or a compile +error). + +[placeholders]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/placeholders/index.html First, we will discuss the algorithm that expands and integrates macro output into ASTs. Next, we will take a look at how hygiene data is collected. Finally, we will look at the specifics of expanding different types of macros. +Many of the algorithms and data structures described below are in [`rustc_expand`], +with basic data structures in [`rustc_expand::base`][base]. + +Also of note, `cfg` and `cfg_attr` are treated differently from other macros, and are +handled in [`rustc_expand::config`][cfg]. + +[`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html +[base]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/index.html +[cfg]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/config/index.html + ## Expansion and AST Integration First of all, expansion happens at the crate level. Given a raw source code for @@ -24,10 +37,7 @@ method on a whole crate. If it is not run on a full crate, it means we are doing _eager macro expansion_. Eager expansion means that we expand the arguments of a macro invocation before the macro invocation itself. This is implemented only for a few special built-in macros that expect literals (it's -not a generally available feature of Rust).
Eager expansion generally performs -a subset of the things that lazy (normal) expansion does, so we will focus on -lazy expansion for the rest of this chapter. - +not a generally available feature of Rust). As an example, consider the following: ```rust,ignore @@ -40,7 +50,16 @@ foo!(bar!(baz)); A lazy expansion would expand `foo!` first. An eager expansion would expand `bar!` first. Implementing eager expansion more generally would be challenging, but we implement it for a few special built-in macros for the sake of user -experience. +experience. The built-in macros are implemented in [`rustc_builtin_macros`], +along with some other early code generation facilities like injection of +standard library imports or generation of test harness. There are some +additional helpers for building their AST fragments in +[`rustc_expand::build`][reb]. Eager expansion generally performs a subset of +the things that lazy (normal) expansion does, so we will focus on lazy +expansion for the rest of this chapter. + +[`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html +[reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html At a high level, [`fully_expand_fragment`][fef] works in iterations. We keep a queue of unresolved macro invocations (that is, macros we haven't found the @@ -114,10 +133,15 @@ fail at this point. The recovery happens by expanding unresolved macros into [err]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/enum.ExprKind.html#variant.Err Notice that name resolution is involved here: we need to resolve imports and -macro names in the above algorithm. However, we don't try to resolve other -names yet. This happens later, as we will see in the [next +macro names in the above algorithm. This is done in +[`rustc_resolve::macros`][mresolve], which resolves macro paths, validates +those resolutions, and reports various errors (e.g. 
"not found" or "found, but +it's unstable" or "expected x, found y"). However, we don't try to resolve +other names yet. This happens later, as we will see in the [next chapter](./name-resolution.md). +[mresolve]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/macros/index.html + Here are some other notable data structures involved in expansion and integration: - [`Resolver`] - a trait used to break crate dependencies. This allows the resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and pretty much everything else depending on [`rustc_ast`]. - [`ExtCtxt`]/[`ExpansionData`] - various intermediate data kept and used by expansion @@ -217,9 +241,9 @@ an integer ID, assigned continuously starting from 0 as we discover new macro calls. All heirarchies start at [`ExpnId::root()`][rootid], which is its own parent. -All of the hygiene-related algorithms are implemented in -[`rustc_span::hygiene`][hy], with the exception of some hacks -[`Resolver::resolve_crate_root`][hacks]. +[`rustc_span::hygiene`][hy] contains all of the hygiene-related algorithms +(with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks]) +and structures related to hygiene and expansion that are kept in global data. The actual heirarchies are stored in [`HygieneData`][hd]. This is a global piece of data containing hygiene and expansion info that can be accessed from @@ -362,6 +386,13 @@ foo!(bar!(baz)); For the `baz` AST node in the final output, the first heirarchy is `ROOT -> id(foo) -> id(bar) -> baz`, while the third heirarchy is `ROOT -> baz`. +### Macro Backtraces + +Macro backtraces are implemented in [`rustc_span`] using the hygiene machinery +in [`rustc_span::hygiene`][hy]. + +[`rustc_span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/index.html + ## Producing Macro Output Above, we saw how the output of a macro is integrated into the AST for a crate, @@ -551,7 +582,17 @@ stream, which is synthesized into the AST. 
It's worth noting that the token stream type used by proc macros is _stable_, so `rustc` does not use it internally (since our internal data structures are -unstable). +unstable). The compiler's token stream is +[`rustc_ast::tokenstream::TokenStream`][rustcts], as previously. This is +converted into the stable [`proc_macro::TokenStream`][stablets] and back in +[`rustc_expand::proc_macro`][pm] and [`rustc_expand::proc_macro_server`][pms]. +Because the Rust ABI is unstable, we use the C ABI for this conversion. + +[tsmod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/index.html +[rustcts]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html +[stablets]: https://doc.rust-lang.org/proc_macro/struct.TokenStream.html +[pm]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro/index.html +[pms]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro_server/index.html TODO: more here. @@ -560,22 +601,3 @@ TODO: more here. Custom derives are a special type of proc macro. TODO: more? - -## Important Modules and Data Structures - -TODO: sprinkle these throughout the chapter as much as possible... - -- librustc_span/hygiene.rs - structures related to hygiene and expansion that are kept in global data (can be accessed from any Ident without any context) -- librustc_span/lib.rs - some secondary methods like macro backtrace using primary methods from hygiene.rs -- librustc_builtin_macros - implementations of built-in macros (including macro attributes and derives) and some other early code generation facilities like injection of standard library imports or generation of test harness. -- librustc_ast/config.rs - implementation of cfg/cfg_attr (they treated specially from other macros), should probably be moved into librustc_ast/ext. -- librustc_ast/tokenstream.rs + librustc_ast/parse/token.rs - structures for compiler-side tokens, token trees, and token streams. 
-- librustc_ast/ext - various expansion-related stuff -- librustc_ast/ext/base.rs - basic structures used by expansion -- librustc_ast/ext/expand.rs - some expansion structures and the bulk of expansion infrastructure code - collecting macro invocations, calling into resolve for them, calling their expanding functions, and integrating the results back into AST -- librustc_ast/ext/placeholder.rs - the part of expand.rs responsible for "integrating the results back into AST" basicallly, "placeholder" is a temporary AST node replaced with macro expansion result nodes -- librustc_ast/ext/builer.rs - helper functions for building AST for built-in macros in librustc_builtin_macros (and user-defined syntactic plugins previously), can probably be moved into librustc_builtin_macros these days -- librustc_ast/ext/proc_macro.rs + librustc_ast/ext/proc_macro_server.rs - interfaces between the compiler and the stable proc_macro library, converting tokens and token streams between the two representations and sending them through C ABI -- librustc_ast/ext/tt - implementation of macro_rules, turns macro_rules DSL into something with signature Fn(TokenStream) -> TokenStream that can eat and produce tokens, @mark-i-m knows more about this -- librustc_resolve/macros.rs - resolving macro paths, validating those resolutions, reporting various "not found"/"found, but it's unstable"/"expected x, found y" errors -- librustc_middle/hir/map/def_collector.rs + librustc_resolve/build_reduced_graph.rs - integrate an AST fragment freshly expanded from a macro into various parent/child structures like module hierarchy or "definition paths" diff --git a/src/the-parser.md b/src/the-parser.md index c0f2a071b..da318c9ef 100644 --- a/src/the-parser.md +++ b/src/the-parser.md @@ -7,10 +7,11 @@ The very first thing the compiler does is take the program (in Unicode characters) and turn it into something the compiler can work with more conveniently than strings. 
This happens in two stages: Lexing and Parsing. -Lexing takes strings and turns them into streams of tokens. For example, +Lexing takes strings and turns them into streams of [tokens]. For example, `a.b + c` would be turned into the tokens `a`, `.`, `b`, `+`, and `c`. The lexer lives in [`librustc_lexer`][lexer]. +[tokens]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/token/index.html [lexer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html Parsing then takes streams of tokens and turns them into a structured From f4db809ab1b4c03301d59a8f4dbb3a36f4cea699 Mon Sep 17 00:00:00 2001 From: mark Date: Thu, 30 Apr 2020 22:41:58 -0500 Subject: [PATCH 16/22] fix line length --- src/macro-expansion.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/macro-expansion.md b/src/macro-expansion.md index 5cd3c067e..d8b5c392c 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -143,7 +143,9 @@ chapter](./name-resolution.md). [mresolve]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/macros/index.html Here are some other notable data structures involved in expansion and integration: -- [`Resolver`] - a trait used to break crate dependencies. This allows the resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and pretty much everything else depending on [`rustc_ast`]. +- [`Resolver`] - a trait used to break crate dependencies. This allows the + resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and + pretty much everything else depending on [`rustc_ast`]. 
- [`ExtCtxt`]/[`ExpansionData`] - various intermediate data kept and used by expansion infrastructure in the process of its work - [`Annotatable`] - a piece of AST that can be an attribute target, almost same From e0d772680606cd8b63d3dd8db630699289242375 Mon Sep 17 00:00:00 2001 From: mark Date: Thu, 30 Apr 2020 22:47:13 -0500 Subject: [PATCH 17/22] fix some links --- src/macro-expansion.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/macro-expansion.md b/src/macro-expansion.md index d8b5c392c..50e4dc2a7 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -81,7 +81,7 @@ iteration, this represents a compile error. Here is the [algorithm][original]: 3. If it's resolved: 0. Run the macro's expander function that consumes a [`TokenStream`] or AST and produces a [`TokenStream`] or [`AstFragment`] (depending on - the macro kind). (A `TokenStream` is a collection of [`TokenTrees`], + the macro kind). (A `TokenStream` is a collection of [`TokenTree`s][tt], each of which are a token (punctuation, identifier, or literal) or a delimited group (anything inside `()`/`[]`/`{}`)). - At this point, we know everything about the macro itself and can @@ -110,14 +110,14 @@ iteration, this represents a compile error. Here is the [algorithm][original]: 0. Put the macro back in the queue 1. Continue to next iteration... 
-[defpaths]: https://rustc-dev-guide.rust-lang.org/hir.html?highlight=def,path#identifiers-in-the-hir +[defpath]: https://rustc-dev-guide.rust-lang.org/hir.html?highlight=def,path#identifiers-in-the-hir [`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html [`InvocationCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.InvocationCollector.html [`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html [`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html [`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html [hybelow]: #hygiene-and-heirarchies -[`TokenTree`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html +[tt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html [`TokenStream`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html [inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html [`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html From 3def533f0bee29f96e2b66a706fc4eec156414b8 Mon Sep 17 00:00:00 2001 From: Who? Me?! Date: Sat, 2 May 2020 20:54:27 -0500 Subject: [PATCH 18/22] Typos Co-authored-by: Chris Simpkins --- src/macro-expansion.md | 48 +++++++++++++++++++++--------------------- 1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/src/macro-expansion.md b/src/macro-expansion.md index 50e4dc2a7..a29f56b7c 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -93,7 +93,7 @@ iteration, this represents a compile error. Here is the [algorithm][original]: proper set-in-stone AST with side-tables. 
It happens as follows: - If the macro produces tokens (e.g. a proc macro), we parse into an AST, which may produce parse errors. - - During expansion, we create `SyntaxContext`s (heirarchy 2). (See + - During expansion, we create `SyntaxContext`s (hierarchy 2). (See [the "Hygiene" section below][hybelow]) - These three passes happen one after another on every AST fragment freshly expanded from a macro: @@ -116,7 +116,7 @@ iteration, this represents a compile error. Here is the [algorithm][original]: [`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html [`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html [`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html -[hybelow]: #hygiene-and-heirarchies +[hybelow]: #hygiene-and-hierarchies [tt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html [`TokenStream`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html [inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html @@ -165,7 +165,7 @@ Here are some other notable data structures involved in expansion and integratio [`AstFragmentKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragmentKind.html -## Hygiene and Heirarchies +## Hygiene and Hierarchies If you have ever used C/C++ preprocessor macros, you know that there are some annoying and hard-to-debug gotchas! For example, consider the following C code: @@ -228,26 +228,26 @@ This struct also has hygiene information attached to it, as we will see later. [span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html Because macros invocations and definitions can be nested, the syntax context of -a node must be a heirarchy. 
For example, if we expand a macro and there is +a node must be a hierarchy. For example, if we expand a macro and there is another macro invocation or definition in the generated output, then the syntax context should reflex the nesting. However, it turns out that there are actually a few types of context we may -want to track for different purposes. Thus, there not just one but _three_ -expansion heirarchies that together comprise the hygiene information for a +want to track for different purposes. Thus, there are not just one but _three_ +expansion hierarchies that together comprise the hygiene information for a crate. -All of these heirarchies need some sort of "macro ID" to identify individual +All of these hierarchies need some sort of "macro ID" to identify individual elements in the chain of expansions. This ID is [`ExpnId`]. All macros receive an integer ID, assigned continuously starting from 0 as we discover new macro -calls. All heirarchies start at [`ExpnId::root()`][rootid], which is its own +calls. All hierarchies start at [`ExpnId::root()`][rootid], which is its own parent. [`rustc_span::hygiene`][hy] contains all of the hygiene-related algorithms (with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks]) and structures related to hygiene and expansion that are kept in global data. -The actual heirarchies are stored in [`HygieneData`][hd]. This is a global +The actual hierarchies are stored in [`HygieneData`][hd]. This is a global piece of data containing hygiene and expansion info that can be accessed from any [`Ident`] without any context. @@ -259,15 +259,15 @@ any [`Ident`] without any context. 
[hacks]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/struct.Resolver.html#method.resolve_crate_root [`Ident`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Ident.html -### The Expansion Order Heirarchy +### The Expansion Order Hierarchy -The first heirarchy tracks the order of expansions, i.e., when a macro +The first hierarchy tracks the order of expansions, i.e., when a macro invocation is in the output of another macro. -Here, the children in the heirarchy will be the "innermost" tokens. The +Here, the children in the hierarchy will be the "innermost" tokens. The [`ExpnData`] struct itself contains a subset of properties from both macro definition and macro call available through global data. -[`ExpnData::parent`][edp] tracks the child -> parent link in this heirarchy. +[`ExpnData::parent`][edp] tracks the child -> parent link in this hierarchy. [`ExpnData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html [edp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.parent @@ -280,7 +280,7 @@ macro_rules! foo { () => { println!(); } } fn main() { foo!(); } ``` -In this code, the AST nodes that are finally generated would have heirarchy: +In this code, the AST nodes that are finally generated would have hierarchy: ``` root @@ -288,11 +288,11 @@ root expn_id_println ``` -### The Macro Definition Heirarchy +### The Macro Definition Hierarchy -The second heirarchy tracks the order of macro definitions, i.e., when we are +The second hierarchy tracks the order of macro definitions, i.e., when we are expanding one macro another macro definition is revealed in its output. This -one is a bit tricky and more complex than the other two heirarchies. +one is a bit tricky and more complex than the other two hierarchies. [`SyntaxContext`][sc] represents a whole chain in this hierarchy via an ID. 
[`SyntaxContextData`][scd] contains data associated with the given @@ -315,7 +315,7 @@ a code location and `SyntaxContext`. Likewise, an [`Ident`] is just an interned For built-in macros, we use the context: `SyntaxContext::empty().apply_mark(expn_id)`, and such macros are considered to -be defined at the heirarchy root. We do the same for proc-macros because we +be defined at the hierarchy root. We do the same for proc-macros because we haven't implemented cross-crate hygiene yet. If the token had context `X` before being produced by a macro then after being @@ -360,7 +360,7 @@ m!(foo); After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context `ROOT -> id(m) -> id(n)`. -Finally, one last thing to mention is that currently, this heirarchy is subject +Finally, one last thing to mention is that currently, this hierarchy is subject to the ["context transplantation hack"][hack]. Basically, the more modern (and experimental) `macro` macros have stronger hygiene than the older MBE system, but this can result in weird interactions between the two. The hack is intended @@ -368,11 +368,11 @@ to make things "just work" for now. [hack]: https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732 -### The Call-site Heirarchy +### The Call-site Hierarchy -The third and final heirarchy tracks the location of macro invocations. +The third and final hierarchy tracks the location of macro invocations. -In this heirarchy [`ExpnData::call_site`][callsite] is the child -> parent link. +In this hierarchy [`ExpnData::call_site`][callsite] is the child -> parent link. [callsite]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.call_site @@ -385,8 +385,8 @@ macro foo($i: ident) { $i } foo!(bar!(baz)); ``` -For the `baz` AST node in the final output, the first heirarchy is `ROOT -> -id(foo) -> id(bar) -> baz`, while the third heirarchy is `ROOT -> baz`. 
+For the `baz` AST node in the final output, the first hierarchy is `ROOT -> +id(foo) -> id(bar) -> baz`, while the third hierarchy is `ROOT -> baz`. ### Macro Backtraces From 29cd15473c76bb4fdfe2be3da4a0acf51a9869ae Mon Sep 17 00:00:00 2001 From: Who? Me?! Date: Sat, 2 May 2020 21:09:44 -0500 Subject: [PATCH 19/22] Use full path of span Co-authored-by: Chris Simpkins --- src/macro-expansion.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/macro-expansion.md b/src/macro-expansion.md index a29f56b7c..22895a68d 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -222,7 +222,7 @@ The context is attached to AST nodes. All AST nodes generated by macros have context attached. Additionally, there may be other nodes that have context attached, such as some desugared syntax (non-macro-expanded nodes are considered to just have the "root" context, as described below). -Throughout the compiler, we use [`Span`s][span] to refer to code locations. +Throughout the compiler, we use [`librustc_span::Span`s][span] to refer to code locations. This struct also has hygiene information attached to it, as we will see later. [span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html From 2ec8d1f4e77d5c6f6ad7f69ca06c38844bf62d0d Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 2 May 2020 21:02:54 -0500 Subject: [PATCH 20/22] move discussion of eager expansion to the end --- src/macro-expansion.md | 60 ++++++++++++++++++++++-------------------- 1 file changed, 32 insertions(+), 28 deletions(-) diff --git a/src/macro-expansion.md b/src/macro-expansion.md index 22895a68d..aa6680f45 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -29,34 +29,10 @@ handled in [`rustc_expand::config`][cfg]. First of all, expansion happens at the crate level. Given a raw source code for a crate, the compiler will produce a massive AST with all macros expanded, all -modules inlined, etc. 
- -The primary entry point for this process is the -[`MacroExpander::fully_expand_fragment`][fef] method. Usually, we run this -method on a whole crate. If it is not run on a full crate, it means we are -doing _eager macro expansion_. Eager expansion means that we expand the -arguments of a macro invocation before the macro invocation itself. This is -implemented only for a few special built-in macros that expect literals (it's -not a generally available feature of Rust). -As an example, consider the following: - -```rust,ignore -macro bar($i: ident) { $i } -macro foo($i: ident) { $i } - -foo!(bar!(baz)); -``` - -A lazy expansion would expand `foo!` first. An eager expansion would expand -`bar!` first. Implementing eager expansion more generally would be challenging, -but we implement it for a few special built-in macros for the sake of user -experience. The built-in macros are implemented in [`rustc_builtin_macros`], -along with some other early code generation facilities like injection of -standard library imports or generation of test harness. There are some -additional helpers for building their AST fragments in -[`rustc_expand::build`][reb]. Eager expansion generally performs a subset of -the things that lazy (normal) expansion does, so we will focus on lazy -expansion for the rest of this chapter. +modules inlined, etc. The primary entry point for this process is the +[`MacroExpander::fully_expand_fragment`][fef] method. With few exceptions, we +use this method on the whole crate (see ["Eager Expansion"](#eager-expansion) +below for more detailed discussion of edge case expansion issues). 
[`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html [reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html @@ -164,6 +140,34 @@ Here are some other notable data structures involved in expansion and integratio [`MacResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.MacResult.html [`AstFragmentKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragmentKind.html +### Eager Expansion + +_Eager expansion_ means that we expand the arguments of a macro invocation +before the macro invocation itself. This is implemented only for a few special +built-in macros that expect literals; expanding arguments first for some of +these macros results in a smoother user experience. As an example, consider the +following: + +```rust,ignore +macro bar($i: ident) { $i } +macro foo($i: ident) { $i } + +foo!(bar!(baz)); +``` + +A lazy expansion would expand `foo!` first. An eager expansion would expand +`bar!` first. + +Eager expansion is not a generally available feature of Rust. Implementing +eager expansion more generally would be challenging, but we implement it for a +few special built-in macros for the sake of user experience. The built-in +macros are implemented in [`rustc_builtin_macros`], along with some other early +code generation facilities like injection of standard library imports or +generation of the test harness. There are some additional helpers for building +their AST fragments in [`rustc_expand::build`][reb]. Eager expansion generally +performs a subset of the things that lazy (normal) expansion. It is done by +invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed to +whole crate, like we normally do).
## Hygiene and Hierarchies From f9c8992615f62e18c9f8d305dec84f38c1bf064d Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 2 May 2020 21:10:14 -0500 Subject: [PATCH 21/22] add some section headers --- src/macro-expansion.md | 50 +++++++++++++++++++++++------------------- 1 file changed, 28 insertions(+), 22 deletions(-) diff --git a/src/macro-expansion.md b/src/macro-expansion.md index aa6680f45..ea1240fc4 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -98,6 +98,8 @@ iteration, this represents a compile error. Here is the [algorithm][original]: [inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html [`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html +### Error Recovery + If we make no progress in an iteration, then we have reached a compilation error (e.g. an undefined macro). We attempt to recover from failures (unresolved macros or imports) for the sake of diagnostics. This allows @@ -108,6 +110,8 @@ fail at this point. The recovery happens by expanding unresolved macros into [err]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/enum.ExprKind.html#variant.Err +### Name Resolution + Notice that name resolution is involved here: we need to resolve imports and macro names in the above algorithm. This is done in [`rustc_resolve::macros`][mresolve], which resolves macro paths, validates @@ -118,28 +122,6 @@ chapter](./name-resolution.md). [mresolve]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/macros/index.html -Here are some other notable data structures involved in expansion and integration: -- [`Resolver`] - a trait used to break crate dependencies. This allows the - resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and - pretty much everything else depending on [`rustc_ast`]. 
-- [`ExtCtxt`]/[`ExpansionData`] - various intermediate data kept and used by expansion - infrastructure in the process of its work -- [`Annotatable`] - a piece of AST that can be an attribute target, almost same - thing as AstFragment except for types and patterns that can be produced by - macros but cannot be annotated with attributes -- [`MacResult`] - a "polymorphic" AST fragment, something that can turn into a - different `AstFragment` depending on its [`AstFragmentKind`] - item, - or expression, or pattern etc. - -[`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html -[`rustc_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/index.html -[`Resolver`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.Resolver.html -[`ExtCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.ExtCtxt.html -[`ExpansionData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.ExpansionData.html -[`Annotatable`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.Annotatable.html -[`MacResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.MacResult.html -[`AstFragmentKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragmentKind.html - ### Eager Expansion _Eager expansion_ means that we expand the arguments of a macro invocation @@ -169,6 +151,30 @@ performs a subset of the things that lazy (normal) expansion. It is done by invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed to whole crate, like we normally do). +### Other Data Structures + +Here are some other notable data structures involved in expansion and integration: +- [`Resolver`] - a trait used to break crate dependencies. This allows the + resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and + pretty much everything else depending on [`rustc_ast`]. 
+- [`ExtCtxt`]/[`ExpansionData`] - various intermediate data kept and used by expansion + infrastructure in the process of its work +- [`Annotatable`] - a piece of AST that can be an attribute target, almost same + thing as AstFragment except for types and patterns that can be produced by + macros but cannot be annotated with attributes +- [`MacResult`] - a "polymorphic" AST fragment, something that can turn into a + different `AstFragment` depending on its [`AstFragmentKind`] - item, + or expression, or pattern etc. + +[`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html +[`rustc_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/index.html +[`Resolver`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.Resolver.html +[`ExtCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.ExtCtxt.html +[`ExpansionData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.ExpansionData.html +[`Annotatable`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.Annotatable.html +[`MacResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.MacResult.html +[`AstFragmentKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragmentKind.html + ## Hygiene and Hierarchies If you have ever used C/C++ preprocessor macros, you know that there are some From de9a91636046ad5ce89d4c996a150fa3eee37eea Mon Sep 17 00:00:00 2001 From: Who? Me?! Date: Fri, 8 May 2020 09:36:10 -0500 Subject: [PATCH 22/22] Typo Co-authored-by: Chris Simpkins --- src/macro-expansion.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/macro-expansion.md b/src/macro-expansion.md index ea1240fc4..7961d0cf1 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -408,7 +408,7 @@ in [`rustc_span::hygiene`][hy]. 
## Producing Macro Output Above, we saw how the output of a macro is integrated into the AST for a crate, -and we also saw how th e hygiene data for a crate is generated. But how do we +and we also saw how the hygiene data for a crate is generated. But how do we actually produce the output of a macro? It depends on the type of macro. There are two types of macros in Rust: