Macro expansion

Macro expansion happens during parsing. rustc in fact has two parsers: the normal Rust parser and the macro parser. During the parsing phase, the normal Rust parser sets aside the contents of macros and their invocations. Later, before name resolution, macros are expanded using these portions of the code. The macro parser, in turn, may call the normal Rust parser when it needs to bind a metavariable (e.g. $my_expr) while parsing the contents of a macro invocation. The code for macro expansion is in src/libsyntax/ext/tt/. This chapter aims to explain how macro expansion works.

Example

It's helpful to have an example to refer to. For the remainder of this chapter, whenever we refer to the "example definition", we mean the following:

macro_rules! printer {
    (print $mvar:ident) => {
        println!("{}", $mvar);
    };
    (print twice $mvar:ident) => {
        println!("{}", $mvar);
        println!("{}", $mvar);
    }
}

$mvar is called a metavariable. Unlike a normal variable, which binds to a value in a computation, a metavariable binds at compile time to a tree of tokens. A token is a single "unit" of the grammar, such as an identifier (e.g., foo) or punctuation (e.g., =>). There are also other special tokens, such as EOF, which indicates that there are no more tokens. Token trees result from paired parentheses-like characters ((...), [...], and {...}); each one includes the opening and closing delimiters and all the tokens in between (we do require that parentheses-like characters be balanced). Having macro expansion operate on token streams rather than the raw bytes of a source file abstracts away a lot of complexity. The macro expander (and much of the rest of the compiler) doesn't really care about the exact line and column of some syntactic construct in the code; it cares about which constructs are used in the code. Using tokens allows us to care about what without worrying about where. For more information about tokens, see the Parsing chapter of this book.
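To make the idea of a token tree concrete, here is a minimal sketch of how a flat token sequence could be grouped into trees by its delimiters. The Token and TokenTree types and the build_trees and closer_for functions are hypothetical, simplified stand-ins written for this illustration; rustc's real definitions carry spans and many more token kinds.

```rust
// Hypothetical, simplified token types (not rustc's real definitions).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Punct(String),
    Open(char),  // one of '(', '[', '{'
    Close(char), // one of ')', ']', '}'
}

#[derive(Debug, Clone, PartialEq)]
enum TokenTree {
    // A single non-delimiter token.
    Leaf(Token),
    // A delimited group: the opening delimiter plus everything inside it.
    Delimited { open: char, trees: Vec<TokenTree> },
}

// The closing delimiter that balances `open`.
fn closer_for(open: char) -> char {
    match open {
        '(' => ')',
        '[' => ']',
        _ => '}',
    }
}

// Group a flat token sequence into token trees, rejecting unbalanced input.
fn build_trees(tokens: &[Token]) -> Result<Vec<TokenTree>, String> {
    // Each stack entry is (opening delimiter, trees collected so far).
    // The bottom entry is the root, which has no delimiter.
    let mut stack: Vec<(Option<char>, Vec<TokenTree>)> = vec![(None, Vec::new())];
    for tok in tokens {
        match tok {
            Token::Open(c) => stack.push((Some(*c), Vec::new())),
            Token::Close(c) => {
                let (open, trees) = stack.pop().expect("stack always holds the root");
                match open {
                    Some(o) if closer_for(o) == *c => {
                        let parent = stack.last_mut().expect("root is still on the stack");
                        parent.1.push(TokenTree::Delimited { open: o, trees });
                    }
                    // Either we popped the root or the delimiters do not pair up.
                    _ => return Err(format!("unexpected closing delimiter `{}`", c)),
                }
            }
            leaf => {
                let current = stack.last_mut().expect("stack always holds the root");
                current.1.push(TokenTree::Leaf(leaf.clone()));
            }
        }
    }
    if stack.len() != 1 {
        return Err("unclosed delimiter".to_string());
    }
    Ok(stack.pop().expect("root").1)
}
```

In this model, the tokens print ( foo ) become two trees: a leaf for print and one delimited tree holding foo.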

Whenever we refer to the "example invocation", we mean the following snippet:

printer!(print foo); // Assume `foo` is a variable defined somewhere else...
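For readers who want to try the example, the sketch below puts the definition and the invocation into one compilable unit, with the two arms separated by a semicolon as macro_rules requires. The describe macro and the demo function are hypothetical additions for illustration only: describe mirrors the arms of printer but returns a String instead of printing, so the result of matching each arm can be inspected.

```rust
macro_rules! printer {
    (print $mvar:ident) => {
        println!("{}", $mvar);
    };
    (print twice $mvar:ident) => {
        println!("{}", $mvar);
        println!("{}", $mvar);
    };
}

// Hypothetical variant with the same matchers as `printer`, but each arm
// produces a String so the expansion can be checked programmatically.
macro_rules! describe {
    (print $mvar:ident) => {
        format!("{}", $mvar)
    };
    (print twice $mvar:ident) => {
        format!("{}\n{}", $mvar, $mvar)
    };
}

fn demo() -> (String, String) {
    let foo = 42;
    // The example invocation: `$mvar` binds to the identifier token `foo`.
    printer!(print foo);
    // This one only fits the second arm, because of the extra `twice` token.
    printer!(print twice foo);
    (describe!(print foo), describe!(print twice foo))
}
```

Calling demo() prints foo three times in total and returns the strings produced by the two describe arms.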

The process of expanding the macro invocation into the syntax tree println!("{}", foo) and then expanding that into a call to Display::fmt is called macro expansion, and it is the topic of this chapter.

The macro parser

There are two parts to macro expansion: parsing the definition and parsing the invocations. Interestingly, both are done by the macro parser.

Basically, the macro parser is like an NFA-based regex parser. It uses an algorithm similar in spirit to the Earley parsing algorithm. The macro parser is defined in src/libsyntax/ext/tt/macro_parser.rs.

The interface of the macro parser is as follows (this is slightly simplified):

fn parse(
    sess: ParserSession,
    tts: TokenStream,
    ms: &[TokenTree]
) -> NamedParseResult

In this interface:

  • sess is a "parsing session", which keeps track of some metadata. Most notably, this is used to keep track of errors that are generated so they can be reported to the user.
  • tts is a stream of tokens. The macro parser's job is to consume the raw stream of tokens and output a binding of metavariables to corresponding token trees.
  • ms is a matcher. This is a sequence of token trees that we want to match tts against.

In the analogy of a regex parser, tts is the input and we are matching it against the pattern ms. Using our examples, tts could be the stream of tokens inside the example invocation, print foo, while ms might be the sequence of token trees print $mvar:ident.

The output of the parser is a NamedParseResult, which indicates which of three cases has occurred:

  • Success: tts matches the given matcher ms, and we have produced a binding from metavariables to the corresponding token trees.
  • Failure: tts does not match ms. This results in an error message such as "No rule expected token blah".
  • Error: some fatal error has occurred in the parser. For example, this happens if more than one pattern matches, since that indicates the macro is ambiguous.

The full interface is defined in src/libsyntax/ext/tt/macro_parser.rs.
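The shape of this interface and its three outcomes can be sketched with a toy matcher. Everything below is a hypothetical simplification, not rustc's code: tokens are plain strings, MatcherTok models only literal tokens and ident metavariables, there is no parsing session, and parse_ident stands in for the callback into the normal Rust parser. The Error case here is triggered by a metavariable name bound twice, chosen only as an easy-to-reproduce fatal condition.

```rust
use std::collections::HashMap;

// One element of a matcher (`ms`) in this toy model.
#[derive(Debug, Clone, PartialEq)]
enum MatcherTok {
    // A token that must appear verbatim, such as `print`.
    Literal(&'static str),
    // `$name:ident`; only the `ident` fragment kind is modelled.
    IdentVar(&'static str),
}

#[derive(Debug, PartialEq)]
enum NamedParseResult {
    // Bindings from metavariable names to the tokens they matched.
    Success(HashMap<String, String>),
    // The input does not match this matcher.
    Failure(String),
    // Something fatal happened; matching should stop entirely.
    Error(String),
}

// Stand-in for asking the normal Rust parser for an identifier.
fn parse_ident(tok: &str) -> Option<String> {
    let mut chars = tok.chars();
    let first = chars.next()?;
    let ok = (first.is_alphabetic() || first == '_')
        && chars.all(|c| c.is_alphanumeric() || c == '_');
    if ok { Some(tok.to_string()) } else { None }
}

fn parse(tts: &[&str], ms: &[MatcherTok]) -> NamedParseResult {
    let mut bindings = HashMap::new();
    let mut input = tts.iter();
    for m in ms {
        let tok = match input.next() {
            Some(t) => *t,
            None => {
                return NamedParseResult::Failure(
                    "unexpected end of macro invocation".to_string(),
                )
            }
        };
        match m {
            MatcherTok::Literal(lit) => {
                if tok != *lit {
                    return NamedParseResult::Failure(format!(
                        "no rules expected the token `{}`",
                        tok
                    ));
                }
            }
            MatcherTok::IdentVar(name) => match parse_ident(tok) {
                Some(ident) => {
                    if bindings.insert(name.to_string(), ident).is_some() {
                        return NamedParseResult::Error(format!(
                            "duplicated bind name: {}",
                            name
                        ));
                    }
                }
                None => {
                    return NamedParseResult::Failure(format!(
                        "expected ident, found `{}`",
                        tok
                    ))
                }
            },
        }
    }
    // Leftover input means the matcher did not cover the whole invocation.
    match input.next() {
        Some(extra) => {
            NamedParseResult::Failure(format!("no rules expected the token `{}`", extra))
        }
        None => NamedParseResult::Success(bindings),
    }
}
```

With ms set to [Literal("print"), IdentVar("mvar")], the input print foo yields Success with mvar bound to foo, while print twice foo yields Failure because of the leftover token.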

The macro parser does pretty much exactly the same as a normal regex parser with one exception: in order to parse different types of metavariables, such as ident, block, expr, etc., the macro parser must sometimes call back to the normal Rust parser.

As mentioned above, both definitions and invocations of macros are parsed using the macro parser. This is non-intuitive and self-referential. The code to parse macro definitions is in src/libsyntax/ext/tt/macro_rules.rs. It defines the pattern for matching a macro definition as $( $lhs:tt => $rhs:tt );+. In other words, a macro_rules definition should have in its body at least one occurrence of a token tree followed by => followed by another token tree. When the compiler comes to a macro_rules definition, it uses this pattern to match the two token trees per rule in the definition of the macro using the macro parser itself. In our example definition, the metavariable $lhs would match the patterns of both arms: (print $mvar:ident) and (print twice $mvar:ident). And $rhs would match the bodies of both arms: { println!("{}", $mvar); } and { println!("{}", $mvar); println!("{}", $mvar); }. The parser would keep this knowledge around for when it needs to expand a macro invocation.
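The same pattern can be tried out from ordinary user code. The sketch below is an illustration of the pattern only, not the compiler's implementation: split_rules is a hypothetical macro whose single matcher is $( $lhs:tt => $rhs:tt );+, and it returns each rule as a pair of strings produced by stringify!. The exact spacing of stringify! output is not guaranteed, so it should not be relied on.

```rust
// Hypothetical helper macro: splits a list of `lhs => rhs` rules, separated
// by `;`, into (lhs, rhs) pairs rendered as strings.
macro_rules! split_rules {
    ($( $lhs:tt => $rhs:tt );+) => {
        vec![ $( (stringify!($lhs), stringify!($rhs)) ),+ ]
    };
}

// Feeds the arms of the example definition through `split_rules!`.
fn example_rules() -> Vec<(&'static str, &'static str)> {
    split_rules!(
        (print $mvar:ident) => {
            println!("{}", $mvar);
        };
        (print twice $mvar:ident) => {
            println!("{}", $mvar);
            println!("{}", $mvar);
        }
    )
}
```

Here each $lhs binds to one whole parenthesized token tree and each $rhs to one whole braced token tree, so example_rules() returns two pairs, one per arm.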

When the compiler comes to a macro invocation, it parses that invocation using the same NFA-based macro parser that is described above. However, the matcher used is the first token tree ($lhs) extracted from the arms of the macro definition. Using our example, we would try to match the token stream print foo from the invocation against the matchers print $mvar:ident and print twice $mvar:ident that we previously extracted from the definition. The algorithm is exactly the same, but when the macro parser comes to a place in the current matcher where it needs to match a non-terminal (e.g. $mvar:ident), it calls back to the normal Rust parser to get the contents of that non-terminal. In this case, the Rust parser would look for an ident token, which it finds (foo) and returns to the macro parser. Then, the macro parser proceeds in parsing as normal. Also, note that exactly one of the matchers from the various arms should match the invocation (otherwise, the macro is ambiguous).
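The flow described above (match the invocation against each arm's $lhs, then use the bindings to fill in that arm's $rhs) can be sketched as a toy model. All names here (Arm, is_ident, match_arm, transcribe, expand, printer_arms) are hypothetical, tokens are plain strings, any matcher token starting with $ is treated as an ident metavariable, and is_ident stands in for the callback into the normal Rust parser. Following the text, the sketch reports an error unless exactly one arm matches.

```rust
use std::collections::HashMap;

// One macro arm: the matcher tokens and the body tokens (toy model).
struct Arm {
    lhs: Vec<&'static str>,
    rhs: Vec<&'static str>,
}

// Stand-in for asking the normal Rust parser whether `tok` is an identifier.
fn is_ident(tok: &str) -> bool {
    let mut chars = tok.chars();
    match chars.next() {
        Some(c) if c.is_alphabetic() || c == '_' => {
            chars.all(|c| c.is_alphanumeric() || c == '_')
        }
        _ => false,
    }
}

// Try to match an invocation against one matcher, collecting bindings.
fn match_arm(lhs: &[&str], invocation: &[&str]) -> Option<HashMap<String, String>> {
    if lhs.len() != invocation.len() {
        return None;
    }
    let mut bindings = HashMap::new();
    for (pat, tok) in lhs.iter().zip(invocation) {
        if let Some(name) = pat.strip_prefix('$') {
            // Metavariable: defer to the identifier check.
            if !is_ident(tok) {
                return None;
            }
            bindings.insert(name.to_string(), tok.to_string());
        } else if pat != tok {
            // Literal token mismatch.
            return None;
        }
    }
    Some(bindings)
}

// Replace each `$name` in the body with the tokens bound to it.
fn transcribe(rhs: &[&str], bindings: &HashMap<String, String>) -> Vec<String> {
    rhs.iter()
        .map(|tok| match tok.strip_prefix('$') {
            Some(name) => bindings.get(name).cloned().unwrap_or_else(|| tok.to_string()),
            None => tok.to_string(),
        })
        .collect()
}

// Expand an invocation, requiring that exactly one arm matches.
fn expand(arms: &[Arm], invocation: &[&str]) -> Result<Vec<String>, String> {
    let mut matches = arms.iter().filter_map(|arm| {
        match_arm(&arm.lhs, invocation).map(|b| transcribe(&arm.rhs, &b))
    });
    match (matches.next(), matches.next()) {
        (Some(expansion), None) => Ok(expansion),
        (None, _) => Err("no rules matched the invocation".to_string()),
        (Some(_), Some(_)) => Err("more than one rule matched".to_string()),
    }
}

// The two arms of the example definition, written as string tokens.
fn printer_arms() -> Vec<Arm> {
    let line = ["println", "!", "(", "\"{}\"", ",", "$mvar", ")", ";"];
    vec![
        Arm { lhs: vec!["print", "$mvar"], rhs: line.to_vec() },
        Arm { lhs: vec!["print", "twice", "$mvar"], rhs: [line, line].concat() },
    ]
}
```

For the example invocation, expand(&printer_arms(), &["print", "foo"]) matches only the first arm and yields the tokens of println!("{}", foo);.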

For more information about the macro parser's implementation, see the comments in src/libsyntax/ext/tt/macro_parser.rs.

Hygiene

TODO

Procedural Macros

TODO

Custom Derive

TODO