# Lexing and Parsing

- The very first thing the compiler does is take the program (in Unicode) and
- transmute it into a data format the compiler can work with more conveniently
- than strings. This happens in two stages: Lexing and Parsing.
+ The very first thing the compiler does is take the program (in UTF-8 Unicode text)
+ and turn it into a data format the compiler can work with more conveniently than strings.
+ This happens in two stages: Lexing and Parsing.

1. _Lexing_ takes strings and turns them into streams of [tokens]. For
example, `foo.bar + buz` would be turned into the tokens `foo`, `.`, `bar`,
@@ -13,38 +13,36 @@ than strings. This happens in two stages: Lexing and Parsing.

2. _Parsing_ takes streams of tokens and turns them into a structured form
which is easier for the compiler to work with, usually called an [*Abstract
- Syntax Tree* (`AST`)][ast].
+ Syntax Tree* (AST)][ast].
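
The two stages in the list above can be illustrated with a toy sketch. This is not how `rustc_lexer` or `rustc_parse` are implemented: `Token`, `Expr`, `lex`, `parse`, and `parse_postfix` are hypothetical names invented for illustration, the sketch understands only identifiers, `.`, and `+`, and it assumes whitespace can simply be skipped.

```rust
use std::iter::Peekable;

// Toy illustration only: these types are hypothetical and are not
// the token or AST types used by rustc.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Dot,
    Plus,
}

#[derive(Debug, PartialEq)]
enum Expr {
    Path(String),
    Field(Box<Expr>, String),
    Add(Box<Expr>, Box<Expr>),
}

/// Stage 1: turn a string into a stream of tokens, skipping whitespace.
/// Returns `None` on any character this toy lexer does not understand.
fn lex(src: &str) -> Option<Vec<Token>> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '.' {
            chars.next();
            tokens.push(Token::Dot);
        } else if c == '+' {
            chars.next();
            tokens.push(Token::Plus);
        } else if c.is_alphabetic() || c == '_' {
            // Consume the whole identifier.
            let mut ident = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_alphanumeric() || d == '_' {
                    ident.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(ident));
        } else {
            return None;
        }
    }
    Some(tokens)
}

/// Stage 2: turn the token stream into a tree (`a.b + c.d + ...`).
fn parse(tokens: &[Token]) -> Option<Expr> {
    let mut iter = tokens.iter().peekable();
    let mut lhs = parse_postfix(&mut iter)?;
    while let Some(Token::Plus) = iter.peek() {
        iter.next();
        let rhs = parse_postfix(&mut iter)?;
        lhs = Expr::Add(Box::new(lhs), Box::new(rhs));
    }
    // Reject trailing tokens that were not consumed.
    if iter.next().is_none() { Some(lhs) } else { None }
}

/// Parses an identifier followed by zero or more `.field` accesses.
fn parse_postfix<'a, I>(iter: &mut Peekable<I>) -> Option<Expr>
where
    I: Iterator<Item = &'a Token>,
{
    let mut expr = match iter.next()? {
        Token::Ident(name) => Expr::Path(name.clone()),
        _ => return None,
    };
    while let Some(Token::Dot) = iter.peek() {
        iter.next();
        match iter.next()? {
            Token::Ident(field) => expr = Expr::Field(Box::new(expr), field.clone()),
            _ => return None,
        }
    }
    Some(expr)
}

fn main() {
    let tokens = lex("foo.bar + buz").expect("toy lexer failed");
    println!("{:?}", tokens);
    println!("{:?}", parse(&tokens));
}
```

Unlike this sketch, a real AST node would also carry a `Span` pointing back at the source text it came from.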

- An `AST` mirrors the structure of a Rust program in memory, using a `Span` to
- link a particular `AST` node back to its source text. The `AST` is defined in
+ An AST mirrors the structure of a Rust program in memory, using a `Span` to
+ link a particular AST node back to its source text. The AST is defined in
[`rustc_ast`][rustc_ast], along with some definitions for tokens and token
- streams, data structures/`trait`s for mutating `AST`s, and shared definitions for
- other `AST`-related parts of the compiler (like the lexer and
- `macro`-expansion).
+ streams, data structures/traits for mutating ASTs, and shared definitions for
+ other AST-related parts of the compiler (like the lexer and
+ macro-expansion).

The lexer is developed in [`rustc_lexer`][lexer].

The parser is defined in [`rustc_parse`][rustc_parse], along with a
high-level interface to the lexer and some validation routines that run after
- `macro` expansion. In particular, the [`rustc_parse::parser`][parser] contains
+ macro expansion. In particular, the [`rustc_parse::parser`][parser] contains
the parser implementation.

The main entrypoint to the parser is via the various `parse_*` functions and others in
[rustc_parse][rustc_parse]. They let you do things like turn a [`SourceFile`][sourcefile]
(e.g. the source in a single file) into a token stream, create a parser from
- the token stream, and then execute the parser to get a [`Crate`] (the root `AST`
+ the token stream, and then execute the parser to get a [`Crate`] (the root AST
node).

- To minimize the amount of copying that is done, both [`StringReader`] and
- [`Parser`] have lifetimes which bind them to the parent [`ParseSess`]. This
- contains all the information needed while parsing, as well as the [`SourceMap`]
- itself.
+ To minimize the amount of copying that is done,
+ both [`StringReader`] and [`Parser`] have lifetimes which bind them to the parent [`ParseSess`].
+ This contains all the information needed while parsing, as well as the [`SourceMap`] itself.

- Note that while parsing, we may encounter `macro` definitions or invocations. We
- set these aside to be expanded (see [Macro Expansion](./macro-expansion.md)).
- Expansion itself may require parsing the output of a `macro`, which may reveal
- more `macro`s to be expanded, and so on.
+ Note that while parsing, we may encounter macro definitions or invocations.
+ We set these aside to be expanded (see [Macro Expansion](./macro-expansion.md)).
+ Expansion itself may require parsing the output of a macro, which may reveal more macros to be expanded, and so on.

## More on Lexical Analysis