Commit 23b90ff

Align lexical syntax between docs and spec
Scala 3 changes compared to the existing Scala 2 spec:
- Reusing alphaid in the definition of plainid (this does not change its meaning)
- Addition of quoteId and spliceId
- Correctly specifying the use of _ in numeric literals.
- Dropping symbolLiteral

Scala 2 changes compared to the existing Scala 3 spec:
- Various refactorings
- Specifying the new Unicode escape handling stuff, this was already implemented in Scala 3 but not part of syntax.md (see scala#8480).
1 parent 98fec4c commit 23b90ff
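
The commit message mentions "correctly specifying the use of _ in numeric literals". As a rough illustration of what the revised `decimalNumeral` and `hexNumeral` productions accept, here is a minimal sketch that transcribes them into Python regular expressions. The helper name `is_integer_literal` is hypothetical, and the sketch covers the lexical grammar only, not any further checks a compiler may apply.

```python
import re

# decimalNumeral ::= '0' | digit [{digit | '_'} digit]
# The '0' alternative is already covered by `digit`, so one pattern suffices.
DECIMAL = r"[0-9](?:[0-9_]*[0-9])?"
# hexNumeral ::= '0' ('x' | 'X') hexDigit [{hexDigit | '_'} hexDigit]
HEX = r"0[xX][0-9A-Fa-f](?:[0-9A-Fa-f_]*[0-9A-Fa-f])?"
# integerLiteral ::= (decimalNumeral | hexNumeral) ['L' | 'l']
INTEGER_LITERAL = re.compile(rf"(?:{HEX}|{DECIMAL})[Ll]?")


def is_integer_literal(text: str) -> bool:
    """Return True if `text` as a whole matches the integerLiteral production."""
    return INTEGER_LITERAL.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["1_000_000", "0xFF_FFL", "1__0", "1_", "_1", "0x_FF"]:
        print(f"{sample!r}: {is_integer_literal(sample)}")
```

Underscores may appear only between digits, so `1_` and `0x_FF` are rejected while `1__0` is accepted. Read literally, the production also admits leading zeros such as `007`; whether those are rejected elsewhere is outside what this grammar fragment states.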

5 files changed: +98 -102 lines changed
docs/_docs/internals/syntax.md

Lines changed: 33 additions & 38 deletions
@@ -20,51 +20,46 @@ productions map to AST nodes.
2020
The following description of Scala tokens uses literal characters `‘c’` when
2121
referring to the ASCII fragment `\u0000` – `\u007F`.
2222

23-
_Unicode escapes_ are used to represent the [Unicode character](https://www.w3.org/International/articles/definitions-characters/) with the given
24-
hexadecimal code:
25-
26-
```ebnf
27-
UnicodeEscape ::= ‘\’ ‘u’ {‘u’} hexDigit hexDigit hexDigit hexDigit
28-
hexDigit ::= ‘0’ | ... | ‘9’ | ‘A’ | ... | ‘F’ | ‘a’ | ... | ‘f’
29-
```
30-
31-
Informal descriptions are typeset as `“some comment”`.
32-
3323
## Lexical Syntax
3424

35-
The lexical syntax of Scala is given by the following grammar in EBNF
36-
form.
25+
The lexical syntax of Scala is given by the following grammar in EBNF form:
3726

3827
```ebnf
3928
whiteSpace ::= ‘\u0020’ | ‘\u0009’ | ‘\u000D’ | ‘\u000A’
40-
upper ::= ‘A’ | ... | ‘Z’ | ‘\$’ | ‘_’ “... and Unicode category Lu”
41-
lower ::= ‘a’ | ... | ‘z’ “... and Unicode category Ll”
42-
letter ::= upper | lower “... and Unicode categories Lo, Lt, Lm, Nl”
29+
upper ::= ‘A’ | ... | ‘Z’ | ‘$’ and any character in Unicode categories Lu, Lt or Nl,
30+
and any character in Unicode categories Lo and Lm that doesn't have
31+
contributory property Other_Lowercase
32+
lower ::= ‘a’ | ... | ‘z’ | ‘_’ and any character in Unicode category Ll,
33+
and any character in Unicode categories Lo or Lm that has contributory
34+
property Other_Lowercase
35+
letter ::= upper | lower
4336
digit ::= ‘0’ | ... | ‘9’
4437
paren ::= ‘(’ | ‘)’ | ‘[’ | ‘]’ | ‘{’ | ‘}’
4538
delim ::= ‘`’ | ‘'’ | ‘"’ | ‘.’ | ‘;’ | ‘,’
4639
opchar ::= ‘!’ | ‘#’ | ‘%’ | ‘&’ | ‘*’ | ‘+’ | ‘-’ | ‘/’ | ‘:’ |
4740
‘<’ | ‘=’ | ‘>’ | ‘?’ | ‘@’ | ‘\’ | ‘^’ | ‘|’ | ‘~’
48-
“... and Unicode categories Sm, So”
49-
printableChar ::= “all characters in [\u0020, \u007E] inclusive”
41+
and any character in Unicode categories Sm or So
42+
printableChar ::= all characters in [\u0020, \u007E] inclusive
43+
UnicodeEscape ::= ‘\’ ‘u’ {‘u’} hexDigit hexDigit hexDigit hexDigit
44+
hexDigit ::= ‘0’ | ... | ‘9’ | ‘A’ | ... | ‘F’ | ‘a’ | ... | ‘f’
5045
charEscapeSeq ::= ‘\’ (‘b’ | ‘t’ | ‘n’ | ‘f’ | ‘r’ | ‘"’ | ‘'’ | ‘\’)
46+
escapeSeq ::= UnicodeEscape | charEscapeSeq
5147
5248
op ::= opchar {opchar}
5349
varid ::= lower idrest
54-
alphaid ::= upper idrest
55-
| varid
50+
boundvarid ::= varid
51+
| ‘`’ varid ‘`’
5652
plainid ::= alphaid
5753
| op
5854
id ::= plainid
59-
| ‘`’ { charNoBackQuoteOrNewline | UnicodeEscape | charEscapeSeq } ‘`’
55+
| ‘`’ { charNoBackQuoteOrNewline | escapeSeq } ‘`’
6056
idrest ::= {letter | digit} [‘_’ op]
6157
quoteId ::= ‘'’ alphaid
6258
spliceId ::= ‘$’ alphaid ;
6359
6460
integerLiteral ::= (decimalNumeral | hexNumeral) [‘L’ | ‘l’]
65-
decimalNumeral ::= ‘0’ | nonZeroDigit [{digit | ‘_’} digit]
61+
decimalNumeral ::= ‘0’ | digit [{digit | ‘_’} digit]
6662
hexNumeral ::= ‘0’ (‘x’ | ‘X’) hexDigit [{hexDigit | ‘_’} hexDigit]
67-
nonZeroDigit ::= ‘1’ | ... | ‘9’
6863
6964
floatingPointLiteral
7065
::= [decimalNumeral] ‘.’ digit [{digit | ‘_’} digit] [exponentPart] [floatType]
@@ -75,25 +70,25 @@ floatType ::= ‘F’ | ‘f’ | ‘D’ | ‘d’
7570
7671
booleanLiteral ::= ‘true’ | ‘false’
7772
78-
characterLiteral ::= ‘'’ (printableChar | charEscapeSeq) ‘'’
73+
characterLiteral ::= ‘'’ (charNoQuoteOrNewline | escapeSeq) ‘'’
7974
8075
stringLiteral ::= ‘"’ {stringElement} ‘"’
8176
| ‘"""’ multiLineChars ‘"""’
82-
stringElement ::= printableChar \ (‘"’ | ‘\’)
83-
| UnicodeEscape
84-
| charEscapeSeq
85-
multiLineChars ::= {[‘"’] [‘"’] char \ ‘"’} {‘"’}
86-
processedStringLiteral
87-
::= alphaid ‘"’ {[‘\’] processedStringPart | ‘\\’ | ‘\"’} ‘"’
88-
| alphaid ‘"""’ {[‘"’] [‘"’] char \ (‘"’ | ‘$’) | escape} {‘"’} ‘"""’
89-
processedStringPart
77+
stringElement ::= charNoDoubleQuoteOrNewline
78+
| escapeSeq
79+
multiLineChars ::= {[‘"’] [‘"’] charNoDoubleQuote} {‘"’}
80+
81+
interpolatedString
82+
::= alphaid ‘"’ {[‘\’] interpolatedStringPart | ‘\\’ | ‘\"’} ‘"’
83+
| alphaid ‘"""’ {[‘"’] [‘"’] char \ (‘"’ | ‘\$’) | escape} {‘"’} ‘"""’
84+
interpolatedStringPart
9085
::= printableChar \ (‘"’ | ‘$’ | ‘\’) | escape
91-
escape ::= ‘$$’
92-
| ‘$’ letter { letter | digit }
93-
| ‘{’ Block [‘;’ whiteSpace stringFormat whiteSpace] ‘}’
94-
stringFormat ::= {printableChar \ (‘"’ | ‘}’ | ‘ ’ | ‘\t’ | ‘\n’)}
95-
96-
symbolLiteral ::= ‘'’ plainid // until 2.13
86+
escape ::= ‘\$\$’
87+
| ‘\$"’
88+
| ‘\$’ alphaid
89+
| ‘\$’ BlockExpr
90+
alphaid ::= upper idrest
91+
| varid
9792
9893
comment ::= ‘/*’ “any sequence of characters; nested comments are allowed” ‘*/’
9994
| ‘//’ “any sequence of characters up to end of line”
@@ -159,7 +154,7 @@ SimpleLiteral ::= [‘-’] integerLiteral
159154
| characterLiteral
160155
| stringLiteral
161156
Literal ::= SimpleLiteral
162-
| processedStringLiteral
157+
| interpolatedStringLiteral
163158
| symbolLiteral
164159
| ‘null’
165160

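This file now places `UnicodeEscape` inside the grammar block and adds `escapeSeq ::= UnicodeEscape | charEscapeSeq`. The following is a minimal sketch of how a single escape sequence could be recognized and decoded under those two productions. The function name `decode_escape` is hypothetical, and the sketch ignores where in the source text escapes are permitted.

```python
import re

# UnicodeEscape ::= '\' 'u' {'u'} hexDigit hexDigit hexDigit hexDigit
_UNICODE_ESCAPE = re.compile(r"\\u+([0-9A-Fa-f]{4})")

# charEscapeSeq ::= '\' ('b' | 't' | 'n' | 'f' | 'r' | '"' | ''' | '\')
_CHAR_ESCAPES = {
    "b": "\b", "t": "\t", "n": "\n", "f": "\f", "r": "\r",
    '"': '"', "'": "'", "\\": "\\",
}


def decode_escape(seq: str) -> str:
    """Decode one escapeSeq; raise ValueError if `seq` is not a valid escape."""
    match = _UNICODE_ESCAPE.fullmatch(seq)
    if match:
        # Any number of 'u' characters may follow the backslash.
        return chr(int(match.group(1), 16))
    if len(seq) == 2 and seq[0] == "\\" and seq[1] in _CHAR_ESCAPES:
        return _CHAR_ESCAPES[seq[1]]
    raise ValueError(f"not an escape sequence: {seq!r}")


if __name__ == "__main__":
    print(repr(decode_escape(r"\u0041")))
    print(repr(decode_escape(r"\uuu0041")))
    print(repr(decode_escape(r"\n")))
```

Exactly four hex digits are required, so an input such as `\u41` is rejected rather than padded.
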
docs/_docs/reference/syntax.md

Lines changed: 34 additions & 37 deletions
@@ -21,51 +21,48 @@ productions map to AST nodes.
2121
The following description of Scala tokens uses literal characters `‘c’` when
2222
referring to the ASCII fragment `\u0000` – `\u007F`.
2323

24-
_Unicode escapes_ are used to represent the [Unicode character](https://www.w3.org/International/articles/definitions-characters/) with the given
25-
hexadecimal code:
26-
27-
```
28-
UnicodeEscape ::= ‘\’ ‘u’ {‘u’} hexDigit hexDigit hexDigit hexDigit
29-
hexDigit ::= ‘0’ | ... | ‘9’ | ‘A’ | ... | ‘F’ | ‘a’ | ... | ‘f’
30-
```
31-
3224
Informal descriptions are typeset as `“some comment”`.
3325

3426
## Lexical Syntax
3527

36-
The lexical syntax of Scala is given by the following grammar in EBNF
37-
form.
28+
The lexical syntax of Scala is given by the following grammar in EBNF form:
3829

39-
```
30+
```ebnf
4031
whiteSpace ::= ‘\u0020’ | ‘\u0009’ | ‘\u000D’ | ‘\u000A’
41-
upper ::= ‘A’ | ... | ‘Z’ | ‘\$’ | ‘_’ “... and Unicode category Lu”
42-
lower ::= ‘a’ | ... | ‘z’ “... and Unicode category Ll”
43-
letter ::= upper | lower “... and Unicode categories Lo, Lt, Nl”
32+
upper ::= ‘A’ | ... | ‘Z’ | ‘$’ and any character in Unicode categories Lu, Lt or Nl,
33+
and any character in Unicode categories Lo and Lm that doesn't have
34+
contributory property Other_Lowercase
35+
lower ::= ‘a’ | ... | ‘z’ | ‘_’ and any character in Unicode category Ll,
36+
and any character in Unicode categories Lo or Lm that has contributory
37+
property Other_Lowercase
38+
letter ::= upper | lower
4439
digit ::= ‘0’ | ... | ‘9’
4540
paren ::= ‘(’ | ‘)’ | ‘[’ | ‘]’ | ‘{’ | ‘}’
4641
delim ::= ‘`’ | ‘'’ | ‘"’ | ‘.’ | ‘;’ | ‘,’
4742
opchar ::= ‘!’ | ‘#’ | ‘%’ | ‘&’ | ‘*’ | ‘+’ | ‘-’ | ‘/’ | ‘:’ |
4843
‘<’ | ‘=’ | ‘>’ | ‘?’ | ‘@’ | ‘\’ | ‘^’ | ‘|’ | ‘~’
49-
“... and Unicode categories Sm, So”
50-
printableChar ::= “all characters in [\u0020, \u007E] inclusive”
44+
and any character in Unicode categories Sm or So
45+
printableChar ::= all characters in [\u0020, \u007E] inclusive
46+
UnicodeEscape ::= ‘\’ ‘u’ {‘u’} hexDigit hexDigit hexDigit hexDigit
47+
hexDigit ::= ‘0’ | ... | ‘9’ | ‘A’ | ... | ‘F’ | ‘a’ | ... | ‘f’
5148
charEscapeSeq ::= ‘\’ (‘b’ | ‘t’ | ‘n’ | ‘f’ | ‘r’ | ‘"’ | ‘'’ | ‘\’)
49+
escapeSeq ::= UnicodeEscape | charEscapeSeq
5250
5351
op ::= opchar {opchar}
5452
varid ::= lower idrest
55-
alphaid ::= upper idrest
56-
| varid
53+
boundvarid ::= varid
54+
| ‘`’ varid ‘`’
5755
plainid ::= alphaid
5856
| op
5957
id ::= plainid
60-
| ‘`’ { charNoBackQuoteOrNewline | UnicodeEscape | charEscapeSeq } ‘`’
58+
| ‘`’ { charNoBackQuoteOrNewline | escapeSeq } ‘`’
6159
idrest ::= {letter | digit} [‘_’ op]
6260
quoteId ::= ‘'’ alphaid
6361
spliceId ::= ‘$’ alphaid ;
6462
6563
integerLiteral ::= (decimalNumeral | hexNumeral) [‘L’ | ‘l’]
66-
decimalNumeral ::= ‘0’ | nonZeroDigit [{digit | ‘_’} digit]
64+
decimalNumeral ::= ‘0’ | digit [{digit | ‘_’} digit]
6765
hexNumeral ::= ‘0’ (‘x’ | ‘X’) hexDigit [{hexDigit | ‘_’} hexDigit]
68-
nonZeroDigit ::= ‘1’ | ... | ‘9’
6966
7067
floatingPointLiteral
7168
::= [decimalNumeral] ‘.’ digit [{digit | ‘_’} digit] [exponentPart] [floatType]
@@ -76,25 +73,25 @@ floatType ::= ‘F’ | ‘f’ | ‘D’ | ‘d’
7673
7774
booleanLiteral ::= ‘true’ | ‘false’
7875
79-
characterLiteral ::= ‘'’ (printableChar | charEscapeSeq) ‘'’
76+
characterLiteral ::= ‘'’ (charNoQuoteOrNewline | escapeSeq) ‘'’
8077
8178
stringLiteral ::= ‘"’ {stringElement} ‘"’
8279
| ‘"""’ multiLineChars ‘"""’
83-
stringElement ::= printableChar \ (‘"’ | ‘\’)
84-
| UnicodeEscape
85-
| charEscapeSeq
86-
multiLineChars ::= {[‘"’] [‘"’] char \ ‘"’} {‘"’}
87-
processedStringLiteral
88-
::= alphaid ‘"’ {[‘\’] processedStringPart | ‘\\’ | ‘\"’} ‘"’
89-
| alphaid ‘"""’ {[‘"’] [‘"’] char \ (‘"’ | ‘$’) | escape} {‘"’} ‘"""’
90-
processedStringPart
80+
stringElement ::= charNoDoubleQuoteOrNewline
81+
| escapeSeq
82+
multiLineChars ::= {[‘"’] [‘"’] charNoDoubleQuote} {‘"’}
83+
84+
interpolatedString
85+
::= alphaid ‘"’ {[‘\’] interpolatedStringPart | ‘\\’ | ‘\"’} ‘"’
86+
| alphaid ‘"""’ {[‘"’] [‘"’] char \ (‘"’ | ‘\$’) | escape} {‘"’} ‘"""’
87+
interpolatedStringPart
9188
::= printableChar \ (‘"’ | ‘$’ | ‘\’) | escape
92-
escape ::= ‘$$’
93-
| ‘$’ letter { letter | digit }
94-
| ‘{’ Block [‘;’ whiteSpace stringFormat whiteSpace] ‘}’
95-
stringFormat ::= {printableChar \ (‘"’ | ‘}’ | ‘ ’ | ‘\t’ | ‘\n’)}
96-
97-
symbolLiteral ::= ‘'’ plainid // until 2.13
89+
escape ::= ‘\$\$’
90+
| ‘\$"’
91+
| ‘\$’ alphaid
92+
| ‘\$’ BlockExpr
93+
alphaid ::= upper idrest
94+
| varid
9895
9996
comment ::= ‘/*’ “any sequence of characters; nested comments are allowed” ‘*/’
10097
| ‘//’ “any sequence of characters up to end of line”
@@ -163,7 +160,7 @@ SimpleLiteral ::= [‘-’] integerLiteral
163160
| characterLiteral
164161
| stringLiteral
165162
Literal ::= SimpleLiteral
166-
| processedStringLiteral
163+
| interpolatedStringLiteral
167164
| symbolLiteral
168165
| ‘null’
169166

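The first hunk of this file moves `_` from `upper` to `lower` and assigns the remaining letter categories (Lt, Nl, Lo, Lm) to one side or the other. The sketch below restates those two productions using Python's `unicodedata` module. The function `classify` is hypothetical. The standard library does not expose the Other_Lowercase property directly, so the sketch takes it as a caller-supplied predicate whose default (an assumption made only for this example) treats no character as having it.

```python
import unicodedata


def classify(ch, has_other_lowercase=lambda c: False):
    """Return 'upper', 'lower', or None for a single character.

    `has_other_lowercase` stands in for the Unicode contributory property
    Other_Lowercase, which this sketch does not compute itself.
    """
    if "A" <= ch <= "Z" or ch == "$":
        return "upper"
    if "a" <= ch <= "z" or ch == "_":
        return "lower"
    category = unicodedata.category(ch)
    if category in ("Lu", "Lt", "Nl"):
        return "upper"
    if category == "Ll":
        return "lower"
    if category in ("Lo", "Lm"):
        return "lower" if has_other_lowercase(ch) else "upper"
    return None  # not a letter in the sense of `letter ::= upper | lower`


if __name__ == "__main__":
    for ch in ["A", "$", "_", "\u00e9", "\u01c5", "\u2163", "1"]:
        print(f"U+{ord(ch):04X}: {classify(ch)}")
```

With these productions every character accepted as a `letter` is either `upper` or `lower`, which is why the new `letter` rule no longer needs its own category list.
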
docs/_spec/01-lexical-syntax.md

Lines changed: 11 additions & 11 deletions
@@ -27,8 +27,9 @@ op ::= opchar {opchar}
2727
varid ::= lower idrest
2828
boundvarid ::= varid
2929
| ‘`’ varid ‘`’
30-
plainid ::= upper idrest
31-
| varid
30+
alphaid ::= upper idrest
31+
| varid
32+
plainid ::= alphaid
3233
| op
3334
id ::= plainid
3435
| ‘`’ { charNoBackQuoteOrNewline | escapeSeq } ‘`’
@@ -282,8 +283,8 @@ Literal ::= [‘-’] integerLiteral
282283
```ebnf
283284
integerLiteral ::= (decimalNumeral | hexNumeral)
284285
[‘L’ | ‘l’]
285-
decimalNumeral ::= digit {digit}
286-
hexNumeral ::= ‘0’ (‘x’ | ‘X’) hexDigit {hexDigit}
286+
decimalNumeral ::= ‘0’ | digit [{digit | ‘_’} digit]
287+
hexNumeral ::= ‘0’ (‘x’ | ‘X’) hexDigit [{hexDigit | ‘_’} hexDigit]
287288
```
288289

289290
Values of type `Int` are all integer numbers between $-2\^{31}$ and $2\^{31}-1$, inclusive.
@@ -312,12 +313,11 @@ The digits of a numeric literal may be separated by arbitrarily many underscores
312313
### Floating Point Literals
313314
314315
```ebnf
315-
floatingPointLiteral ::= digit {digit} ‘.’ digit {digit} [exponentPart] [floatType]
316-
| ‘.’ digit {digit} [exponentPart] [floatType]
317-
| digit {digit} exponentPart [floatType]
318-
| digit {digit} [exponentPart] floatType
319-
exponentPart ::= (‘E’ | ‘e’) [‘+’ | ‘-’] digit {digit}
320-
floatType ::= ‘F’ | ‘f’ | ‘D’ | ‘d’
316+
floatingPointLiteral
317+
::= [decimalNumeral] ‘.’ digit [{digit | ‘_’} digit] [exponentPart] [floatType]
318+
| decimalNumeral exponentPart [floatType]
319+
| decimalNumeral floatType
320+
exponentPart ::= (‘E’ | ‘e’) [‘+’ | ‘-’] digit [{digit | ‘_’} digit]
321321
```
322322
323323
Floating point literals are of type `Float` when followed by a floating point type suffix `F` or `f`, and are of type `Double` otherwise.
@@ -448,7 +448,7 @@ Inside an interpolated string none of the usual escape characters are interprete
448448
Note that the sequence `\"` does not close a normal string literal (enclosed in single quotes).
449449
450450
There are three forms of dollar sign escape.
451-
The most general form encloses an expression in `${` and `}`, i.e. `${expr}`.
451+
The most general form encloses an expression in `${` and `}`, i.e. `${expr}`.
452452
The expression enclosed in the braces that follow the leading `$` character is of syntactical category BlockExpr.
453453
Hence, it can contain multiple statements, and newlines are significant.
454454
Single ‘$’-signs are not permitted in isolation in an interpolated string.
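
The dollar-sign rules quoted in this hunk, together with the `escape` production in the syntax files of this commit (`$$`, `$"`, `$` alphaid, `$` BlockExpr), can be illustrated with a small scanner. This is a simplified sketch, not the compiler's algorithm: the name `split_dollar_escapes` is hypothetical, identifiers are restricted to ASCII, and braces are matched by counting without regard to nested string literals or comments.

```python
import re

# ASCII-only stand-in for `alphaid`; the real production also admits
# Unicode letters and an optional trailing `_ op` part.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_dollar_escapes(body):
    """Split an interpolated-string body into text, ident, and block parts."""
    parts, text, i = [], [], 0

    def flush():
        if text:
            parts.append(("text", "".join(text)))
            text.clear()

    while i < len(body):
        if body[i] != "$":
            text.append(body[i])
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        if nxt in ("$", '"'):
            text.append(nxt)  # `$$` and `$"` stand for a literal `$` or `"`
            i += 2
        elif nxt == "{":
            depth, j = 0, i + 1
            while j < len(body):
                if body[j] == "{":
                    depth += 1
                elif body[j] == "}":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            if j == len(body):
                raise ValueError("unclosed ${ ... }")
            flush()
            parts.append(("block", body[i + 2:j]))
            i = j + 1
        else:
            match = _IDENT.match(body, i + 1)
            if match is None:
                raise ValueError(f"isolated '$' at index {i}")
            flush()
            parts.append(("ident", match.group()))
            i = match.end()
    flush()
    return parts


if __name__ == "__main__":
    print(split_dollar_escapes("Hi $name, cost: $$${price * 2}"))
```

An input such as `50$ off` raises `ValueError`, mirroring the rule that a single `$` is not permitted in isolation.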

docs/_spec/13-syntax-summary.md

Lines changed: 13 additions & 12 deletions
@@ -8,6 +8,8 @@ chapter: 13
88

99
The following descriptions of Scala tokens uses literal characters `‘c’` when referring to the ASCII fragment `\u0000` – `\u007F`.
1010

11+
Informal descriptions are typeset as `“some comment”`.
12+
1113
## Lexical Syntax
1214

1315
The lexical syntax of Scala is given by the following grammar in EBNF form:
@@ -32,27 +34,30 @@ UnicodeEscape ::= ‘\’ ‘u’ {‘u’} hexDigit hexDigit hexDigit hexDi
3234
hexDigit ::= ‘0’ | ... | ‘9’ | ‘A’ | ... | ‘F’ | ‘a’ | ... | ‘f’
3335
charEscapeSeq ::= ‘\’ (‘b’ | ‘t’ | ‘n’ | ‘f’ | ‘r’ | ‘"’ | ‘'’ | ‘\’)
3436
escapeSeq ::= UnicodeEscape | charEscapeSeq
37+
3538
op ::= opchar {opchar}
3639
varid ::= lower idrest
3740
boundvarid ::= varid
3841
| ‘`’ varid ‘`’
39-
plainid ::= upper idrest
42+
alphaid ::= upper idrest
4043
| varid
44+
plainid ::= alphaid
4145
| op
4246
id ::= plainid
4347
| ‘`’ { charNoBackQuoteOrNewline | escapeSeq } ‘`’
4448
idrest ::= {letter | digit} [‘_’ op]
49+
quoteId ::= ‘'’ alphaid
50+
spliceId ::= ‘$’ alphaid ;
4551
4652
integerLiteral ::= (decimalNumeral | hexNumeral) [‘L’ | ‘l’]
47-
decimalNumeral ::= digit {digit}
48-
hexNumeral ::= ‘0’ (‘x’ | ‘X’) hexDigit {hexDigit}
53+
decimalNumeral ::= ‘0’ | digit [{digit | ‘_’} digit]
54+
hexNumeral ::= ‘0’ (‘x’ | ‘X’) hexDigit [{hexDigit | ‘_’} hexDigit]
4955
5056
floatingPointLiteral
51-
::= digit {digit} ‘.’ digit {digit} [exponentPart] [floatType]
52-
| ‘.’ digit {digit} [exponentPart] [floatType]
53-
| digit {digit} exponentPart [floatType]
54-
| digit {digit} [exponentPart] floatType
55-
exponentPart ::= (‘E’ | ‘e’) [‘+’ | ‘-’] digit {digit}
57+
::= [decimalNumeral] ‘.’ digit [{digit | ‘_’} digit] [exponentPart] [floatType]
58+
| decimalNumeral exponentPart [floatType]
59+
| decimalNumeral floatType
60+
exponentPart ::= (‘E’ | ‘e’) [‘+’ | ‘-’] digit [{digit | ‘_’} digit]
5661
floatType ::= ‘F’ | ‘f’ | ‘D’ | ‘d’
5762
5863
booleanLiteral ::= ‘true’ | ‘false’
@@ -74,10 +79,6 @@ escape ::= ‘\$\$’
7479
| ‘\$"’
7580
| ‘\$’ alphaid
7681
| ‘\$’ BlockExpr
77-
alphaid ::= upper idrest
78-
| varid
79-
80-
symbolLiteral ::= ‘'’ plainid
8182
8283
comment ::= ‘/*’ “any sequence of characters; nested comments are allowed” ‘*/’
8384
| ‘//’ “any sequence of characters up to end of line”
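
The rewritten `floatingPointLiteral` production in this file has three alternatives built on `decimalNumeral`. As a hedged illustration, the sketch below transcribes the new rules into one Python regular expression. The helper name `is_floating_point_literal` is hypothetical, and only the lexical shape is checked, not value ranges.

```python
import re

# digit [{digit | '_'} digit], used for decimalNumeral, fractions, and exponents
DIGITS = r"[0-9](?:[0-9_]*[0-9])?"
# exponentPart ::= ('E' | 'e') ['+' | '-'] digit [{digit | '_'} digit]
EXPONENT = rf"[Ee][+-]?{DIGITS}"
# floatType ::= 'F' | 'f' | 'D' | 'd'
FLOAT_TYPE = r"[FfDd]"

FLOATING_POINT_LITERAL = re.compile(
    rf"(?:{DIGITS})?\.{DIGITS}(?:{EXPONENT})?{FLOAT_TYPE}?"  # [decimalNumeral] '.' digits ...
    rf"|{DIGITS}{EXPONENT}{FLOAT_TYPE}?"                     # decimalNumeral exponentPart [floatType]
    rf"|{DIGITS}{FLOAT_TYPE}"                                # decimalNumeral floatType
)


def is_floating_point_literal(text: str) -> bool:
    """Return True if `text` as a whole matches floatingPointLiteral."""
    return FLOATING_POINT_LITERAL.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["1_000.5e-3f", ".5", "1e10", "3d", "1.", "42", "1_.5"]:
        print(f"{sample!r}: {is_floating_point_literal(sample)}")
```

A bare `42` is not a floating point literal under these rules because it has neither a fraction, an exponent, nor a type suffix, and `1.` is rejected because at least one digit must follow the dot.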

docs/_spec/TODOreference/syntax.md

Lines changed: 7 additions & 4 deletions
@@ -18,6 +18,8 @@ productions map to AST nodes.
1818
1919
-->
2020

21+
<!-- Lexical syntax already merged into _spec. -->
22+
<!--
2123
The following description of Scala tokens uses literal characters `‘c’` when
2224
referring to the ASCII fragment `\u0000` – `\u007F`.
2325
@@ -84,10 +86,10 @@ stringElement ::= printableChar \ (‘"’ | ‘\’)
8486
| UnicodeEscape
8587
| charEscapeSeq
8688
multiLineChars ::= {[‘"’] [‘"’] char \ ‘"’} {‘"’}
87-
processedStringLiteral
88-
::= alphaid ‘"’ {[‘\’] processedStringPart | ‘\\’ | ‘\"’} ‘"’
89+
interpolatedStringLiteral
90+
::= alphaid ‘"’ {[‘\’] interpolatedStringPart | ‘\\’ | ‘\"’} ‘"’
8991
| alphaid ‘"""’ {[‘"’] [‘"’] char \ (‘"’ | ‘$’) | escape} {‘"’} ‘"""’
90-
processedStringPart
92+
interpolatedStringPart
9193
::= printableChar \ (‘"’ | ‘$’ | ‘\’) | escape
9294
escape ::= ‘$$’
9395
| ‘$’ letter { letter | digit }
@@ -102,6 +104,7 @@ comment ::= ‘/*’ “any sequence of characters; nested comments ar
102104
nl ::= “new line character”
103105
semi ::= ‘;’ | nl {nl}
104106
```
107+
-->
105108

106109
## Optional Braces
107110

@@ -163,7 +166,7 @@ SimpleLiteral ::= [‘-’] integerLiteral
163166
| characterLiteral
164167
| stringLiteral
165168
Literal ::= SimpleLiteral
166-
| processedStringLiteral
169+
| interpolatedStringLiteral
167170
| symbolLiteral
168171
| ‘null’
169172
