Implement tokenization errors as per spec. #92

Merged: 106 commits, Jun 12, 2017
222b75c
Add control-or-undefined-character-in-input-stream parse error.
inikulin Mar 25, 2017
3f10d5b
Add non-unicode-character-in-input-stream parse error.
inikulin Mar 25, 2017
5a3b850
Add self-closing-non-void-html-element error.
inikulin Mar 29, 2017
203f0ef
Add end-tag-with-attributes error.
inikulin Mar 29, 2017
7b56b75
Add self-closing-end-tag error.
inikulin Mar 30, 2017
91f17a6
Add unexpected-null-character error.
inikulin Mar 30, 2017
7989d84
Add unexpected-null-character parse error in in RCDATA, RAWTEXT, PLA…
inikulin Apr 1, 2017
6e4821a
Add Tag open state errors
inikulin Apr 1, 2017
c8d5a02
Add End tag open state parse errors.
inikulin Apr 3, 2017
a1938c2
Add Markup declaration open state parse errors.
inikulin Apr 3, 2017
2cada34
Add Script data escaped state parse errors.
inikulin Apr 6, 2017
23176e3
Add Script data escaped dash state parse errors.
inikulin Apr 6, 2017
8c72094
Add Script data escaped dash dash state errors.
inikulin Apr 6, 2017
586f2dd
Add Script data double escaped state errors.
inikulin Apr 6, 2017
5fc480e
Adding tests for Tag Name parse errors
Apr 6, 2017
924515f
Adding back old eof error
Apr 7, 2017
f06aecd
Merge pull request #1 from diervo/dval/parseErrorTagName
inikulin Apr 7, 2017
dcd7002
Adding parser errors for before attribute name state
Apr 10, 2017
c9436ca
Add Comment less-than sign bang dash dash state errors.
inikulin Apr 11, 2017
2918163
Add Comment start state errors.
inikulin Apr 11, 2017
ad3c75e
Add Comment start dash state errors.
inikulin Apr 11, 2017
a74e16a
Add Comment state errors.
inikulin Apr 11, 2017
e7d3d7d
Adding parser errors for attibute name state
Apr 11, 2017
6e62741
Merge pull request #2 from HTMLParseErrorWG/comments-parse-errors
inikulin Apr 11, 2017
730d1fd
Add Comment end dash state errors.
inikulin Apr 12, 2017
95be487
Add Comment end state errors.
inikulin Apr 12, 2017
1bb47c3
Add Comment end bang state errors.
inikulin Apr 12, 2017
ba7425f
Adding parser errors for after attibute name state
Apr 12, 2017
f7d0432
Merge pull request #3 from HTMLParseErrorWG/comment-parse-errors2
inikulin Apr 12, 2017
8179234
Generalizing error naming
Apr 13, 2017
e091f33
Merge pull request #4 from diervo/attrErrors
inikulin Apr 13, 2017
def9b41
Revert "Adding parser errors for attibute name states"
inikulin Apr 13, 2017
f2b6762
Merge pull request #5 from HTMLParseErrorWG/revert-4-attrErrors
inikulin Apr 13, 2017
2d76815
Revert "Revert "Adding parser errors for attibute name states""
inikulin Apr 13, 2017
7c8dc47
Merge pull request #6 from HTMLParseErrorWG/revert-5-revert-4-attrErrors
inikulin Apr 13, 2017
549e840
Add CDATA section state errors.
inikulin Apr 12, 2017
64b767e
Generalize tag errors. Fix typo.
inikulin Apr 13, 2017
5229aa6
Merge pull request #7 from HTMLParseErrorWG/gh8
inikulin Apr 13, 2017
5e31e92
Adding parser errors for before attribute value state
Apr 13, 2017
db44726
Added parse errors for attribute value doublequoted
Apr 13, 2017
645d9ea
Added parse errors for attribute value single-quoted
Apr 13, 2017
4f4339a
Added parse errors for attribute value unquoted
Apr 13, 2017
ac195da
Added parse errors for attribute value quoted
Apr 13, 2017
53743e1
Added parse errors: Self closing start tag
Apr 13, 2017
3f7b0c9
Generalize eof-in-tag error, Rename attrValue errors (more accurate)
Apr 14, 2017
1a2cf4c
Rename for better semantics: missing-whitespace-between-attributes
Apr 15, 2017
469218f
Merge pull request #8 from diervo/attrValue
inikulin Apr 17, 2017
b9511ae
Add Hexademical character reference start state errors.
inikulin Apr 19, 2017
090cd9c
Add Decimal character reference start state errors.
inikulin Apr 19, 2017
7415267
Add Hexademical character reference state errors.
inikulin Apr 19, 2017
10b8ddd
Add Decimal character reference state errors.
inikulin Apr 19, 2017
a95baa4
Add Numeric character reference end state errors.
inikulin Apr 20, 2017
900f389
Add DOCTYPE state errors.
inikulin Apr 21, 2017
99400c4
Add Before DOCTYPE name state errors.
inikulin Apr 21, 2017
fd209ad
Add DOCTYPE name state errors.
inikulin Apr 21, 2017
1597ba2
Add After DOCTYPE name state errors.
inikulin Apr 21, 2017
4d0d34b
Add Script data double escaped dash state
inikulin Apr 21, 2017
16c9a83
Add Script data double escaped dash dash state errors.
inikulin Apr 21, 2017
e704fa3
Merge pull request #10 from HTMLParseErrorWG/char-ref-errors2
inikulin Apr 25, 2017
9b4a43f
Parser errors: Add character reference state
Apr 14, 2017
7a633a2
Rename parse errors
Apr 17, 2017
12943fb
Fix columns in missing semicolon after character reference errors
inikulin May 4, 2017
919651a
Rename `abrupt-comment` error to `abrupt-closing-of-comment`
inikulin May 4, 2017
c53b1e4
Merge pull request #13 from HTMLParseErrorWG/rename-abrupt-comment
inikulin May 9, 2017
03a008d
Split control and undefined character errors for input stream
inikulin May 9, 2017
7648171
Split numeric character errors.
inikulin May 9, 2017
a6fa878
Fix error code.
inikulin May 9, 2017
2ddfe45
Merge pull request #14 from HTMLParseErrorWG/rebase-and-invalid-chars
inikulin May 10, 2017
2c1695e
Merge pull request #12 from HTMLParseErrorWG/char-refs3
inikulin May 10, 2017
55146bd
Fixing DOCTYPE parse errors
May 1, 2017
704817a
Merge pull request #11 from diervo/doctypeerrors
inikulin May 11, 2017
49d6fa3
Test "block" elements that should close p (#91)
zcorpan Apr 5, 2017
0f0d517
Merge pull request #15 from HTMLParseErrorWG/upstream-rebase
inikulin May 11, 2017
7d4669c
Remove "ParseError" tokens.
inikulin May 11, 2017
ae10a92
Merge pull request #16 from HTMLParseErrorWG/remove-parse-error-token
inikulin May 11, 2017
c7aca56
Rename malformed-comment error
inikulin May 18, 2017
a200b85
Rename `abruption-of-tag-self-closure` error.
inikulin May 18, 2017
a0bb7a1
Merge pull request #17 from HTMLParseErrorWG/markup-decl
inikulin May 21, 2017
89c03c3
Merge pull request #18 from HTMLParseErrorWG/solidus-in-tag
inikulin May 21, 2017
32f67a7
Handle ambiguous ampersand properly
inikulin May 22, 2017
9ff154b
Merge pull request #19 from HTMLParseErrorWG/ambiguous-amp
inikulin May 22, 2017
e3d9d0b
Fix error code
inikulin May 22, 2017
10e9fcc
Merge pull request #20 from HTMLParseErrorWG/ambiguous-amp
inikulin May 22, 2017
3af9408
Change error code for self closing non-void elements
inikulin May 24, 2017
ab33731
Rename self closing end tag error
inikulin May 24, 2017
5bd909b
Rename xml declaration error
inikulin May 24, 2017
bf6f2a1
Rename markup declaration error
inikulin May 24, 2017
9716058
Rename comment in script error
inikulin May 24, 2017
0680cdf
Add missing doctype public identifier error
inikulin May 25, 2017
ced1911
Add missing doctype system identifier error
inikulin May 25, 2017
4d4bab2
Add errors for abrupt doctype identifiers
inikulin May 25, 2017
d1f2720
Merge pull request #21 from HTMLParseErrorWG/review-remarks
inikulin May 25, 2017
5765c84
Rename abrupt closing of comment error
inikulin May 30, 2017
eae4e2d
non-void-element -> non-void-html-element
inikulin May 30, 2017
fa43d3d
Merge pull request #22 from HTMLParseErrorWG/review-fixes2
inikulin May 30, 2017
abf44b5
Add error for duplicate attribute
inikulin May 30, 2017
f7c525f
Merge pull request #23 from HTMLParseErrorWG/review-fixes2
inikulin May 30, 2017
eaeee69
non-unicode-character-in-input-stream -> surrogate-in-input-stream
inikulin May 31, 2017
7b7a220
undefined-character-in-input-stream -> noncharacter-in-input-stream
inikulin May 31, 2017
0ceaf59
undefined-character-reference -> noncharacter-character-reference
inikulin May 31, 2017
7ed9c4f
non-unicode-character-reference -> surrogate-character-reference
inikulin May 31, 2017
8c43e0e
character-reference-outside-unicode-range
inikulin May 31, 2017
b13e570
Merge pull request #24 from HTMLParseErrorWG/review-fixes2
inikulin May 31, 2017
8f5f958
Fix erroneously changed legacy errors
inikulin Jun 8, 2017
7b6415d
Remove ignoreErrorOrder property. Add error format description
inikulin Jun 8, 2017
9ec9f26
Merge pull request #25 from HTMLParseErrorWG/review-fixes2
inikulin Jun 8, 2017
6 changes: 4 additions & 2 deletions tokenizer/README.md
@@ -14,13 +14,16 @@ Basic Structure
     "output": [expected_output_tokens],
     "initialStates": [initial_states],
     "lastStartTag": last_start_tag,
-    "ignoreErrorOrder": ignore_error_order
+    "errors": [parse_errors]
     }
 ]}

 Multiple tests per file are allowed simply by adding more objects to the
 "tests" list.

+Each parse error is an object that contains error `code` and one-based
+error location indices: `line` and `col`.
+
 `description`, `input` and `output` are always present. The other values
 are optional.

@@ -65,7 +68,6 @@ tokens are:
 ["EndTag", name]
 ["Comment", data]
 ["Character", data]
-"ParseError"

Review comment (Member): Should document the new errors property.

 `public_id` and `system_id` are either strings or `null`. `correctness`
 is either `true` or `false`; `true` corresponds to the force-quirks flag
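
The README change above describes each parse error as an object with a `code` and one-based `line` and `col` indices. A minimal sketch of a shape check for that field follows. The function name `validate_errors` and the exact messages are hypothetical, not part of the test suite, and the check assumes a missing `errors` key simply means "no errors expected".

```python
from typing import Any, Dict, List


def validate_errors(test: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the optional "errors" field of one test."""
    problems: List[str] = []
    for index, error in enumerate(test.get("errors", [])):
        if not isinstance(error, dict):
            problems.append(f"errors[{index}] is not an object")
            continue
        if not isinstance(error.get("code"), str):
            problems.append(f"errors[{index}].code must be a string")
        for key in ("line", "col"):
            value = error.get(key)
            # bool is a subclass of int in Python, so it is excluded explicitly.
            if not isinstance(value, int) or isinstance(value, bool) or value < 1:
                problems.append(f"errors[{index}].{key} must be an integer >= 1")
    return problems


example = {"errors": [{"code": "eof-in-tag", "line": 1, "col": 10}]}
print(validate_errors(example))  # []
```

A check like this catches zero-based positions early, which is the easiest mistake to make given that the indices are one-based.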
10 changes: 8 additions & 2 deletions tokenizer/contentModelFlags.test
@@ -22,7 +22,10 @@
 "initialStates":["RCDATA state", "RAWTEXT state"],
 "lastStartTag":"xmp",
 "input":"foo</xmp ",
-"output":[["Character", "foo"], "ParseError"]},
+"output":[["Character", "foo"]],
+"errors":[
+{ "code": "eof-in-tag", "line": 1, "col": 10 }
+]},

 {"description":"End tag closing RCDATA or RAWTEXT (ending with EOF)",
 "initialStates":["RCDATA state", "RAWTEXT state"],
@@ -34,7 +37,10 @@
 "initialStates":["RCDATA state", "RAWTEXT state"],
 "lastStartTag":"xmp",
 "input":"foo</xmp/",
-"output":[["Character", "foo"], "ParseError"]},
+"output":[["Character", "foo"]],
+"errors":[
+{ "code": "eof-in-tag", "line": 1, "col": 10 }
+]},

 {"description":"End tag not closing RCDATA or RAWTEXT (ending with left-angle-bracket)",
 "initialStates":["RCDATA state", "RAWTEXT state"],
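
Both changed tests in this file report `eof-in-tag` at column 10 for a 9-character input, which suggests that an end-of-file error is located one position past the last character. A small sketch of that arithmetic follows. The helper name `eof_position` is hypothetical, and the multi-line handling (lines separated by U+000A) is an assumption, since every EOF case in this diff is on line 1.

```python
from typing import Tuple


def eof_position(text: str) -> Tuple[int, int]:
    """One-based (line, col) of the position just past the last character."""
    line = text.count("\n") + 1
    last_newline = text.rfind("\n")
    # rfind returns -1 when there is no newline, so a single-line input
    # yields col == len(text) + 1.
    col = len(text) - last_newline
    return line, col


print(eof_position("foo</xmp "))  # (1, 10)
print(eof_position("foo</xmp/"))  # (1, 10)
```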
134 changes: 125 additions & 9 deletions tokenizer/domjs.test
@@ -3,24 +3,114 @@
 {
 "description":"CR in bogus comment state",
 "input":"<?\u000d",
-"output":["ParseError", ["Comment", "?\u000a"]]
+"output":[["Comment", "?\u000a"]],
+"errors":[
+{ "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 }
+]
 },
 {
 "description":"CRLF in bogus comment state",
 "input":"<?\u000d\u000a",
-"output":["ParseError", ["Comment", "?\u000a"]]
+"output":[["Comment", "?\u000a"]],
+"errors":[
+{ "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 }
+]
 },
 {
 "description":"CRLFLF in bogus comment state",
 "input":"<?\u000d\u000a\u000a",
-"output":["ParseError", ["Comment", "?\u000a\u000a"]]
+"output":[["Comment", "?\u000a\u000a"]],
+"errors":[
+{ "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 }
+]
 },
 {
-"description":"NUL in RCDATA and RAWTEXT",
+"description":"NUL in RCDATA, RAWTEXT, PLAINTEXT and Script data",
 "doubleEscaped":true,
-"initialStates":["RCDATA state", "RAWTEXT state"],
+"initialStates":["RCDATA state", "RAWTEXT state", "PLAINTEXT state", "Script data state"],
 "input":"\\u0000",
-"output":["ParseError", ["Character", "\\uFFFD"]]
+"output":[["Character", "\\uFFFD"]],
+"errors":[
+{ "code": "unexpected-null-character", "line": 1, "col": 1 }
+]
 },
+{
+"description":"NUL in script HTML comment",
+"doubleEscaped":true,
+"initialStates":["Script data state"],
+"input":"<!--test\\u0000--><!--test-\\u0000--><!--test--\\u0000-->",
+"output":[["Character", "<!--test\\uFFFD--><!--test-\\uFFFD--><!--test--\\uFFFD-->"]],
+"errors":[
+{ "code": "unexpected-null-character", "line": 1, "col": 9 },
+{ "code": "unexpected-null-character", "line": 1, "col": 22 },
+{ "code": "unexpected-null-character", "line": 1, "col": 36 }
+]
+},
+{
+"description":"NUL in script HTML comment - double escaped",
+"doubleEscaped":true,
+"initialStates":["Script data state"],
+"input":"<!--<script>\\u0000--><!--<script>-\\u0000--><!--<script>--\\u0000-->",
+"output":[["Character", "<!--<script>\\uFFFD--><!--<script>-\\uFFFD--><!--<script>--\\uFFFD-->"]],
+"errors":[
+{ "code": "unexpected-null-character", "line": 1, "col": 13 },
+{ "code": "unexpected-null-character", "line": 1, "col": 30 },
+{ "code": "unexpected-null-character", "line": 1, "col": 48 }
+]
+},
+{
+"description":"EOF in script HTML comment",
+"initialStates":["Script data state"],
+"input":"<!--test",
+"output":[["Character", "<!--test"]],
+"errors":[
+{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 9 }
+]
+},
+{
+"description":"EOF in script HTML comment after dash",
+"initialStates":["Script data state"],
+"input":"<!--test-",
+"output":[["Character", "<!--test-"]],
+"errors":[
+{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 10 }
+]
+},
+{
+"description":"EOF in script HTML comment after dash dash",
+"initialStates":["Script data state"],
+"input":"<!--test--",
+"output":[["Character", "<!--test--"]],
+"errors":[
+{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 11 }
+]
+},
+{
+"description":"EOF in script HTML comment double escaped after dash",
+"initialStates":["Script data state"],
+"input":"<!--<script>-",
+"output":[["Character", "<!--<script>-"]],
+"errors":[
+{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 14 }
+]
+},
+{
+"description":"EOF in script HTML comment double escaped after dash dash",
+"initialStates":["Script data state"],
+"input":"<!--<script>--",
+"output":[["Character", "<!--<script>--"]],
+"errors":[
+{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 15 }
+]
+},
+{
+"description":"EOF in script HTML comment - double escaped",
+"initialStates":["Script data state"],
+"input":"<!--<script>",
+"output":[["Character", "<!--<script>"]],
+"errors":[
+{ "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 13 }
+]
+},
 {
 "description":"leading U+FEFF must pass through",
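
The NUL tests in this hunk set `doubleEscaped` and write the null character as a literal `\u0000` sequence, yet the reported columns only line up if each escape counts as a single character. The sketch below decodes such an input and lists the one-based column of every U+0000. The helper names are hypothetical, and the reading of `doubleEscaped` (escape sequences are decoded before tokenizing) is an assumption inferred from the test data shown here.

```python
import re
from typing import List


def unescape_double_escaped(text: str) -> str:
    # Turn every literal backslash-u-XXXX sequence into the code unit it names.
    return re.sub(
        r"\\u([0-9A-Fa-f]{4})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )


def nul_columns(text: str) -> List[int]:
    # One-based column of each U+0000, assuming the input is a single line.
    return [index + 1 for index, char in enumerate(text) if char == "\x00"]


# The input as it looks after JSON decoding: the backslashes are still literal.
raw = r"<!--test\u0000--><!--test-\u0000--><!--test--\u0000-->"
print(nul_columns(unescape_double_escaped(raw)))  # [9, 22, 36]
```

The printed columns match the three `unexpected-null-character` errors expected by the "NUL in script HTML comment" test.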
@@ -38,7 +128,10 @@
 "description":"Bad charref in in RCDATA",
 "initialStates":["RCDATA state"],
 "input":"&NotEqualTild;",
-"output":["ParseError", ["Character", "&NotEqualTild;"]]
+"output":[["Character", "&NotEqualTild;"]],
+"errors":[
+{ "code": "unknown-named-character-reference", "line": 1, "col": 14 }
+]
 },
 {
 "description":"lowercase endtags in RCDATA and RAWTEXT",
@@ -84,12 +177,35 @@
 "description":"--!NUL in comment ",
 "doubleEscaped":true,
 "input":"<!----!\\u0000-->",
-"output":["ParseError", "ParseError", ["Comment", "--!\\uFFFD"]]
+"output":[["Comment", "--!\\uFFFD"]],
+"errors":[
+{ "code": "unexpected-null-character", "line": 1, "col": 8 }
+]
 },
 {
 "description":"space EOF after doctype ",
 "input":"<!DOCTYPE html ",
-"output":["ParseError", ["DOCTYPE", "html", null, null , false]]
+"output":[["DOCTYPE", "html", null, null , false]],
+"errors":[
+{ "code": "eof-in-doctype", "line": 1, "col": 16 }
+]
+},
+{
+"description":"CDATA in HTML content",
+"input":"<![CDATA[foo]]>",
+"output":[["Comment", "[CDATA[foo]]"]],
+"errors":[
+{ "code": "cdata-in-html-content", "line": 1, "col": 9 }
+]
+},
+{
+"description":"CDATA content",
+"input":"foo&bar",
+"initialStates":["CDATA section state"],
+"output":[["Character", "foo&bar"]],
+"errors":[
+{ "code": "eof-in-cdata", "line": 1, "col": 8 }
+]
 }

 ]
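
With `"ParseError"` tokens gone from `output`, a test runner has to compare two things per test: the token list and the separate `errors` list. The following is a rough sketch of such a loop, not the project's actual harness. The `tokenize` callable and its signature are hypothetical, the default of `"Data state"` when `initialStates` is absent is an assumption, and errors are compared in order on the assumption that order matters now that `ignoreErrorOrder` has been removed.

```python
import json
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical tokenizer interface: (input, initial state, last start tag)
# -> (tokens, errors), both already in the JSON shapes used by the tests.
Tokenizer = Callable[[str, str, Optional[str]], Tuple[List[Any], List[Dict[str, Any]]]]


def run_tests(source: str, tokenize: Tokenizer) -> List[str]:
    """Return "description (state)" for every test whose tokens or errors differ."""
    failures: List[str] = []
    for test in json.loads(source)["tests"]:
        # Assumption: tests without "initialStates" start in the data state.
        for state in test.get("initialStates", ["Data state"]):
            tokens, errors = tokenize(test["input"], state, test.get("lastStartTag"))
            # A missing "errors" key is treated as "no errors expected".
            if tokens != test["output"] or errors != test.get("errors", []):
                failures.append(f'{test["description"]} ({state})')
    return failures
```

Passing the tokenizer in as a parameter keeps the sketch independent of any particular implementation; handling of `doubleEscaped` inputs is left out for brevity.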