Implement tokenization errors as per spec. #92

inikulin · 2017-05-21T21:04:13Z

Ready for review, but it makes sense to merge the spec changes first. Note that for now we've decided to put the new error codes in a separate section in the tree construction tests, so as not to mix things up. Once we have a spec for tree construction stage errors, we'll remove the old errors and move the new errors to the #errors section.

Spec PR: whatwg/html#2701

inikulin · 2017-05-31T23:00:39Z

Spec PR has been merged. This one is ready for review.

inikulin · 2017-06-06T10:02:03Z

Ping @gsnedders

gsnedders

I haven't actually checked they match the spec, because that would take forever, and I'd rather just wait until some implementation notices they don't match.

Apart from the one issue, what's going on with the tree construction tests? Some of the #errors sections have been partially updated for the tokenizer errors, in addition to the #new-errors sections? (Personally, I have a slight preference against adding a new section temporarily, given I know some things rely on the specific headings.)

gsnedders · 2017-06-06T16:51:36Z

tokenizer/README.md

@@ -65,7 +65,6 @@ tokens are:
    ["EndTag", name]
    ["Comment", data]
    ["Character", data]
-    "ParseError"


Should document the new errors property.
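For illustration, here is a minimal sketch of how a test consumer might read a tokenizer test once "ParseError" no longer appears in the output token list and errors live in their own `errors` property. The entry shape (`code`, `line`, `col`), the sample test case, and the helper name `split_expectations` are assumptions made for this sketch, not taken from the README.

```python
import json

# Hypothetical test case. The shape of each "errors" entry (code/line/col)
# and the position values are assumptions made for this sketch.
TEST_JSON = """
{
  "description": "empty comment closed abruptly",
  "input": "<!-->",
  "output": [["Comment", ""]],
  "errors": [{"code": "abrupt-closing-of-empty-comment", "line": 1, "col": 5}]
}
"""


def split_expectations(test):
    """Return (tokens, errors) from a test dict, keeping the two separate."""
    tokens = test["output"]
    # Treat a missing "errors" property as "no parse errors expected".
    errors = [(e["code"], e["line"], e["col"]) for e in test.get("errors", [])]
    return tokens, errors


test = json.loads(TEST_JSON)
tokens, errors = split_expectations(test)
```

Keeping errors out of the token list means a harness can compare tokens and errors independently.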

inikulin · 2017-06-06T19:30:36Z

@gsnedders

> Apart from the one issue, what's going on with the tree construction tests? Some of the #errors sections have been partially updated for the tokenizer errors, in addition to the #new-errors sections?

Unfortunately, it seems that some error codes were erroneously added to the legacy error section and we missed it in review. I'll do a cleanup.

Update: I've figured out the cause of this issue. Some error codes that didn't make the final cut had the same names as legacy error codes, so the automatic rename replaced those legacy codes with the new ones.

> Personally, I have a slight preference against adding a new section temporarily, given I know some things rely on the specific headings

I thought it would be an unobtrusive way to transition to the new parse errors. The other options are:

- Remove all old parse errors from the tree construction tests and add the new ones directly to the #errors section. But there may be implementations that still use the old parse errors.
- Move all new errors into the #errors section, but put a separator between the old and new parse errors (e.g., (0,0) -------). This will not work for the same reason as the previous option.

Do you have any other ideas?

gsnedders · 2017-06-07T17:49:53Z

@inikulin

> Do you have any other ideas?

Not really. :) Adding a new section probably makes as much sense as anything else.
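The trade-off discussed above can be sketched as follows: if a consumer splits a tree construction test by section heading, a separate #new-errors section leaves whatever it already reads from #errors untouched. This is only a sketch. The parser, the set of headings, and the sample test (including both error line formats and the legacy error text) are hypothetical and not taken from the test suite.

```python
# Headings this sketch recognises; the exact set is an assumption.
KNOWN_HEADINGS = {"#data", "#errors", "#new-errors", "#document-fragment", "#document"}

# Hypothetical test case; both error line formats are placeholders.
SAMPLE = """#data
<!-->
#errors
(1,5): legacy-error-placeholder
#new-errors
(1:5) abrupt-closing-of-empty-comment
#document
| <!--  -->
| <html>
|   <head>
|   <body>
"""


def parse_test(text):
    """Split one test into {heading: [lines]} using the known headings."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line in KNOWN_HEADINGS:
            current = line
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


sections = parse_test(SAMPLE)
legacy_errors = sections["#errors"]          # what an older consumer reads
new_errors = sections.get("#new-errors", [])  # what a spec-based consumer reads
```

A consumer that hard-codes its list of headings would still need to learn about the new one, which is the concern raised in the review.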

inikulin · 2017-06-08T17:24:42Z

@gsnedders Fixed

- Use new initial states in tests according to: html5lib/html5lib-tests#101
- Implement tokenization errors introduced in: whatwg/html#2701 html5lib/html5lib-tests#92

stevecheckoway · 2018-10-03T08:20:36Z

> I haven't actually checked they match the spec, because that would take forever, and I'd rather just wait until some implementation notices they don't match.

For what it's worth, the code I'm testing, Nokogumbo, outputs each #new-errors error message in exactly the same order as the tests.

(It also tests that the line numbers match and the column numbers in the test are at least as large as the column numbers Nokogumbo outputs. I can't test for column number equality because Nokogumbo outputs column numbers at a place that would be most useful for humans, not the column where the error was detected.)
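The check described above can be sketched roughly as below. This is not Nokogumbo's actual test code. The function name `errors_match` and the `(code, line, col)` tuple representation are assumptions made for illustration.

```python
def errors_match(expected, actual):
    """Compare expected test errors against an implementation's errors.

    Both arguments are lists of (code, line, col) tuples in emission order.
    Codes and lines must match exactly; the expected column only needs to be
    at least as large as the reported column.
    """
    if len(expected) != len(actual):
        return False
    for (e_code, e_line, e_col), (a_code, a_line, a_col) in zip(expected, actual):
        if e_code != a_code or e_line != a_line:
            return False
        # Relaxed column check: the test column must be >= the reported column.
        if e_col < a_col:
            return False
    return True
```

The relaxed column comparison lets an implementation report an earlier, more human-friendly position without failing the test.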

inikulin and others added 30 commits March 25, 2017 19:23

Add control-or-undefined-character-in-input-stream parse error.

222b75c

Add non-unicode-character-in-input-stream parse error.

3f10d5b

Add self-closing-non-void-html-element error.

5a3b850

Add end-tag-with-attributes error.

203f0ef

Add self-closing-end-tag error.

7b56b75

Add unexpected-null-character error.

91f17a6

Add unexpected-null-character parse error in in RCDATA, RAWTEXT, PLAINTEXT and Script data states.

7989d84

Add Tag open state errors

6e4821a

Add End tag open state parse errors.

c8d5a02

Add Markup declaration open state parse errors.

a1938c2

Add Script data escaped state parse errors.

2cada34

Add Script data escaped dash state parse errors.

23176e3

Add Script data escaped dash dash state errors.

8c72094

Add Script data double escaped state errors.

586f2dd

Adding tests for Tag Name parse errors

5fc480e

Adding back old eof error

924515f

Merge pull request #1 from diervo/dval/parseErrorTagName

f06aecd

Adding tests for Tag Name parse errors

Adding parser errors for before attribute name state

dcd7002

Add Comment less-than sign bang dash dash state errors.

c9436ca

Add Comment start state errors.

2918163

Add Comment start dash state errors.

ad3c75e

Add Comment state errors.

a74e16a

Adding parser errors for attibute name state

e7d3d7d

Merge pull request #2 from HTMLParseErrorWG/comments-parse-errors

6e62741

Comments parse errors

Add Comment end dash state errors.

730d1fd

Add Comment end state errors.

95be487

Add Comment end bang state errors.

1bb47c3

Adding parser errors for after attibute name state

ba7425f

Merge pull request #3 from HTMLParseErrorWG/comment-parse-errors2

f7d0432

More comment parse errors

Generalizing error naming

8179234

inikulin added 11 commits May 30, 2017 14:00

Rename abrupt closing of comment error

5765c84

non-void-element -> non-void-html-element

eae4e2d

Merge pull request #22 from HTMLParseErrorWG/review-fixes2

fa43d3d

Review fixes2

Add error for duplicate attribute

abf44b5

Merge pull request #23 from HTMLParseErrorWG/review-fixes2

f7c525f

Add error for duplicate attribute

non-unicode-character-in-input-stream -> surrogate-in-input-stream

eaeee69

undefined-character-in-input-stream -> noncharacter-in-input-stream

7b7a220

undefined-character-reference -> noncharacter-character-reference

0ceaf59

non-unicode-character-reference -> surrogate-character-reference

7ed9c4f

character-reference-outside-unicode-range

8c43e0e

Merge pull request #24 from HTMLParseErrorWG/review-fixes2

b13e570

Character error fixes

inikulin changed the title ~~[DO NOT MERGE YET] Implement tokenization errors as per spec.~~ Implement tokenization errors as per spec. May 31, 2017

zcorpan mentioned this pull request Jun 5, 2017

Test the ambiguous ampersand state #94

Merged

gsnedders reviewed Jun 6, 2017

inikulin added 3 commits June 8, 2017 19:53

Fix erroneously changed legacy errors

8f5f958

Remove ignoreErrorOrder property. Add error format description

7b6415d

Merge pull request #25 from HTMLParseErrorWG/review-fixes2

9ec9f26

Review fixes2

gsnedders merged commit 71bd617 into html5lib:master Jun 12, 2017

syjer mentioned this pull request Jul 3, 2017

Question about the logic of merged "Character" tokens after PR #92 in tokenizer tests #96

Closed

RReverser added a commit to RReverser/html5lib-tests that referenced this pull request Jul 12, 2017

Remove ignoreErrorOrder option from docs

a5c88a4

It's not used anymore with changes in html5lib#92.

RReverser mentioned this pull request Jul 12, 2017

Remove ignoreErrorOrder option from docs #97

Merged

RReverser mentioned this pull request Jul 21, 2017

[Tracking issue] Document initialStates #99

Closed

gsnedders mentioned this pull request Oct 1, 2018

New and old parse errors #107

Open
