Fix tokenizer EOF error positions #144

fb55 · 2022-02-28T11:07:56Z

I am trying to move parse5 to the upstream html5lib-tests repo (away from this fork). As a first PR to come from this effort, this PR corrects some tokenizer errors. The changes are in three categories:

1. Off-by-one errors for EOF errors. Most EOF errors already point at the column after the last character, with some exceptions. These exceptions were fixed.
2. Line breaks being ignored by some EOF errors. Similar to (1), these are the exception.
3. ~~unknown-named-character-reference errors were missing entirely and have been added.~~ Reverted.
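To make the convention in the first two points concrete, here is a minimal sketch of where an EOF error would be expected to point. It assumes 1-based line and column numbers and that line breaks have already been normalized to `\n`; the helper name `eof_position` is hypothetical and is not part of html5lib-tests or parse5.

```python
from typing import Tuple


def eof_position(source: str) -> Tuple[int, int]:
    """Return the (line, col) just past the last character of `source`.

    Assumption: lines and columns are 1-based and "\n" is the only line break.
    """
    # Every line break moves the position to the next line.
    line = source.count("\n") + 1
    # Index of the last line break, or -1 if the input has a single line.
    last_break = source.rfind("\n")
    # Number of characters after the last break, plus one for the EOF slot.
    col = len(source) - last_break
    return line, col


print(eof_position("<!DOCTYPE"))  # (1, 10): one column past the final "E"
print(eof_position("<!--\n"))     # (2, 1): the line break is counted
```

Under these assumptions, an error reported at column 9 for the first input would be the off-by-one case, and an error reported on line 1 for the second input would be the ignored-line-break case.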

Ms2ger

Sounds fine. I'll give it a few days for others to comment; ping me if I forget.

untitaker

I believe absence of errors is correct because &not (without a semicolon) is a valid character reference. This is also in line with what Firefox does to the HTML document a&noti, which decodes as a¬i.

untitaker · 2022-02-28T18:58:41Z

In fact, if you check the spec, &noti.. is the exact example they use to describe that edge case: https://html.spec.whatwg.org/multipage/parsing.html#named-character-reference-state

> if the markup contains the string I'm &notit; I tell you in an attribute, no character reference is parsed and string remains intact (and there is no parse error).
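The decoding behaviour discussed here can be sketched in a few lines of Python. This is a simplified illustration, not the spec's state machine: `decode_named_reference` is a hypothetical helper, it handles a single reference at the start of the string, and it does not report parse errors. It relies on the standard library's `html.entities.html5` table, which contains both legacy names without a semicolon (such as `not`) and names with one (such as `notin;`).

```python
from html.entities import html5  # entity name -> replacement text


def decode_named_reference(text: str, in_attribute: bool) -> str:
    """Decode one named character reference at the start of `text`.

    Hypothetical helper: `text` must begin with "&". Parse errors are ignored.
    """
    assert text.startswith("&")
    body = text[1:]
    # Try the longest candidate name first (names are at most 32 characters).
    for end in range(min(len(body), 32), 0, -1):
        name = body[:end]
        if name not in html5:
            continue
        next_char = body[end:end + 1]
        followed_by_alnum = next_char.isascii() and next_char.isalnum()
        # Attribute exception: a match without ";" that is followed by "="
        # or an ASCII alphanumeric is left untouched.
        if in_attribute and not name.endswith(";") and (
            next_char == "=" or followed_by_alnum
        ):
            return text
        return html5[name] + body[end:]
    return text  # no known name matched


print(decode_named_reference("&noti", in_attribute=False))              # ¬i
print(decode_named_reference("&notit; I tell you", in_attribute=True))  # unchanged
print(decode_named_reference("&notin; I tell you", in_attribute=False)) # ∉ I tell you
```

For text outside attributes, Python's `html.unescape("a&noti")` gives the same `a¬i` result as Firefox does above; it has no notion of attribute context, which is why the sketch spells out that branch by hand.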

fb55 · 2022-03-02T10:51:55Z

Thanks a lot for flagging @untitaker. I've reverted the additions.

untitaker · 2022-03-02T18:58:37Z

error locations are not actually standardized, right? this is just to make the testsuite internally consistent?

fb55 · 2022-03-02T19:01:30Z

> error locations are not actually standardized, right? this is just to make the testsuite internally consistent?

That is correct.

untitaker

I am not a maintainer but this lgtm

fb55 · 2022-03-10T21:50:36Z

@Ms2ger It would be great if you could have another look at this (as well as #145 if possible)!

Fix tokenizer error positions, add missing errors

cdd63be

Ms2ger approved these changes Feb 28, 2022

fb55 mentioned this pull request Feb 28, 2022

Add tests for form in template, << in comment #145

Merged

untitaker reviewed Feb 28, 2022

Revert changes to entities

adda8fe

fb55 changed the title ~~Fix tokenizer error positions, add missing errors~~ Fix tokenizer error positions Mar 2, 2022

fb55 changed the title ~~Fix tokenizer error positions~~ Fix tokenizer EOF error positions Mar 2, 2022

fb55 mentioned this pull request Mar 2, 2022

fix(tokenizer): No parse error on attribute quirk inikulin/parse5#430

Merged

untitaker approved these changes Mar 2, 2022

Ms2ger merged commit 457a78a into html5lib:master Mar 11, 2022

fb55 deleted the tokenizer-errors branch March 11, 2022 09:38

fb55 mentioned this pull request Mar 11, 2022

test: Add upstream html5lib-tests inikulin/parse5#447

Merged
