Skip to content

Implement tokenization errors as per spec. #92

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 106 commits into from
Jun 12, 2017

Conversation

inikulin
Copy link
Member

@inikulin inikulin commented May 21, 2017

Ready for a review, but it makes sense to merge spec changes first. Note that for now we've decided to move new error codes to a separate section in tree construction stage tests to not mix things up. Once we have a spec for tree construction stage errors, we'll remove old errors and move new errors to #errors section.

Spec PR: whatwg/html#2701

inikulin and others added 30 commits March 25, 2017 19:23
Adding tests for Tag Name parse errors
@inikulin inikulin changed the title [DO NOT MERGE YET] Implement tokenization errors as per spec. Implement tokenization errors as per spec. May 31, 2017
@inikulin
Copy link
Member Author

Spec PR has been merged. This one is ready for review.

@inikulin
Copy link
Member Author

inikulin commented Jun 6, 2017

Ping @gsnedders

Copy link
Member

@gsnedders gsnedders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't actually checked they match the spec, because that would take forever, and I'd rather just wait until some implementation notices they don't match.

Apart from the one issue, what's going on with the tree construction tests? Some of the #errors sections have been partially updated for the tokenizer errors, in addition to the #new-errors sections? (Personally, I have a slight preference against adding a new section temporarily, given I know some things rely on the specific headings.)

@@ -65,7 +65,6 @@ tokens are:
["EndTag", name]
["Comment", data]
["Character", data]
"ParseError"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should document the new errors property.

@inikulin
Copy link
Member Author

inikulin commented Jun 6, 2017

@gsnedders

Apart from the one issue, what's going on with the tree construction tests? Some of the #errors sections have been partially updated for the tokenizer errors, in addition to the #new-errors sections?

Unfortunately, seems like some error codes were erroneously added to the legacy error section and we missed it on review. I'll do a cleanup.

Update: I've figured out a cause of this issue: some error codes that didn't get into the final cut were the same as legacy error codes. During automatic rename they were replaced with new codes.

Personally, I have a slight preference against adding a new section temporarily, given I know some things rely on the specific headings

I thought it would be an unobtrusive way for a transition to the new parse errors. The other options are:

  • Remove all old parse errors in the tree construction tests and add new one directly to the #errors section. But there can be implementations that use old parse errors.
  • Move all new error in the #errors section, but use separator between old and new parse errors (e.g., (0,0) -------). Will not work due to the same reasons as the previous option.

Do you have any other ideas?

@gsnedders
Copy link
Member

@inikulin

Do you have any other ideas?

Not really. :) Adding a new section probably makes as much sense as anything else.

@inikulin
Copy link
Member Author

inikulin commented Jun 8, 2017

@gsnedders Fixed

@gsnedders gsnedders merged commit 71bd617 into html5lib:master Jun 12, 2017
RReverser added a commit to RReverser/html5lib-tests that referenced this pull request Jul 12, 2017
It's not used anymore with changes in html5lib#92.
iabudiab added a commit to iabudiab/HTMLKit that referenced this pull request Sep 10, 2017
- Use new initial states in tests according to:
html5lib/html5lib-tests#101

- Implement tokenization errors introduced in:
whatwg/html#2701
html5lib/html5lib-tests#92
@stevecheckoway
Copy link
Contributor

I haven't actually checked they match the spec, because that would take forever, and I'd rather just wait until some implementation notices they don't match.

For what it's worth, the code I'm testing, Nokogumbo, outputs each #new-error error message in exactly the same order as the tests.

(It also tests that the line numbers match and the column numbers in the test are at least as large as the column numbers Nokogumbo outputs. I can't test for column number equality because Nokogumbo outputs column numbers at a place that would be most useful for humans, not the column where the error was detected.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants