Test the ambiguous ampersand state #94
I believe it makes sense to use the new error format for this.
I took What should the error lines say, exactly?
Sorry, that's what #92 does. So I can rebase this on #92, I suppose.
Or vice versa: if this looks OK, we can land this first, then rebase #92 and convert these tests there. WDYT?
@zcorpan Either works for me. Do whatever takes the least effort on your side.
OK, I think I prefer landing this one first.
@zcorpan You need to update the submodule, make sure you have dependencies installed (
tokenizer/entities.test
Outdated
@@ -2,12 +2,24 @@

 {"description": "Undefined named entity in attribute value ending in semicolon and whose name starts with a known entity name.",
 "input":"<h a='&noti;'>",
-"output": [["StartTag", "h", {"a": "&noti;"}]]},
+"output": ["ParseError", ["StartTag", "h", {"a": "&noti;"}]]},
We don't report an error in this case, because of this step in the spec:
If the character reference was consumed as part of an attribute (return state is either attribute value (double-quoted) state, attribute value (single-quoted) state or attribute value (unquoted) state), and the last character matched is not a U+003B SEMICOLON character (;), and the next input character is either a U+003D EQUALS SIGN character (=) or an ASCII alphanumeric, then, for historical reasons, switch to the character reference end state.
https://html.spec.whatwg.org/multipage/parsing.html#named-character-reference-state
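The quoted condition can be sketched as a small predicate. This is only an illustration of the spec text above, not html5lib's implementation: the function name and its parameters are hypothetical, and the three attribute value return states are collapsed into a single boolean.

```python
import string

ASCII_ALPHANUMERIC = set(string.ascii_letters + string.digits)


def takes_historical_attribute_path(in_attribute, last_matched, next_char):
    """Return True when the quoted spec condition holds.

    in_attribute: True if the return state is one of the three
        attribute value states (hypothetical simplification).
    last_matched: last character of the matched reference, e.g. "t" for "&not".
    next_char: the next input character, or None at end of input.
    """
    if not in_attribute:
        return False
    if last_matched == ";":
        return False
    # "=" or an ASCII alphanumeric triggers the "historical reasons" branch.
    return next_char is not None and (
        next_char == "=" or next_char in ASCII_ALPHANUMERIC
    )
```

For the test input discussed here, the longest match is `&not` (no semicolon) and the next character is `i`, so `takes_historical_attribute_path(True, "t", "i")` is true and the reference is left undecoded in the attribute value.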
Hmmmmm. That should probably switch to the ambiguous ampersand state. https://html.spec.whatwg.org/#syntax-attribute-value says
Attribute values are a mixture of text and character references, except with the additional restriction that the text cannot contain an ambiguous ampersand.
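For reference, the spec's syntax section defines an ambiguous ampersand as `&` followed by one or more ASCII alphanumerics and a `;`, where the name is not a known named character reference. A rough sketch of that authoring-level definition follows. It uses Python's standard `html.entities.html5` table as a stand-in for the spec's list of names, and the helper name is made up for this example. It is not the tokenizer's state machine.

```python
import re
from html.entities import html5  # keys look like "amp;" (plus legacy "amp")

# "&", one or more ASCII alphanumerics, then ";"
CANDIDATE = re.compile(r"&([0-9A-Za-z]+);")


def ambiguous_ampersands(text):
    """Return the '&name;' substrings whose name is not a known reference."""
    return [
        m.group(0)
        for m in CANDIDATE.finditer(text)
        if m.group(1) + ";" not in html5
    ]
```

With this sketch, `ambiguous_ampersands("&noti;")` reports `&noti;`, while `&amp;` and `&not;` are not reported because both names are in the table.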
1e6706a to 1a5c511
Rebased and fixed the
tree-construction/entities01.dat
Outdated
(1,1): expected-doctype-but-got-chars
(1,7): unknown-named-character-reference
#new-errors
(1:7) missing-semicolon-after-character-reference
It should be unknown-named-character-reference
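To compare the old-style and `#new-errors` lines mechanically, a small parser along these lines could help. It is a sketch that assumes only the two line shapes visible in this diff, `(line,col): code` and `(line:col) code`. The regex and function name are made up for illustration and are not part of html5lib-tests.

```python
import re

# Accepts "(1,7): some-code" and "(1:7) some-code".
ERROR_LINE = re.compile(r"^\((\d+)[,:](\d+)\):?\s+(\S+)$")


def parse_error_line(line):
    """Return (line, column, code), or None if the line is not an error line."""
    m = ERROR_LINE.match(line.strip())
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3)
```

Parsing both lines of the hunk above gives the same position `(1, 7)` with two different codes, which is the mismatch this comment points out.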
tree-construction/entities01.dat
Outdated
(1,1): expected-doctype-but-got-chars
(1,950): unknown-named-character-reference
#new-errors
(1:950) missing-semicolon-after-character-reference
Same here
Fixed.
Follows whatwg/html#2731
I couldn't figure out how to run the tokenizer tests with html5lib. Just running nosetests doesn't seem to run them?