Skip to content

Test the ambiguous ampersand state #94

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 23, 2017
Merged

Conversation

zcorpan
Copy link
Contributor

@zcorpan zcorpan commented Jun 5, 2017

Follows whatwg/html#2731


I couldn't figure out how to run the tokenizer tests with html5lib. Just nosetests doesn't seem to run them?

@inikulin
Copy link
Member

inikulin commented Jun 5, 2017

I believe it makes sense to use new error format for this

@zcorpan
Copy link
Contributor Author

zcorpan commented Jun 5, 2017

I took unknown-named-character-reference from the spec and otherwise made it consistent with surrounding tests. I figured we may want to sweep all tests to update to the new format. But happy to make the new tests added here use the new format.

What should the error lines say, exactly?

@zcorpan
Copy link
Contributor Author

zcorpan commented Jun 5, 2017

Sorry, that's what #92 does. So, I can rebase this on 92 I suppose.

@zcorpan
Copy link
Contributor Author

zcorpan commented Jun 5, 2017

Or vice versa; if this looks OK we can land this first and rebase 92 and convert these tests there. WDYT?

@inikulin
Copy link
Member

inikulin commented Jun 5, 2017

@zcorpan Both works for me. Do whatever will take less efforts from you.

@zcorpan
Copy link
Contributor Author

zcorpan commented Jun 7, 2017

OK, I think I prefer landing this one first.

@gsnedders
Copy link
Member

gsnedders commented Jun 7, 2017

@zcorpan You need to update the submodule, make sure you have dependencies installed (pip install -r requirements-test.txt), and run pytest. For just the tokenizer tests, pytest html5lib/tests/testdata/tokenizer or whatever the path is. :)

@@ -2,12 +2,24 @@

{"description": "Undefined named entity in attribute value ending in semicolon and whose name starts with a known entity name.",
"input":"<h a='&noti;'>",
"output": [["StartTag", "h", {"a": "&noti;"}]]},
"output": ["ParseError", ["StartTag", "h", {"a": "&noti;"}]]},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't report error in this case due to:

If the character reference was consumed as part of an attribute (return state is either attribute value (double-quoted) state, attribute value (single-quoted) state or attribute value (unquoted) state), and the last character matched is not a U+003B SEMICOLON character (;), and the next input character is either a U+003D EQUALS SIGN character (=) or an ASCII alphanumeric, then, for historical reasons, switch to the character reference end state.

https://html.spec.whatwg.org/multipage/parsing.html#named-character-reference-state

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmm. That should probably switch to the ambiguous ampersand state. https://html.spec.whatwg.org/#syntax-attribute-value says

Attribute values are a mixture of text and character references, except with the additional restriction that the text cannot contain an ambiguous ampersand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zcorpan zcorpan force-pushed the zcorpan/ambiguous-ampersand branch from 1e6706a to 1a5c511 Compare June 21, 2017 10:55
@zcorpan
Copy link
Contributor Author

zcorpan commented Jun 21, 2017

Rebased and fixed the &noti; in attribute case to not be a parse error.

(1,1): expected-doctype-but-got-chars
(1,7): unknown-named-character-reference
#new-errors
(1:7) missing-semicolon-after-character-reference
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be unknown-named-character-reference

(1,1): expected-doctype-but-got-chars
(1,950): unknown-named-character-reference
#new-errors
(1:950) missing-semicolon-after-character-reference
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@zcorpan
Copy link
Contributor Author

zcorpan commented Jun 21, 2017

Fixed.

@inikulin inikulin merged commit 6ffc666 into master Jun 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants