Assertion failures in Python 2 from the etree treewalker #190

Closed
mattheww opened this issue May 9, 2015 · 0 comments

Assertion failures in Python 2 from the etree treewalker.

If I create an element directly using cElementTree and try to serialise the result using html5lib, I get assertion failures in Python 2 unless I go to great lengths to make sure cElementTree sees unicode strings everywhere.

    from xml.etree import cElementTree as etree
    import html5lib

    doc = html5lib.parse(
        u"<p>test",
        treebuilder="etree",
        namespaceHTMLElements=False)

    head = doc.find("head")
    link = etree.Element("link")
    head.append(link)

    stream = html5lib.treewalkers.getTreeWalker("etree")(doc)
    serializer = html5lib.serializer.htmlserializer.HTMLSerializer()
    rendered = serializer.render(stream)

The render() call fails with:

AssertionError: <type 'str'>
html5lib/treewalkers/etree.py:61 (getNodeDetails)
failing line:
assert type(node.tag) == text_type, type(node.tag)
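
A possible workaround is to normalise the tree before handing it to the treewalker. The following is only a sketch and not part of html5lib: `to_text` and `coerce_to_text` are hypothetical helper names, and the ASCII default encoding is an assumption. It imports `xml.etree.ElementTree` rather than `cElementTree` so that it also runs on Python 3, where it changes nothing.

```python
from xml.etree import ElementTree as etree  # cElementTree in the report above

# `unicode` on Python 2, `str` on Python 3
text_type = type(u"")


def to_text(value, encoding="ascii"):
    """Decode byte strings to text; return anything else unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def coerce_to_text(root):
    """Hypothetical helper: convert tags, attribute names and values,
    text and tail of every node under `root` to text strings, in place."""
    for el in root.iter():
        # Comment and processing-instruction nodes have a non-string tag,
        # which to_text() passes through untouched.
        el.tag = to_text(el.tag)
        items = list(el.attrib.items())
        el.attrib.clear()
        for name, value in items:
            el.set(to_text(name), to_text(value))
        el.text = to_text(el.text)
        el.tail = to_text(el.tail)
    return root


# Build a small tree the same way the report does, then normalise it.
root = etree.Element("html")
link = etree.SubElement(root, "link", rel="stylesheet")
coerce_to_text(root)
```

The intent is to call `coerce_to_text(doc)` just before `html5lib.treewalkers.getTreeWalker("etree")(doc)`, so that the `text_type` assertions only ever see text strings.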

Using unicode string literals everywhere isn't enough to avoid trouble, because cElementTree sometimes constructs attribute names from keyword arguments, and keyword argument names are always byte strings in Python 2, e.g.:

    from xml.etree import cElementTree as etree
    import html5lib

    doc = html5lib.parse(
        u"<p>test",
        treebuilder="etree",
        namespaceHTMLElements=False)

    head = doc.find("head")
    # The attribute value is unicode, but the name "rel" comes from a keyword.
    link = etree.Element(u"link", rel=u"stylesheet")
    head.append(link)

    stream = html5lib.treewalkers.getTreeWalker("etree")(doc)
    serializer = html5lib.serializer.htmlserializer.HTMLSerializer()
    rendered = serializer.render(stream)

The render() call fails with:

AssertionError
html5lib/serializer/htmlserializer.py:165 (encodeStrict)
failing line:
assert(isinstance(string, text_type))
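
For comparison, here is a sketch of the "special lengths" needed to keep the attribute name as a unicode string: pass the attributes as a dict, or use `set()`, instead of a keyword argument. This assumes cElementTree stores the keys it is given unchanged, which is my understanding but is not confirmed above. The variable names are illustrative only, and the snippet imports `xml.etree.ElementTree` so that it also runs on Python 3.

```python
from xml.etree import ElementTree as etree  # cElementTree in the report above

# `unicode` on Python 2, `str` on Python 3
text_type = type(u"")

# Keyword form from the report: on Python 2 the name "rel" is a byte string,
# because keyword argument names are always `str` there.
via_keyword = etree.Element(u"link", rel=u"stylesheet")

# Dict form: the attribute name is whatever key we supply.
via_dict = etree.Element(u"link", {u"rel": u"stylesheet"})

# set() form: the same idea, one attribute at a time.
via_set = etree.Element(u"link")
via_set.set(u"rel", u"stylesheet")
```

All three elements carry the same attribute. Only the type of the attribute name differs on Python 2, and that is the type `encodeStrict` asserts on.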
