
Relative URLs not allowed by sanitizer after 0.999 #189


Closed
yakky opened this issue May 2, 2015 · 2 comments

Comments

@yakky

yakky commented May 2, 2015

This snippet works in 0.999 but not in 0.9999:

from html5lib import HTMLParser, sanitizer, serializer, treebuilders, treewalkers

# Use the sanitizer as the parser's tokenizer
opts = {}
opts['tokenizer'] = sanitizer.HTMLSanitizer
parser = HTMLParser(tree=treebuilders.getTreeBuilder("dom"), **opts)

data = u'Hello World<img src="/static/cms/img/icons/plugins/link.png" alt="Link - A Link" title="Link - A Link" id="plugin_obj_2" />'
dom_tree = parser.parseFragment(data)

# Walk the DOM fragment and serialize it back to HTML
walker = treewalkers.getTreeWalker("dom")
stream = walker(dom_tree)
s = serializer.htmlserializer.HTMLSerializer()
text = ''.join(s.serialize(stream))

assert text == data

The issue does not occur when the sanitiser is not used.
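For context, the `src` value in the snippet is a relative URL. A quick standard-library check (plain `urllib.parse`, not html5lib code) shows that it parses with an empty scheme, unlike an absolute `http` URL or a `data:` URI. A sanitizer that keeps only URLs whose scheme appears on an allowlist would therefore drop it unless the empty scheme is special-cased. The `example.com` URL and the `data:` payload below are made-up samples:

```python
from urllib.parse import urlparse

# Sample URLs: the relative src from the snippet above, plus a made-up
# absolute http URL and a made-up data: URI for comparison.
samples = [
    "/static/cms/img/icons/plugins/link.png",
    "http://example.com/link.png",
    "data:image/png;base64,iVBORw0KGgo=",
]

for url in samples:
    # .scheme is the empty string for relative URLs
    print(repr(urlparse(url).scheme), url)
```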

gsnedders changed the title from "HTMLSanitiser broken after upgrade to 0.9999" to "Relative URLs not allowed by sanitizer after 0.999" on May 2, 2015
@stefanfoulis

In case html5lib causes more problems, we've had good experiences with https://pypi.python.org/pypi/bleach/.

@gsnedders
Member

Bleach just uses html5lib, FWIW. This should get fixed soonish (and a release should follow), but I'm in the middle of things like moving flat, so it's all a bit hectic here. If anyone wants to write a PR for this, the cause will be the data URI changes.
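A minimal sketch of what a fix in that direction might look like, assuming the regression comes from a scheme allowlist check introduced alongside the data URI handling. This is not html5lib's actual code: `uri_is_allowed`, `ALLOWED_PROTOCOLS`, `ALLOWED_DATA_CONTENT_TYPES`, and `DATA_CONTENT_TYPE` are hypothetical names, and the allowlists are illustrative only. The idea is to let URLs with an empty scheme (relative URLs) through before consulting the protocol allowlist, and to restrict `data:` URIs to an explicit set of media types:

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlists for illustration; html5lib's real lists differ.
ALLOWED_PROTOCOLS = {"http", "https", "ftp", "mailto", "data"}
ALLOWED_DATA_CONTENT_TYPES = {"image/png", "image/gif", "image/jpeg"}

# Hypothetical pattern for the media type at the start of a data: URI's path,
# e.g. "image/png" in "data:image/png;base64,..."
DATA_CONTENT_TYPE = re.compile(
    r"^([a-z0-9!#$&^_.+-]+/[a-z0-9!#$&^_.+-]+)", re.IGNORECASE
)


def uri_is_allowed(value: str) -> bool:
    """Return True if a URI-valued attribute should be kept (hypothetical helper)."""
    try:
        uri = urlparse(value.strip())
    except ValueError:
        # Malformed URLs (e.g. a broken IPv6 host) are rejected.
        return False
    # A relative URL has an empty scheme: accept it here instead of
    # rejecting it for not appearing in ALLOWED_PROTOCOLS.
    if not uri.scheme:
        return True
    scheme = uri.scheme.lower()
    if scheme not in ALLOWED_PROTOCOLS:
        return False
    if scheme == "data":
        # Keep data: URIs only when their media type is explicitly allowed.
        match = DATA_CONTENT_TYPE.match(uri.path)
        return bool(match) and match.group(1).lower() in ALLOWED_DATA_CONTENT_TYPES
    return True
```

With this ordering, the relative `src` from the original report is kept, while `javascript:` URLs and `data:text/html` URIs are still rejected.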
