Commit 0101816

Add new helper functions and add logic for script type tags in HTML ext
- Add `is_raw`, `is_block`, and `html_escape` helper functions
- HTML extension now has the concept of `html` tags which will store the content as non HTML escaped content. `script` and `style` tags will be auto detected for this mode, but this mode can be manually set now via the `markdown` option as well.
- In the HTML extension, to protect raw and raw HTML content from the Treeprocessors, content will be placed in the stash on `on_end`.
1 parent 30de6a2 commit 0101816

9 files changed: +162 -43 lines changed

docs/src/dictionary/en-custom.txt

+2
Original file line number | Diff line number | Diff line change
@@ -114,6 +114,7 @@ Tasklist
114114
Toc
115115
Tox
116116
Treeprocessor
117+
Treeprocessors
117118
Twemoji
118119
Twemoji's
119120
Twitter's
@@ -156,6 +157,7 @@ emoji
156157
emojione
157158
escaper
158159
eslint
160+
etree
159161
facelessuser
160162
formatter
161163
formatter's

docs/src/markdown/.snippets/links.md

+2
@@ -43,6 +43,7 @@
4343
[emojione-index]: https://github.com/facelessuser/pymdown-extensions/blob/master/pymdownx/emoji1_db.py
4444
[emojione-sprites-svg]: https://github.com/Ranks/emojione/blob/v2.2.7/assets/sprites/emojione.sprites.svg
4545
[emojione]: https://github.com/Ranks/emojione
46+
[etree]: https://docs.python.org/3/library/xml.etree.elementtree.html
4647
[flake8-docstrings]: https://pypi.python.org/pypi/flake8-docstrings
4748
[flake8]: https://pypi.python.org/pypi/flake8
4849
[footnotes]: https://python-markdown.github.io/extensions/footnotes/
@@ -73,6 +74,7 @@
7374
[python-markdown]: https://github.com/Python-Markdown/markdown
7475
[pyyaml]: https://github.com/yaml/pyyaml
7576
[requests]: https://pypi.python.org/pypi/requests/
77+
[stash]: https://python-markdown.github.io/extensions/api/#working_with_raw_html
7678
[tables]: https://python-markdown.github.io/extensions/tables/
7779
[toc]: https://python-markdown.github.io/extensions/toc/
7880
[tox]: https://pypi.python.org/pypi/tox

docs/src/markdown/extensions/blocks/api.md

+42-5
@@ -176,6 +176,37 @@ developers via `self.options['attrs']`. The result is a dictionary of key/value
176176
the value is a `#!py3 str` (or `#!py3 list[str]` in the special case of `class`).
177177
///
178178

179+
## `is_raw`
180+
181+
```py
182+
def is_raw(self, tag: Element) -> bool:
183+
...
184+
```
185+
186+
This method, given a tag will determine if the block should be considered a "raw" tag based on the Blocks extension's
187+
internal logic.
188+
189+
## `is_block`
190+
191+
```py
192+
def is_block(self, tag: Element) -> bool:
193+
...
194+
```
195+
196+
This method, given a tag will determine if the block should be considered a "block" tag based on the Blocks extension's
197+
internal logic.
198+
199+
200+
## `html_escape`
201+
202+
```py
203+
def html_escape(self, text: str) -> str:
204+
...
205+
```
206+
207+
Takes a string intended for an HTML tag's content and returns it after applying HTML escaping on it. Escapes `&`, `<`,
208+
and `>`.
209+
179210
## `on_init` Event
180211

181212
```py
@@ -242,7 +273,8 @@ def on_markdown(self) -> str:
242273
```
243274

244275
The `on_markdown` event is used to declare how the content of the block should be handled by the Markdown parser. A
245-
string with one of the following values _must_ be returned.
276+
string with one of the following values _must_ be returned. All content is stored under the [etree][etree] element
277+
returned via the [`on_add` event](#on_add-event), regardless of what mode is returned.
246278

247279
Result\ Value | Description
248280
------------- | -----------
@@ -255,10 +287,15 @@ Only during the [`on_end` event](#on_end-event) will all the content be fully ac
255287
block processors, and only during the [`on_inline_end` event](#on_inline_end-event) will both block and inline
256288
processing be completed.
257289

258-
When using `raw` mode, all text will be gathered as blocks are processed and will be fully available during the
259-
[`on_end` event](#on_end-event). Content in a `raw` block should be indented to avoid the HTML parser and will be
260-
dedented (no more than the current Markdown tab length) in the final result. Content will stored as a Python Markdown
261-
[`AtomicString`][atomic].
290+
When using `raw` mode, all text will be accumulated and fully available during the [`on_end` event](#on_end-event).
291+
Content is accessible under the element returned by the [`on_add` event](#on_add-event) and can be accessed via
292+
`element.text`. Text content is stored as a Python Markdown [`AtomicString`][atomic]. If desired, the content can be
293+
stored in the [HTML stash][stash] during the [`on_add` event](#on_add-event)to ensure it makes it through any and all
294+
Treeprocessors after inline handling. Additionally, when storing in the stash, the developer can HTML escape the content
295+
to have the text present as literal or store without HTML escaping to present as an altered HTML content.
296+
297+
A `raw` block expects the content to be an indented code block as this is necessary to avoid some Python Markdown's
298+
internal HTML parser. Content will not have the extra indentation in the final output.
262299

263300
It should be noted that `raw` mode cannot prevent transformations that are applied during Python Markdown's preprocessor
264301
steps. Blocks will attempt to revert any placeholders within the content that are currently found in the HTML stash.

docs/src/markdown/extensions/blocks/index.md

+4-4
@@ -105,10 +105,10 @@ Some content.
105105
////
106106

107107
Some blocks may take raw content (and should note this in their documentation) which will avoid further Markdown
108-
processing on the content. Due to the way Python Markdown works, these content blocks must be indented to avoid having
109-
the HTML processor from altering content. Raw content blocks will remove the indentation up to the Markdown tab length
110-
(4 spaces by default). If they are not indented, they will still be processed, but they may be affected by Python
111-
Markdown's HTML processor.
108+
processing on the content. This is done by requiring the content to be an indented code block. Due to the way Python
109+
Markdown works, these content blocks must be indented to avoid having the HTML processor from altering content. Raw
110+
blocks cannot shield content from all preprocessor transformations, but by requiring the content to be indented code
111+
blocks, the content will survive any alterations that a traditional code block would survive.
112112

113113
```
114114
/// html | pre

docs/src/markdown/extensions/blocks/plugins/html.md

+26-16
@@ -56,14 +56,35 @@ some *markdown* content
5656

5757
By default HTML blocks will automatically have the content rendering determined from tag name, so `div` blocks will be
5858
treated as block elements, `span` will be treated as inline elements, and things like `pre` will treat the content as
59-
raw text that should not be processed by Markdown further. With that said, there may be cases where an HTML element
60-
isn't properly recognized yet, or the user simply wants to control how the element processes its content, in these
61-
cases, the `markdown` option can be used to specify how Markdown content is handled.
59+
raw text that needs HTML escaping, and things like `script` will be treated as raw content does not need HTML escaping.
60+
With that said, there may be cases where an HTML element isn't properly recognized yet, or the user simply wants to
61+
control how the element processes its content, in these cases, the `markdown` option can be used to specify how Markdown
62+
content is handled.
63+
64+
Markdown\ Modes | Description
65+
--------------- | -----------
66+
`block` | Parsed block content will be handled by the Markdown parser as content under a block element.
67+
`inline` | Parsed block content will be handled by the Markdown parser as content under an inline element.
68+
`raw` | Parsed block content will be preserved. No additional Markdown parsing will be applied. Content will be HTML escaped to preserve the content as is.
69+
`auto` | Depending on whether the wrapping parent is a block element, inline element, or something like a code element, Blocks will choose the best approach for the content. Decision is made based on the element returned by the [`on_add` event](#on_add-event).
70+
`html` | Like `raw`, content will be preserved, but the content will _not_ be HTML escaped and will be passed through as unmodified HTML. Any required sanitizing should be provided by the user post Markdown processing.
71+
72+
/// tip | Raw and HTML Mode
73+
When using _raw_ tags or forcing _raw_ mode with `markdown: raw` (HTML escaped) or `markdown: html` (no HTML escaping),
74+
code must be indented. This is because Python Markdown will look for and process raw HTML in non indented blocks. The
75+
only avoid this is to use indented code blocks. If content is not indented, the content may be missing at the end.
76+
77+
Recognized raw block tags: `canvas`, `math`, `option`, `pre`, and `textarea`.
78+
79+
Recognized raw HTML tags: `script` and `style`.
80+
81+
Also, make sure to have a new line before indented content so it is not recognized as an attempt to specify YAML
82+
options.
83+
///
6284

6385
In the following example we force `pre` to handle content as Markdown block content instead of the usual raw content
6486
default.
6587

66-
6788
```text title="Pre as Block"
6889
/// html | pre
6990
@@ -90,20 +111,9 @@ some *markdown* content
90111
////
91112
///
92113

93-
/// tip | Raw Mode
94-
When using _raw_ tags or forcing _raw_ mode with `markdown: raw`, it is advised to indent the code. This is because
95-
Python Markdown will look for and process raw HTML in non indented blocks. The only avoid this is to use indented
96-
blocks. Content will automatically be dedented by the expected tab length.
97-
98-
Recognized raw block tags: `canvas`, `math`, `option`, `pre`, `script`, `style`, and `textarea`.
99-
100-
Also, make sure to have a new line before indented content so it is not recognized as an attempt to specify YAML
101-
options.
102-
///
103-
104114
## Per Block Options
105115

106116
Options | Type | Descriptions
107117
------------ | ---------- | ------------
108-
`markdown` | string | String value to control how Markdown content is processed. Valid options are: `auto`, `block`, `inline`, and `raw`.
118+
`markdown` | string | String value to control how Markdown content is processed. Valid options are: `auto`, `block`, `inline`, `html`, and `raw`.
109119
`attrs` | string | A string that defines attributes for the outer, wrapper element.

pymdownx/blocks/__init__.py

+7-7
@@ -167,7 +167,7 @@ def __init__(self, parser, md):
167167
['address', 'dd', 'dt', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'legend', 'li', 'p', 'summary', 'td', 'th']
168168
)
169169
# Block-level tags which never get their content parsed.
170-
self.raw_tags = set(['canvas', 'math', 'option', 'pre', 'script', 'style', 'textarea'])
170+
self.raw_tags = set(['canvas', 'math', 'option', 'pre', 'script', 'style', 'textarea', 'code'])
171171
# Block-level tags in which the content gets parsed as blocks
172172
self.block_tags = set(self.block_level_tags) - (self.span_tags | self.raw_tags | self.empty_tags)
173173
self.span_and_blocks_tags = self.block_tags | self.span_tags
@@ -339,15 +339,15 @@ def get_parent(self, parent):
339339
temp = self.lastChild(temp)
340340
return None
341341

342-
def is_raw(self, tag, mode):
342+
def is_raw(self, tag):
343343
"""Is tag raw."""
344344

345-
return mode == 'raw' or (mode == 'auto' and tag.tag in self.raw_tags)
345+
return tag.tag in self.raw_tags
346346

347-
def is_block(self, tag, mode):
347+
def is_block(self, tag):
348348
"""Is tag block."""
349349

350-
return mode == 'block' or (mode == 'auto' and tag.tag in self.block_tags)
350+
return tag.tag in self.block_tags
351351

352352
def parse_blocks(self, blocks, entry):
353353
"""Parse the blocks."""
@@ -364,8 +364,8 @@ def parse_blocks(self, blocks, entry):
364364
mode = entry.block.on_markdown()
365365
if mode not in ('block', 'inline', 'raw'):
366366
mode = 'auto'
367-
is_block = self.is_block(target, mode)
368-
is_atomic = self.is_raw(target, mode)
367+
is_block = mode == 'block' or (mode == 'auto' and self.is_block(target))
368+
is_atomic = mode == 'raw' or (mode == 'auto' and self.is_raw(target))
369369

370370
# We should revert fenced code in spans or atomic tags.
371371
# Make sure atomic tags have content wrapped as `AtomicString`.
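
With this change, `is_raw` and `is_block` only answer a question about the tag, and the caller combines that answer with the mode. The following is a minimal, standalone sketch of that combination, not the extension's actual code. `RAW_TAGS` mirrors the updated `raw_tags` set from the hunk above, while `BLOCK_TAGS` and the `resolve` helper are hypothetical stand-ins introduced only for illustration.

```python
import xml.etree.ElementTree as etree

# Raw tags as listed in the updated `raw_tags` set in this commit.
RAW_TAGS = {'canvas', 'math', 'option', 'pre', 'script', 'style', 'textarea', 'code'}
# Hypothetical, abbreviated stand-in for `block_tags`; the real set is derived
# from other tag sets that are only partly visible in the diff.
BLOCK_TAGS = {'div', 'section', 'blockquote'}


def is_raw(tag):
    """Tag-only check, mirroring the new `is_raw(tag)` signature."""
    return tag.tag in RAW_TAGS


def is_block(tag):
    """Tag-only check, mirroring the new `is_block(tag)` signature."""
    return tag.tag in BLOCK_TAGS


def resolve(tag, mode):
    """Return `(is_block, is_atomic)` by combining the mode with the tag checks."""
    # Unknown modes fall back to `auto`, as in `parse_blocks`.
    if mode not in ('block', 'inline', 'raw'):
        mode = 'auto'
    block = mode == 'block' or (mode == 'auto' and is_block(tag))
    atomic = mode == 'raw' or (mode == 'auto' and is_raw(tag))
    return block, atomic
```

An explicit mode always wins over the tag: a `pre` element in `block` mode is treated as a block, and only `auto` consults the tag sets. The `html` mode never reaches this logic as `html`, because the HTML block's `on_markdown` maps it to `raw` first.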

pymdownx/blocks/block.py

+13-5
@@ -240,15 +240,23 @@ def __init__(self, length, tracker, block_mgr, config):
240240
self.config = config
241241
self.on_init()
242242

243-
def is_raw(self, tag, mode):
243+
def is_raw(self, tag):
244244
"""Is raw element."""
245245

246-
return self._block_mgr.is_raw(tag, mode)
246+
return self._block_mgr.is_raw(tag)
247247

248-
def is_block(self, tag, mode): # pragma: no cover
248+
def is_block(self, tag): # pragma: no cover
249249
"""Is block element."""
250250

251-
return self._block_mgr.is_block(tag, mode)
251+
return self._block_mgr.is_block(tag)
252+
253+
def html_escape(self, text):
254+
"""Basic html escaping."""
255+
256+
text = text.replace('&', '&amp;')
257+
text = text.replace('<', '&lt;')
258+
text = text.replace('>', '&gt;')
259+
return text
252260

253261
def dedent(self, text, length=None):
254262
"""Dedent raw text."""
@@ -349,7 +357,7 @@ def _end(self, block):
349357

350358
mode = self.on_markdown()
351359
add = self.on_add(block)
352-
if self.is_raw(add, mode):
360+
if mode == 'raw' or (mode == 'auto' and self.is_raw(add)):
353361
add.text = mutil.AtomicString(self.dedent(add.text))
354362

355363
self.on_end(block)
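
The order of the replacements in `html_escape` matters: `&` has to be escaped first, otherwise the ampersands introduced by `&lt;` and `&gt;` would be escaped a second time. The sketch below reproduces the three replacements as a free function (the method in the diff takes `self`) and adds a deliberately wrong variant, `html_escape_wrong_order`, which is a hypothetical name used only to show the failure.

```python
import html


def html_escape(text):
    """Escape `&`, `<`, and `>`, in that order, as the new helper does."""
    text = text.replace('&', '&amp;')
    text = text.replace('<', '&lt;')
    text = text.replace('>', '&gt;')
    return text


def html_escape_wrong_order(text):
    """Hypothetical counterexample: escaping `&` last double-escapes entities."""
    text = text.replace('<', '&lt;')
    text = text.replace('>', '&gt;')
    text = text.replace('&', '&amp;')
    return text


sample = '<b> & </b>'
escaped = html_escape(sample)
broken = html_escape_wrong_order(sample)
# The standard library produces the same result when quote escaping is disabled.
same_as_stdlib = escaped == html.escape(sample, quote=False)
```

Quotes are left untouched, which fits the stated purpose of escaping tag content rather than attribute values.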

pymdownx/blocks/html.py

+19-2
@@ -125,7 +125,7 @@ class HTML(Block):
125125
NAME = 'html'
126126
ARGUMENT = True
127127
OPTIONS = {
128-
'markdown': ['auto', type_string_in(['auto', 'inline', 'block', 'raw'])]
128+
'markdown': ['auto', type_string_in(['auto', 'inline', 'block', 'raw', 'html'])]
129129
}
130130

131131
def __init__(self, length, tracker, md, config):
@@ -147,14 +147,31 @@ def on_validate(self, parent):
147147
def on_markdown(self):
148148
"""Check if this is atomic."""
149149

150-
return self.options['markdown']
150+
mode = self.options['markdown']
151+
if mode == 'html':
152+
mode = 'raw'
153+
return mode
151154

152155
def on_create(self, parent):
153156
"""Create the element."""
154157

155158
# Create element
156159
return etree.SubElement(parent, self.tag.lower(), self.attr)
157160

161+
def is_html(self, tag):
162+
"""Does tag require no processing and no HTML escaping."""
163+
164+
return tag.tag in ('script', 'style')
165+
166+
def on_end(self, block):
167+
"""On end event."""
168+
169+
mode = self.options['markdown']
170+
if (mode == 'auto' and self.is_html(block)) or mode == 'html':
171+
block.text = self.md.htmlStash.store(block.text)
172+
elif (mode == 'auto' and self.is_raw(block)) or mode == 'raw':
173+
block.text = self.md.htmlStash.store(self.html_escape(block.text))
174+
158175

159176
class HTMLExtension(BlocksExtension):
160177
"""HTML Blocks Extension."""
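
The new `on_end` branches can be read as a small decision table: `html` mode (or an auto-detected `script`/`style` tag) stashes the text untouched, `raw` mode (or an auto-detected raw tag) stashes it HTML escaped, and everything else is left alone. Below is a rough, standalone sketch of that dispatch. `FakeStash` is a hypothetical stand-in for Python Markdown's HTML stash, and its placeholder format is invented for the example; the free functions take the mode and stash as arguments rather than reading them from `self`.

```python
import xml.etree.ElementTree as etree

RAW_TAGS = {'canvas', 'math', 'option', 'pre', 'script', 'style', 'textarea', 'code'}
HTML_TAGS = ('script', 'style')


class FakeStash:
    """Hypothetical stand-in for the HTML stash: stores text, returns a placeholder."""

    def __init__(self):
        self.items = []

    def store(self, text):
        self.items.append(text)
        # Placeholder format is made up for this sketch.
        return 'STASH:{}'.format(len(self.items) - 1)


def html_escape(text):
    """Escape `&`, `<`, and `>`."""
    return text.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')


def on_end(block, mode, stash):
    """Stash raw content, escaping it unless it should pass through as HTML."""
    if (mode == 'auto' and block.tag in HTML_TAGS) or mode == 'html':
        # `script`/`style` or forced `html`: keep content exactly as written.
        block.text = stash.store(block.text)
    elif (mode == 'auto' and block.tag in RAW_TAGS) or mode == 'raw':
        # Other raw tags or forced `raw`: escape so the content shows literally.
        block.text = stash.store(html_escape(block.text))
```

The HTML check has to come first: `script` and `style` are also members of the raw tag set, so reversing the two branches would escape their content.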

tests/test_extensions/test_blocks/test_html.py

+47-4
@@ -231,8 +231,8 @@ def test_multi_class2(self):
231231
True
232232
)
233233

234-
def test_inline_and_non_empty_parent(self):
235-
"""Test inline format that utilizes."""
234+
def test_inline_and_md_in_html(self):
235+
"""Test inline format and HTML content."""
236236

237237
self.check_markdown(
238238
R'''
@@ -258,8 +258,8 @@ def test_inline_and_non_empty_parent(self):
258258
True
259259
)
260260

261-
def test_raw_and_non_empty_parent(self):
262-
"""Test inline format that utilizes."""
261+
def test_raw_and_md_in_html(self):
262+
"""Test raw format and HTML content."""
263263

264264
self.check_markdown(
265265
R'''
@@ -282,3 +282,46 @@ def test_raw_and_non_empty_parent(self):
282282
''',
283283
True
284284
)
285+
286+
def test_html_and_html(self):
287+
"""Test HTML mode format with HTML code."""
288+
289+
self.check_markdown(
290+
R'''
291+
/// html | div
292+
markdown: html
293+
294+
<div>
295+
**content**
296+
</div>
297+
298+
this is <span>raw</span> **content**
299+
///
300+
''',
301+
'''
302+
<div><div>
303+
**content**
304+
</div>
305+
306+
this is <span>raw</span> **content**</div>
307+
''',
308+
True
309+
)
310+
311+
def test_html_and_script(self):
312+
"""Test inline format that script."""
313+
314+
self.check_markdown(
315+
R'''
316+
/// html | script
317+
318+
const el = document.querySelector('div');
319+
el.innerHTML = '<span>test</span>
320+
///
321+
''',
322+
'''
323+
<script>const el = document.querySelector('div');
324+
el.innerHTML = '<span>test</span></script>
325+
''',
326+
True
327+
)
