8
8
[ ![ Backers] [ backers-badge ]] [ collective ]
9
9
[ ![ Chat] [ chat-badge ]] [ chat ]
10
10
11
- [ ** hast** ] [ hast ] utility to transform to [ ** nlcst** ] [ nlcst ] .
11
+ [ hast] [ ] utility to transform to [ nlcst] [ ] .
12
12
13
- > ** Note ** : You probably want to use [ ` rehype-retext ` ] [ rehype-retext ] .
13
+ ## Contents
14
14
15
- ## Install
15
+ * [ What is this?] ( #what-is-this )
16
+ * [ When should I use this?] ( #when-should-i-use-this )
17
+ * [ Install] ( #install )
18
+ * [ Use] ( #use )
19
+ * [ API] ( #api )
20
+ * [ ` toNlcst(tree, file, Parser) ` ] ( #tonlcsttree-file-parser )
21
+ * [ Types] ( #types )
22
+ * [ Compatibility] ( #compatibility )
23
+ * [ Security] ( #security )
24
+ * [ Related] ( #related )
25
+ * [ Contribute] ( #contribute )
26
+ * [ License] ( #license )
27
+
28
+ ## What is this?
29
+
30
+ This package is a utility that takes a [ hast] [ ] (HTML) syntax tree as input and
31
+ turns it into [ nlcst] [ ] (natural language).
32
+
33
+ ## When should I use this?
34
+
35
+ This project is useful when you want to deal with ASTs and inspect the natural
36
+ language inside HTML.
37
+ Unfortunately, there is no way yet to apply changes to the nlcst back into
38
+ hast.
16
39
17
- This package is [ ESM only ] ( https://gist.github.com/sindresorhus/a39789f98801d908bbc7ff3ecc99d99c ) :
18
- Node 12+ is needed to use it and it must be ` import ` ed instead of ` require ` d .
40
+ The mdast utility [ ` mdast-util-to-nlcst ` ] [ mdast-util-to-nlcst ] does the same but
41
+ uses a markdown tree as input .
19
42
20
- [ npm] [ ] :
43
+ The rehype plugin [ ` rehype-retext ` ] [ rehype-retext ] wraps this utility to do the
44
+ same at a higher-level (easier) abstraction.
45
+
46
+ ## Install
47
+
48
+ This package is [ ESM only] [ esm ] .
49
+ In Node.js (version 12.20+, 14.14+, or 16.0+), install with [ npm] [ ] :
21
50
22
51
```sh
npm install hast-util-to-nlcst
```
25
54
55
+ In Deno with [ ` esm.sh ` ] [ esmsh ] :
56
+
57
+ ```js
+ import {toNlcst} from "https://esm.sh/hast-util-to-nlcst@2"
+ ```
60
+
61
+ In browsers with [ ` esm.sh ` ] [ esmsh ] :
62
+
63
+ ```html
+ <script type="module">
+   import {toNlcst} from "https://esm.sh/hast-util-to-nlcst@2?bundle"
+ </script>
+ ```
68
+
26
69
## Use
27
70
28
- Say we have the following ` example.html ` :
71
+ Say our document ` example.html ` contains :
29
72
30
73
```html
<article>
@@ -35,64 +78,58 @@ Say we have the following `example.html`:
</article>
```
37
80
38
- …and next to it, `index.js`:
81
+ …and our module `example.js` looks as follows:
39
82
40
83
```js
- import {readSync} from 'to-vfile'
+ import {read} from 'to-vfile'
import {inspect} from 'unist-util-inspect'
import {toNlcst} from 'hast-util-to-nlcst'
import {ParseEnglish} from 'parse-english'
- import rehype from 'rehype'
+ import {rehype} from 'rehype'

- const file = readSync('example.html')
+ const file = await read('example.html')
const tree = rehype().parse(file)

console.log(inspect(toNlcst(tree, file, ParseEnglish)))
```
52
95
53
- Which, when running, yields:
96
+ …now running ` node example.js ` yields (positional info removed for brevity) :
54
97
55
98
```txt
RootNode[2] (1:1-6:1, 0-134)
- ├─ ParagraphNode[3] (1:10-3:3, 9-24)
- │  ├─ WhiteSpaceNode: "\n  " (1:10-2:3, 9-12)
- │  ├─ SentenceNode[2] (2:3-2:12, 12-21)
- │  │  ├─ WordNode[1] (2:3-2:11, 12-20)
- │  │  │  └─ TextNode: "Implicit" (2:3-2:11, 12-20)
- │  │  └─ PunctuationNode: "." (2:11-2:12, 20-21)
- │  └─ WhiteSpaceNode: "\n  " (2:12-3:3, 21-24)
- └─ ParagraphNode[1] (3:7-3:43, 28-64)
-    └─ SentenceNode[4] (3:7-3:43, 28-64)
-       ├─ WordNode[1] (3:7-3:15, 28-36)
-       │  └─ TextNode: "Explicit" (3:7-3:15, 28-36)
-       ├─ PunctuationNode: ":" (3:15-3:16, 36-37)
-       ├─ WhiteSpaceNode: " " (3:16-3:17, 37-38)
-       └─ WordNode[4] (3:25-3:43, 46-64)
-          ├─ TextNode: "foo" (3:25-3:28, 46-49)
-          ├─ TextNode: "s" (3:37-3:38, 58-59)
-          ├─ PunctuationNode: "-" (3:38-3:39, 59-60)
-          └─ TextNode: "ball" (3:39-3:43, 60-64)
+ ├─0 ParagraphNode[3] (1:10-3:3, 9-24)
+ │   ├─0 WhiteSpaceNode "\n  " (1:10-2:3, 9-12)
+ │   ├─1 SentenceNode[2] (2:3-2:12, 12-21)
+ │   │   ├─0 WordNode[1] (2:3-2:11, 12-20)
+ │   │   │   └─0 TextNode "Implicit" (2:3-2:11, 12-20)
+ │   │   └─1 PunctuationNode "." (2:11-2:12, 20-21)
+ │   └─2 WhiteSpaceNode "\n  " (2:12-3:3, 21-24)
+ └─1 ParagraphNode[1] (3:7-3:43, 28-64)
+     └─0 SentenceNode[4] (3:7-3:43, 28-64)
+         ├─0 WordNode[1] (3:7-3:15, 28-36)
+         │   └─0 TextNode "Explicit" (3:7-3:15, 28-36)
+         ├─1 PunctuationNode ":" (3:15-3:16, 36-37)
+         ├─2 WhiteSpaceNode " " (3:16-3:17, 37-38)
+         └─3 WordNode[4] (3:25-3:43, 46-64)
+             ├─0 TextNode "foo" (3:25-3:28, 46-49)
+             ├─1 TextNode "s" (3:37-3:38, 58-59)
+             ├─2 PunctuationNode "-" (3:38-3:39, 59-60)
+             └─3 TextNode "ball" (3:39-3:43, 60-64)
```
76
119
77
120
## API
78
121
79
- This package exports the following identifiers: ` toNlcst ` .
122
+ This package exports the identifier ` toNlcst ` .
80
123
There is no default export.
81
124
82
125
### ` toNlcst(tree, file, Parser) `
83
126
84
- Transform the given [ ** hast** ] [ hast ] [ * tree* ] [ tree ] to [ ** nlcst** ] [ nlcst ] .
85
-
86
- ##### Parameters
127
+ [ hast] [ ] utility to transform to [ nlcst] [ ] .
87
128
88
- * ` tree ` ([ ` HastNode ` ] [ hast-node ] )
89
- — [ * Tree* ] [ tree ] with [ positional info] [ positional-information ]
90
- ([ ` HastNode ` ] [ hast-node ] )
91
- * ` file ` ([ ` VFile ` ] [ vfile ] )
92
- — Virtual file
93
- * ` parser ` (` Function ` )
94
- — [ ** nlcst** ] [ nlcst ] parser, such as [ ` parse-english ` ] [ english ] ,
95
- [ ` parse-dutch ` ] [ dutch ] , or [ ` parse-latin ` ] [ latin ]
129
+ > 👉 ** Note** : ` tree ` must have positional info, ` file ` must be a [ vfile] [ ]
130
+ > corresponding to ` tree ` , and ` Parser ` must be a parser such as
131
+ > [ ` parse-english ` ] [ parse-english ] , [ ` parse-dutch ` ] [ parse-dutch ] , or
132
+ > [ ` parse-latin ` ] [ parse-latin ] .
96
133
97
134
##### Returns
98
135
@@ -117,7 +154,7 @@ more info).
117
154
###### Ignored nodes
118
155
119
156
Some elements are ignored and their content will not be present in
120
- [ ** nlcst** ] [ nlcst ] : ` <script> ` , ` <style> ` , ` <svg> ` , ` <math> ` , ` <del> ` .
157
+ ** [ nlcst] [ ] ** : ` <script> ` , ` <style> ` , ` <svg> ` , ` <math> ` , ` <del> ` .
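
As a rough illustration of the ignore rule in this section, the sketch below walks a hand-written hast-like tree and collects only the text that would remain. It is not the package's implementation: `ignoredTags`, `isIgnored`, and `visibleText` are hypothetical names made up for this example, and it assumes the `data-nlcst` attribute shows up in hast as the `dataNlcst` property.

```js
// Hypothetical helpers for illustration only; not part of hast-util-to-nlcst.
// Elements the README lists as ignored by default.
const ignoredTags = new Set(['script', 'style', 'svg', 'math', 'del'])

// Assumption: `data-nlcst="ignore"` is exposed in hast as `properties.dataNlcst`.
function isIgnored(node) {
  if (node.type !== 'element') return false
  const properties = node.properties || {}
  return ignoredTags.has(node.tagName) || properties.dataNlcst === 'ignore'
}

// Concatenate the text of a node, skipping ignored elements and their content.
function visibleText(node) {
  if (node.type === 'text') return node.value
  if (isIgnored(node)) return ''
  return (node.children || []).map((child) => visibleText(child)).join('')
}

// A small hand-written tree standing in for parsed HTML.
const tree = {
  type: 'element',
  tagName: 'p',
  properties: {},
  children: [
    {type: 'text', value: 'Kept, '},
    {
      type: 'element',
      tagName: 'del',
      properties: {},
      children: [{type: 'text', value: 'deleted, '}]
    },
    {
      type: 'element',
      tagName: 'span',
      properties: {dataNlcst: 'ignore'},
      children: [{type: 'text', value: 'hidden, '}]
    },
    {
      type: 'element',
      tagName: 'em',
      properties: {},
      children: [{type: 'text', value: 'kept.'}]
    }
  ]
}

console.log(visibleText(tree))
```

The real utility produces nlcst nodes rather than a plain string; the sketch only shows which parts of the input are dropped.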
121
158
122
159
To ignore other elements, add a ` data-nlcst ` attribute with a value of ` ignore ` :
123
160
@@ -128,7 +165,8 @@ To ignore other elements, add a `data-nlcst` attribute with a value of `ignore`:
128
165
129
166
###### Source nodes
130
167
131
- ` <code> ` elements are mapped to [ ` Source ` ] [ source ] nodes in [ ** nlcst** ] [ nlcst ] .
168
+ ` <code> ` elements are mapped to [ ` Source ` ] [ nlcst-source ] nodes in
169
+ ** [ nlcst] [ ] ** .
132
170
133
171
To mark other elements as source, add a ` data-nlcst ` attribute with a value
134
172
of ` source ` :
@@ -138,6 +176,18 @@ of `source`:
138
176
<p data-nlcst="source">Completely marked.</p>
139
177
```
140
178
179
+ ## Types
180
+
181
+ This package is fully typed with [ TypeScript] [ ] .
182
+ It exports the additional types ` ParserConstructor ` and ` ParserInstance ` .
183
+
184
+ ## Compatibility
185
+
186
+ Projects maintained by the unified collective are compatible with all maintained
187
+ versions of Node.js.
188
+ As of now, that is Node.js 12.20+, 14.14+, and 16.0+.
189
+ Our projects sometimes work with older versions, but this is not guaranteed.
190
+
141
191
## Security
142
192
143
193
` hast-util-to-nlcst ` does not change the original syntax tree so there are no
@@ -147,19 +197,15 @@ openings for [cross-site scripting (XSS)][xss] attacks.
147
197
148
198
* [ ` mdast-util-to-nlcst ` ] ( https://github.com/syntax-tree/mdast-util-to-nlcst )
149
199
— transform mdast to nlcst
150
- * [ ` mdast-util-to-hast ` ] ( https://github.com/syntax-tree/mdast-util-to-hast )
151
- — transform mdast to hast
152
200
* [ ` hast-util-to-mdast ` ] ( https://github.com/syntax-tree/hast-util-to-mdast )
153
201
— transform hast to mdast
154
202
* [ ` hast-util-to-xast ` ] ( https://github.com/syntax-tree/hast-util-to-xast )
155
203
— transform hast to xast
156
- * [ ` hast-util-sanitize ` ] ( https://github.com/syntax-tree/hast-util-sanitize )
157
- — sanitize hast nodes
158
204
159
205
## Contribute
160
206
161
- See [ ` contributing.md ` in ` syntax-tree/.github ` ] [ contributing ] for ways to get
162
- started.
207
+ See [ ` contributing.md ` ] [ contributing ] in [ ` syntax-tree/.github ` ] [ health ] for
208
+ ways to get started.
163
209
See [ ` support.md ` ] [ support ] for ways to get help.
164
210
165
211
This project has a [ code of conduct] [ coc ] .
@@ -200,38 +246,42 @@ abide by its terms.
200
246
201
247
[ npm ] : https://docs.npmjs.com/cli/install
202
248
203
- [ license ] : license
249
+ [ esm ] : https://gist.github.com/sindresorhus/a39789f98801d908bbc7ff3ecc99d99c
204
250
205
- [ author ] : https://wooorm.com
251
+ [ esmsh ] : https://esm.sh
206
252
207
- [ contributing ] : https://github.com/syntax-tree/.github/blob/HEAD/contributing.md
253
+ [ typescript ] : https://www.typescriptlang.org
254
+
255
+ [ license ] : license
208
256
209
- [ support ] : https://github.com/syntax-tree/.github/blob/HEAD/support.md
257
+ [ author ] : https://wooorm.com
210
258
211
- [ coc ] : https://github.com/syntax-tree/.github/blob/HEAD/code-of-conduct.md
259
+ [ health ] : https://github.com/syntax-tree/.github
212
260
213
- [ english ] : https://github.com/wooorm/parse-english
261
+ [ contributing ] : https://github.com/syntax-tree/.github/blob/main/contributing.md
214
262
215
- [ latin ] : https://github.com/wooorm/parse-latin
263
+ [ support ] : https://github.com/syntax-tree/.github/blob/main/support.md
216
264
217
- [ dutch ] : https://github.com/wooorm/parse-dutch
265
+ [ coc ] : https://github.com/syntax-tree/.github/blob/main/code-of-conduct.md
218
266
219
267
[ rehype-retext ] : https://github.com/rehypejs/rehype-retext
220
268
221
- [ tree ] : https://github.com/syntax-tree/unist#tree
222
-
223
- [ positional-information ] : https://github.com/syntax-tree/unist#positional-information
269
+ [ vfile ] : https://github.com/vfile/vfile
224
270
225
271
[ hast ] : https://github.com/syntax-tree/hast
226
272
227
- [ hast-node ] : https://github.com/syntax-tree/hast#nodes
228
-
229
273
[ nlcst ] : https://github.com/syntax-tree/nlcst
230
274
231
275
[ nlcst-node ] : https://github.com/syntax-tree/nlcst#nodes
232
276
233
- [ vfile ] : https://github.com/vfile/vfile
277
+ [ nlcst-source ] : https://github.com/syntax-tree/nlcst#source
234
278
235
- [ source ] : https://github.com/syntax-tree/nlcst#source
279
+ [ mdast-util-to-nlcst ] : https://github.com/syntax-tree/mdast-util-to-nlcst
236
280
237
281
[ xss ] : https://en.wikipedia.org/wiki/Cross-site_scripting
282
+
283
+ [ parse-english ] : https://github.com/wooorm/parse-english
284
+
285
+ [ parse-latin ] : https://github.com/wooorm/parse-latin
286
+
287
+ [ parse-dutch ] : https://github.com/wooorm/parse-dutch