This repository was archived by the owner on Sep 21, 2021. It is now read-only.

Commit 0c0c749

Edited 220_Token_normalization/30_Unicode_world.asciidoc with Atlas code editor
1 parent b4af498 commit 0c0c749

1 file changed: +6 -7 lines changed

220_Token_normalization/30_Unicode_world.asciidoc

Lines changed: 6 additions & 7 deletions
@@ -71,21 +71,20 @@ PUT /my_index
 <1> Normalize all tokens into the `nfkc` normalization form.

 [TIP]
-.When to normalize
 ==================================================

-Besides the `icu_normalizer` token filter mentioned above, there is also an
-`icu_normalizer` *character* filter, which((("icu_normalizer character filter"))) does the same job as the token
-filter, but it does it before the text reaches the tokenizer. When using the
+Besides the `icu_normalizer` token filter mentioned previously, there is also an
+`icu_normalizer` _character_ filter, which((("icu_normalizer character filter"))) does the same job as the token
+filter, but does so before the text reaches the tokenizer. When using the
 `standard` tokenizer or `icu_tokenizer`, this doesn't really matter. These
 tokenizers know how to deal with all forms of Unicode correctly.

 However, if you plan on using a different tokenizer, such as the `ngram`,
-`edge_ngram` or `pattern` tokenizers, then it woud make sense to use the
+`edge_ngram`, or `pattern` tokenizers, it would make sense to use the
 `icu_normalizer` character filter in preference to the token filter.

 ==================================================

-Usually, though, not only will you want to normalize the byte order of tokens,
-but also to lowercase them. This can be done with the `icu_normalizer` using
+Usually, though, you will want to not only normalize the byte order of tokens,
+but also lowercase them. This can be done with `icu_normalizer`, using
 the custom normalization form `nfkc_cf`, which we discuss in the next section.
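
For readers of this change, the TIP's advice about `ngram`-style tokenizers can be sketched as an index settings body, sent with a request such as `PUT /my_index`. This is only an illustration and not part of the commit: it assumes the ICU analysis plugin is installed, and the names `nfkc_normalizer`, `my_ngram_tokenizer`, and `my_ngram_analyzer`, as well as the gram sizes, are hypothetical placeholders. The point it tries to show is that the `icu_normalizer` character filter runs on the raw text, so the tokenizer only ever sees normalized characters.

```json
{
  "settings": {
    "analysis": {
      "char_filter": {
        "nfkc_normalizer": {
          "type": "icu_normalizer",
          "name": "nfkc"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 3
        }
      },
      "analyzer": {
        "my_ngram_analyzer": {
          "type": "custom",
          "char_filter": ["nfkc_normalizer"],
          "tokenizer": "my_ngram_tokenizer"
        }
      }
    }
  }
}
```

With the token filter instead, normalization would happen only after the n-grams had been cut from the unnormalized text, which is the ordering the TIP suggests avoiding for these tokenizers.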
