@@ -71,21 +71,20 @@ PUT /my_index
<1> Normalize all tokens into the `nfkc` normalization form.

[TIP]
- .When to normalize
==================================================

- Besides the `icu_normalizer` token filter mentioned above , there is also an
- `icu_normalizer` *character* filter, which((("icu_normalizer character filter"))) does the same job as the token
- filter, but it does it before the text reaches the tokenizer. When using the
+ Besides the `icu_normalizer` token filter mentioned previously, there is also an
+ `icu_normalizer` _character_ filter, which((("icu_normalizer character filter"))) does the same job as the token
+ filter, but does so before the text reaches the tokenizer. When using the
`standard` tokenizer or `icu_tokenizer`, this doesn't really matter. These
tokenizers know how to deal with all forms of Unicode correctly.

However, if you plan on using a different tokenizer, such as the `ngram`,
- `edge_ngram` or `pattern` tokenizers, then it woud make sense to use the
+ `edge_ngram`, or `pattern` tokenizers, it would make sense to use the
`icu_normalizer` character filter in preference to the token filter.

==================================================

- Usually, though, not only will you want to normalize the byte order of tokens,
- but also to lowercase them. This can be done with the `icu_normalizer` using
+ Usually, though, you will want to not only normalize the byte order of tokens,
+ but also lowercase them. This can be done with `icu_normalizer`, using
the custom normalization form `nfkc_cf`, which we discuss in the next section.
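As a rough sketch of the advice in the tip, and assuming the `analysis-icu` plugin is installed, the `icu_normalizer` character filter might be placed in front of an `ngram` tokenizer along these lines. The `nfkc_normalizer`, `trigrams`, and `ngram_normalized` names are illustrative, not taken from the text above:

[source,js]
--------------------------------------------------
PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "nfkc_normalizer": { <1>
          "type": "icu_normalizer",
          "name": "nfkc"
        }
      },
      "tokenizer": {
        "trigrams": { <2>
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      },
      "analyzer": {
        "ngram_normalized": {
          "type":        "custom",
          "char_filter": [ "nfkc_normalizer" ], <3>
          "tokenizer":   "trigrams"
        }
      }
    }
  }
}
--------------------------------------------------
<1> Normalize the raw text into the `nfkc` form before it reaches the tokenizer.
<2> A plain `ngram` tokenizer, which knows nothing about Unicode normalization.
<3> Character filters run before tokenization, so the n-grams are built from already-normalized text, which a token filter applied afterward could not guarantee.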