We've mentioned that, by default, results are returned in descending order of
relevance.((("relevance", "defined"))) But what is relevance? How is it calculated?

The relevance score of each document is represented by a positive floating-point number called the `_score`.((("score", "calculation of"))) The higher the `_score`, the more relevant
the document.

A query clause generates a `_score` for each document. How that score is
calculated depends on the type of query clause.((("fuzzy queries", "calculation of relevance score"))) Different query clauses are
used for different purposes: a `fuzzy` query might determine the `_score` by
calculating how similar the spelling of the found word is to the original
search term; a `terms` query would incorporate the percentage of terms that
were found. However, what we usually mean by _relevance_ is the algorithm that we
use to calculate how similar the contents of a full-text field are to a full-text query string.

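To make the contrast between clause types concrete, here is a toy sketch of the two ideas mentioned above. It is not how Elasticsearch or Lucene computes a `_score`: the function names `spelling_similarity` and `fraction_of_terms_found` are hypothetical, and Python's `difflib` merely stands in for a real edit-distance measure.

```python
from difflib import SequenceMatcher


def spelling_similarity(found_word, search_term):
    """Toy stand-in for a fuzzy score: 1.0 for identical spellings,
    lower values as the two strings diverge."""
    return SequenceMatcher(None, found_word, search_term).ratio()


def fraction_of_terms_found(query_terms, document_terms):
    """Toy stand-in for a terms-style score: the share of distinct
    query terms that also occur in the document."""
    query = set(query_terms)
    if not query:
        return 0.0
    return len(query & set(document_terms)) / len(query)
```

The point is only that each clause type can define "how well did this match" in its own way, which is why scores from different clause types are not directly comparable.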
The standard _similarity algorithm_ used in Elasticsearch is((("Term Frequency/Inverse Document Frequency (TF/IDF) similarity algorithm")))((("similarity algorithms", "Term Frequency/Inverse Document Frequency (TF/IDF)"))) known as _term
frequency/inverse document frequency_, or _TF/IDF_, which takes the following
Then it provides the `_explanation`. Each((("explanation of relevance score calculation")))((("description", "of relevance score calculations"))) entry contains a `description`
that tells you what type of calculation is being performed, a `value`
that gives you the result of the calculation, and the `details` of any
<1> Summary of the score calculation for `honeymoon`
<2> Term frequency
<3> Inverse document frequency
<4> Field-length norm
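
Because each explanation entry nests further entries, the structure is easiest to read as an indented tree. The following is a minimal sketch, assuming each entry is a dictionary with the `description`, `value`, and `details` keys described above. The helper name `format_explanation` and the sample values are invented for illustration and are not real Elasticsearch output.

```python
def format_explanation(node, depth=0):
    """Return one indented line per explanation entry, parents first."""
    indent = "  " * depth
    lines = ["{}{} = {}".format(indent, node["description"], node["value"])]
    # Leaf entries may carry an empty `details` list or omit the key.
    for child in node.get("details", []):
        lines.extend(format_explanation(child, depth + 1))
    return lines


# Hypothetical entry: the descriptions and numbers are placeholders.
sample = {
    "description": "weight(tweet:honeymoon)",
    "value": 0.5,
    "details": [
        {"description": "tf", "value": 1.0, "details": []},
        {"description": "idf", "value": 2.0, "details": []},
        {"description": "fieldNorm", "value": 0.25, "details": []},
    ],
}

for line in format_explanation(sample):
    print(line)
```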

WARNING: Producing the `explain` output is expensive.((("explain parameter", "overhead of using"))) It is a debugging tool
only. Don't leave it turned on in production.

The first part is the summary of the calculation. It tells us that it has
calculated the _weight_ (the((("weight", "calculation of")))((("Term Frequency/Inverse Document Frequency (TF/IDF) similarity algorithm", "weight calculation for a term"))) TF/IDF) of the term `honeymoon` in the field `tweet`, for document `0`. (This is
an internal document ID and, for our purposes, can be ignored.)
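
As a rough illustration of how such a weight comes together, the sketch below multiplies the three factors named in the callouts. It assumes the formulas of Lucene's classic TF/IDF similarity (square root of the term frequency, `1 + ln(numDocs / (docFreq + 1))` for the inverse document frequency, and `1 / sqrt(numTerms)` for the field-length norm). The function names and input numbers are hypothetical, and because Lucene stores the norm in a lossy one-byte form, real `explain` values can differ slightly.

```python
import math


def term_frequency(freq):
    """tf: grows with the number of times the term appears in the field."""
    return math.sqrt(freq)


def inverse_document_frequency(doc_freq, num_docs):
    """idf: shrinks as more documents in the index contain the term."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms):
    """norm: shorter fields give each matching term more weight."""
    return 1.0 / math.sqrt(num_terms)


def weight(freq, doc_freq, num_docs, num_terms):
    """Combine the three factors into a single term weight."""
    return (term_frequency(freq)
            * inverse_document_frequency(doc_freq, num_docs)
            * field_length_norm(num_terms))


# Hypothetical inputs: one occurrence, in the only document of the index,
# in a field that is 16 terms long.
print(weight(freq=1, doc_freq=1, num_docs=1, num_terms=16))  # approximately 0.0767
```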
The output from `explain` can be difficult to read in JSON, but it is easier
when it is formatted as YAML.((("explain parameter", "formatting output in YAML")))((("YAML, formatting explain output in"))) Just add `format=yaml` to the query string.
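
For example, the request URL could be assembled as follows. This is a sketch only: the host `http://localhost:9200` and the helper name `explain_search_url` are assumptions, and nothing is actually sent to a cluster.

```python
from urllib.parse import urlencode


def explain_search_url(base_url, fmt="yaml"):
    """Build a search URL that asks for explanations in the given format."""
    params = {"explain": "true", "format": fmt}
    return "{}/_search?{}".format(base_url.rstrip("/"), urlencode(params))


# Assumed local node; adjust to wherever your cluster is listening.
print(explain_search_url("http://localhost:9200"))
```

The query itself still goes in the request body as usual; only the query string changes.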