-[[fielddata-intro]]
-=== Fielddata
+[[字段数据介绍]]
+=== Fielddata

-Our final topic in this chapter is about an internal aspect of Elasticsearch.
-While we don't demonstrate any new techniques here, fielddata is an
-important topic that we will refer to repeatedly, and is something that you
-should be aware of.((("fielddata")))

-When you sort on a field, Elasticsearch needs access to the value of that
-field for every document that matches the query.((("inverted index", "sorting and"))) The inverted index, which
-performs very well when searching, is not the ideal structure for sorting on
-field values:

-* When searching, we need to be able to map a term to a list of documents.
+Our final topic in this chapter is about an internal aspect of Elasticsearch. While we don't demonstrate any new techniques here, fielddata is an important topic that we will refer to repeatedly, and is something that you should be aware of.((("fielddata")))

-* When sorting, we need to map a document to its terms. In other words, we
- need to ``uninvert'' the inverted index.

-To make sorting efficient, Elasticsearch loads all the values for
-the field that you want to sort on into memory. This is referred to as
-_fielddata_.

-WARNING: Elasticsearch doesn't just load the values for the documents that matched a
-particular query. It loads the values from _every document in your index_,
-regardless of the document `type`.
+When you sort on a field, Elasticsearch needs access to the value of that field for every document that matches the query.((("inverted index", "sorting and"))) The inverted index, which performs very well when searching, is not the ideal structure for sorting on field values:

-The reason that Elasticsearch loads all values into memory is that uninverting the index
-from disk is slow. Even though you may need the values for only a few docs
-for the current request, you will probably need access to the values for other
-docs on the next request, so it makes sense to load all the values into memory
-at once, and to keep them there.

-Fielddata is used in several places in Elasticsearch:
+* When searching, we need to be able to map a term to a list of documents.

-* Sorting on a field
-* Aggregations on a field
-* Certain filters (for example, geolocation filters)
-* Scripts that refer to fields

-Clearly, this can consume a lot of memory, especially for high-cardinality
-string fields--string fields that have many unique values--like the body
-of an email. Fortunately, insufficient memory is a problem that can be solved
-by horizontal scaling, by adding more nodes to your cluster.

-For now, all you need to know is what fielddata is, and to be aware that it
-can be memory hungry. Later, we will show you how to determine the amount of memory that fielddata
-is using, how to limit the amount of memory that is available to it, and
-how to preload fielddata to improve the user experience.
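+* When sorting, we need to map a document to its terms. In other words, we need to ``uninvert'' the inverted index.




+To make sorting efficient, Elasticsearch loads all the values for the field that you want to sort on into memory. This is referred to as _fielddata_.
+
+
+
+
+WARNING: Elasticsearch doesn't just load the values for the documents that matched a particular query. It loads the values from _every document in your index_, regardless of the document `type`.
+
+
+
+
+The reason that Elasticsearch loads all values into memory is that uninverting the index from disk is slow. Even though you may need the values for only a few docs for the current request, you will probably need access to the values for other docs on the next request, so it makes sense to load all the values into memory at once, and to keep them there.
+
+
+
+
+Fielddata is used in several places in Elasticsearch:
+
+* Sorting on a field
+* Aggregations on a field
+* Certain filters (for example, geolocation filters)
+* Scripts that refer to fields
+
+
+Clearly, this can consume a lot of memory, especially for high-cardinality string fields (string fields that have many unique values) such as the body of an email. Fortunately, insufficient memory is a problem that can be solved by horizontal scaling, by adding more nodes to your cluster.
+
+For now, all you need to know is what fielddata is, and to be aware that it can be memory hungry. Later, we will show you how to determine the amount of memory that fielddata is using, how to limit the amount of memory that is available to it, and how to preload fielddata to improve the user experience.
+
+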
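The lines above describe fielddata as an "uninverted" inverted index: search wants a term-to-documents map, while sorting wants a document-to-term map, built once over every document and kept in memory. The following is a minimal Python sketch of that idea, not Elasticsearch's actual implementation. It assumes a single-valued field and a tiny in-memory corpus, and the helper names (`build_inverted_index`, `uninvert`, `sort_matches`) are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus: doc_id -> value of a single-valued "name" field.
DOCS = {
    1: "carol",
    2: "alice",
    3: "bob",
    4: "alice",
}


def build_inverted_index(docs):
    """Map each term to the sorted list of doc IDs that contain it (suits search)."""
    index = defaultdict(list)
    for doc_id, value in docs.items():
        index[value].append(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def uninvert(inverted_index):
    """Rebuild doc_id -> term (suits sorting): a toy stand-in for fielddata.

    This walks the whole index, so the result covers every document,
    not only the documents that some particular query matched.
    """
    fielddata = {}
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            fielddata[doc_id] = term
    return fielddata


def sort_matches(matching_ids, fielddata):
    """Sort the doc IDs a query matched by field value, breaking ties by doc ID."""
    return sorted(matching_ids, key=lambda doc_id: (fielddata[doc_id], doc_id))


inverted = build_inverted_index(DOCS)
# Built once and kept in memory, then reused by every later sort request.
fielddata = uninvert(inverted)

print(inverted["alice"])                      # search: term -> docs, prints [2, 4]
print(sort_matches([1, 3, 4], fielddata))     # sort one result set, prints [4, 3, 1]
print(sort_matches([1, 2], fielddata))        # reuse for another query, prints [2, 1]
```

Because `uninvert` walks the entire index, the size of the uninverted map in this sketch grows with the number of documents in the index rather than with the number of documents any single query matches, which mirrors the memory trade-off the text warns about.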