chapter22_part21:/300_Aggregations/120_breadth_vs_depth.asciidoc #294
Conversation
Please take a look, everyone; corrections for anything inaccurate are welcome.
@@ -1,15 +1,10 @@
=== Preventing Combinatorial Explosions
=== 避免组合爆炸(Preventing Combinatorial Explosions)
Would the following be easier to understand?
优化聚合查询 ("Optimizing Aggregation Queries")
A translator's note could be added at the beginning to help readers understand the concept of a bucket in ES:
A bucket in ES is similar to a group in SQL; one bucket roughly corresponds to one SQL group.
A multi-level nested aggregation is similar to a multi-field GROUP BY in SQL (GROUP BY field1, field2, ...).
Note that the similarity is only conceptual; the underlying implementations are different.
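The SQL analogy in the note could even be made concrete with a short sketch. The field names `field1` and `field2` are placeholders matching the SQL example above; SQL's `GROUP BY field1, field2` roughly corresponds to one `terms` aggregation nested inside another:

```json
{
  "aggs": {
    "group_by_field1": {
      "terms": { "field": "field1" },
      "aggs": {
        "group_by_field2": {
          "terms": { "field": "field2" }
        }
      }
    }
  }
}
```

Each bucket produced by the outer `terms` aggregation gets its own set of inner buckets, which is exactly why the combination of unique values can explode.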
another aggregation, which contains another aggregation, and so forth. The combination of
unique values in each of these aggregations can lead to an explosion in the
number of buckets generated.
`terms` 桶基于我们的数据动态构建桶;它并不知道到底生成了多少桶。((("combinatorial explosions, preventing")))((("aggregations", "preventing combinatorial explosions"))) 尽管这对单个聚合还行,
How about changing 尽管这对单个聚合还行 ("although this is fine for a single aggregation") to:
大多数时候对单个字段的聚合查询还是非常快的 ("most of the time, an aggregation query on a single field is still very fast")?
但考虑当一个聚合包含另外一个聚合,这样一层又一层的时候会发生什么。合并每个聚合的唯一值会导致它随着生成桶的数量而发生爆炸。
但是当需要同时聚合多个字段时,就可能会产生大量的分组,最终结果就是占用es大量内存,从而导致OOM的情况发生。("But when several fields need to be aggregated at once, a huge number of groups can be produced; the end result is that ES consumes a large amount of memory, which can lead to an OOM.")
Imagine we have a modest dataset that represents movies. Each document lists
the actors in that movie:
设想我们有一个表示影片大小适度的数据集合。每个文档都列出了影片的演员:
Suggested rewording:
假设我们现在有一些关于电影的数据集,每条数据里面会有一个数组类型的字段存储表演该电影的所有演员的名字。("Suppose we have a dataset about movies, where each document has an array field storing the names of all the actors in that movie.")
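For readers of the review, the chapter's setup can be sketched as follows. A movie document stores its cast in an array field named `actors`, e.g. `{ "actors": ["Fred Jones", "Mary Jane", "Elizabeth Worthing"] }`, and the problematic query nests one `terms` aggregation on that field inside another (the sizes here follow the chapter's "top 10 actors, 5 co-stars each" example):

```json
{
  "aggs": {
    "actors": {
      "terms": { "field": "actors", "size": 10 },
      "aggs": {
        "costars": {
          "terms": { "field": "actors", "size": 5 }
        }
      }
    }
  }
}
```

Only 50 final values are wanted, yet depth-first collection must first build a bucket for every unique actor and, under each, a bucket for every co-star.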
combinatorial explosion of buckets and drastically reduces memory requirements
for classes of queries that are amenable to breadth-first.
因为我们已经知道了前十名演员,我们可以安全的修剪其他节点。修剪后,下一层是基于 _它的_ 执行模式读入的,重复执行这个过程直到聚合完成,如图 <<breadth-first-4>> 所示。
这就可以避免那种适于使用广度优先策略的查询,因为组合而导致桶的爆炸增长和内存急剧降低的问题。
For the last sentence (这就可以避免那种适于使用广度优先策略的查询,因为组合而导致桶的爆炸增长和内存急剧降低的问题), suggest changing it to:
这种场景下,广度优先可以大幅度节省内存。("In this scenario, breadth-first can save a large amount of memory.")
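For reference, switching a `terms` aggregation to breadth-first collection is a one-parameter change; `collect_mode` is the actual Elasticsearch setting, shown here on the chapter's actors example:

```json
{
  "aggs": {
    "actors": {
      "terms": {
        "field": "actors",
        "size": 10,
        "collect_mode": "breadth_first"
      },
      "aggs": {
        "costars": {
          "terms": { "field": "actors", "size": 5 }
        }
      }
    }
  }
}
```

The top level is now computed and pruned to the top 10 actors before the `costars` sub-aggregation runs at all.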
buckets, but each bucket has relatively few documents--breadth-first is much
more memory efficient, and allows you to build aggregations that would
otherwise fail.
广度优先只有在当桶内的文档比可能生成的桶多时才应该被用到。深度搜索在桶层对文档数据缓存,然后在修剪阶段后的子聚合过程中再次使用这些文档缓存。
广度优先只有在当桶内的文档比可能生成的桶多时才应该被用到。深度搜索在桶层对文档数据缓存,然后在修剪阶段后的子聚合过程中再次使用这些文档缓存。
Would the following be better?
广度优先仅仅适用于每个组的聚合数量远远小于当前总组数的情况下,因为广度优先会在内存中缓存裁剪后的仅仅需要缓存的每个组的所有数据,以便于它的子聚合分组查询可以复用上级聚合的数据。("Breadth-first only suits cases where the amount of data per group is far smaller than the total number of groups, because after pruning it caches in memory the data of only the groups that still need caching, so that sub-aggregations can reuse the parent aggregation's data.")
在修剪之前,广度优先聚合对于内存的需求与每个桶内的文档数量成线性关系。对于很多聚合来说,每个桶内的文档数量是相当大的。
For 在修剪之前,广度优先聚合对于内存的需求与每个桶内的文档数量成线性关系 ("before pruning, breadth-first memory use is linear in the number of documents per bucket"), would the following be better?
广度优先的内存使用情况与裁剪后的缓存分组数据量是成线性的("Breadth-first memory usage is linear in the amount of group data cached after pruning.")
想象一个以月为间隔的直方图:每个桶内可能有数以亿计的文档。这使广度优先不是一个好的选择,这也是为什么深度优先作为默认策略的原因。
For 想象一个以月为间隔的直方图:每个桶内可能有数以亿计的文档。, suggest changing to:
想象一种按月分组的直方图,总组数肯定是固定的,因为每年只有12个月,这个时候每个月下的数据量可能非常大("Imagine a histogram grouped by month: the total number of groups is fixed, since a year only has 12 months, but the volume of data under each month can be very large.")
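The month-bucketed histogram discussed here would look roughly like the following sketch. The `timestamp` field name is an assumption; `interval` was the parameter name in the Elasticsearch versions the book covers, and recent versions call it `calendar_interval`:

```json
{
  "aggs": {
    "by_month": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "month"
      }
    }
  }
}
```

Few buckets, each potentially holding millions of documents: the opposite of the actors case, and a poor fit for breadth-first.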
但对于演员的示例,默认聚合生成大量的桶,但每个桶内的文档相对较少,而广度优先的内存效率更高。如果不是这样,我们构建的聚合要不然就会失败。
但对于演员的示例,默认聚合生成大量的桶,但每个桶内的文档相对较少,而广度优先的内存效率更高。如果不是这样,我们构建的聚合要不然就会失败。
Suggest changing to:
针对上面演员的例子,如果数据量越大,那么默认的使用深度优先的聚合模式生成的总分组数就会非常多,但是预估二级的聚合字段分组后的数据量相比总的分组数会小很多,所以这种情况下使用广度优先的模式能大大节省内存,从而通过优化聚合模式来大大提高了在某些特定场景下聚合查询的成功率。("For the actors example above: the larger the dataset, the more total groups the default depth-first mode generates, while the estimated data volume after grouping on the second-level field is much smaller than the total group count. In this case breadth-first saves a great deal of memory, so tuning the collect mode greatly improves the success rate of aggregation queries in such scenarios.")
LGTM
co-stars, totaling 50 values. To get the final results, we have to generate
that tree of 2,000,000 buckets, sort it, and finally prune it such that only the
top 10 actors are left. This is illustrated in <<depth-first-2>> and <<depth-first-3>>.
No现在,记住,聚合只是简单的希望得到前十位演员和与他们联合出演者,总共 50 条数据。为了得到最终的结果,我们创建了一个有 2,000,000 桶的树,然后对其排序,取 top10。
There is a stray "No" at the beginning.
@@ -1,15 +1,13 @@
[[_preventing_combinatorial_explosions]]
=== 优化聚合查询(Preventing Combinatorial Explosions)
The title is a bit long.
Revised per review comments: 1. removed the English text in parentheses from the title; 2. removed the stray "No".
LGTM
Initial translation.