chapter22_part21:/300_Aggregations/120_breadth_vs_depth.asciidoc #294


Merged
merged 7 commits on Nov 22, 2016
93 changes: 26 additions & 67 deletions 300_Aggregations/120_breadth_vs_depth.asciidoc

=== Preventing Combinatorial Explosions

Would the title be easier to understand if it were rendered as "Optimizing aggregation queries"?

It might also help to add a translator's note at the beginning so readers can place the concept of a bucket: a bucket in Elasticsearch is similar to a group in SQL, so one bucket roughly corresponds to one SQL group, and a multi-level nested aggregation is similar to a multi-field GROUP BY (group by field1, field2, ...). Note that the similarity is only conceptual; the underlying implementations are different.
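
To make the suggested note concrete, here is a minimal sketch (not part of the original chapter) of a single-level `terms` aggregation, which plays roughly the same role as `SELECT actors, count(*) FROM movies GROUP BY actors` in SQL; the index name `movies` and the request layout are only illustrative:

[source,js]
----
GET /movies/_search
{
  "size": 0,                            <1>
  "aggs": {
    "actors": {
      "terms": { "field": "actors" }    <2>
    }
  }
}
----
<1> We only want the aggregation results, not the search hits.
<2> One bucket per unique actor, each with a document count, much like one SQL group.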

The `terms` bucket dynamically builds buckets based on your data; it doesn't
know up front how many buckets will be generated. ((("combinatorial explosions, preventing")))((("aggregations", "preventing combinatorial explosions"))) While this is fine with a
single aggregation, think about what can happen when one aggregation contains
another aggregation, which contains another aggregation, and so forth. The combination of
unique values in each of these aggregations can lead to an explosion in the
number of buckets generated.

How about rendering "While this is fine with a single aggregation" more plainly, for example as "Most of the time, an aggregation query on a single field is still very fast"?

A suggested rendering for the second sentence: when several fields need to be aggregated at the same time, a huge number of groups may be produced, and the end result is that Elasticsearch uses a large amount of memory, which can lead to an OOM.


Imagine we have a modest dataset that represents movies. Each document lists
the actors in that movie:

A suggested rendering for this sentence: suppose we have a data set about movies, where each document has an array field that stores the names of all the actors who appeared in that movie.

[source,js]
----
{ ... }
----
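
The example document itself is collapsed in the diff view above. A minimal sketch of what such a document might look like, assuming the field is called `actors` (as the aggregation names later in this section suggest) and using invented names, is:

[source,js]
----
{
  "title":  "Some Movie",
  "actors": [
    "Alice Example",
    "Bob Example",
    "Carol Example"
  ]
}
----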

If we want to determine the top 10 actors and their top costars, that's trivial
with an aggregation:

Two wording suggestions for the translation of this sentence: "query" may be easier to understand than "determine", and "that's trivial with an aggregation" could be rendered simply as "this is very easy to do with an aggregation".

[source,js]
----
{ ... }
----
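
The aggregation request is also collapsed in the diff view. A sketch of what the nested `terms` aggregation described here could look like, with the index name, field name, and request layout assumed rather than taken from the original snippet, is:

[source,js]
----
GET /movies/_search
{
  "size": 0,
  "aggs": {
    "actors": {
      "terms": { "field": "actors", "size": 10 },    <1>
      "aggs": {
        "costars": {
          "terms": { "field": "actors", "size": 5 }  <2>
        }
      }
    }
  }
}
----
<1> The ten actors that appear in the most movies.
<2> For each of those actors, the five costars they appear with most often, taken from the same `actors` field.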

This will return a list of the top 10 actors, and for each actor, a list of their
top five costars. This seems like a very modest aggregation; only 50
values will be returned!

Would it read more smoothly as "This looks like a simple aggregation query that only returns 50 results in the end"?

However, this seemingly ((("aggregations", "fielddata", "datastructure overview")))innocuous query can easily consume a vast amount of
memory. You can visualize a `terms` aggregation as building a tree in memory.
The `actors` aggregation will build the first level of the tree, with a bucket
for every actor. Then, nested under each node in the first level, the
`costars` aggregation will build a second level, with a bucket for every costar, as seen in <<depth-first-1>>. That means that a single movie will generate n^2^ buckets!

One suggestion for the translation of this paragraph: render "innocuous" simply as "simple".

Another comment concerns the markup: change "n^2^" to "n^2".


[[depth-first-1]]
.Build full depth tree
image::images/300_120_depth_first_1.svg["Build full depth tree"]

To use some real numbers, imagine each movie has 10 actors on average. Each movie
will then generate 10^2^ == 100 buckets. If you have 20,000 movies, that's
roughly 2,000,000 generated buckets.

A small wording suggestion for the translation of "To use some real numbers": use "数据" (data) rather than "数字" (numbers).


Now, remember, our aggregation is simply asking for the top 10 actors and their
co-stars, totaling 50 values. To get the final results, we have to generate
that tree of 2,000,000 buckets, sort it, and finally prune it such that only the
top 10 actors are left. This is illustrated in <<depth-first-2>> and <<depth-first-3>>.

Two wording suggestions for the translation: "50 records" rather than "50 values", and "take the top 10" rather than "finally reduce the result to the top 10 actors".

[[depth-first-2]]
.Sort tree
image::images/300_120_depth_first_2.svg["Sort tree"]
.Prune tree
image::images/300_120_depth_first_3.svg["Prune tree"]

At this point you should be quite distraught. Twenty thousand documents is paltry,
and the aggregation is pretty tame. What if you had 200 million documents, wanted
the top 100 actors and their top 20 costars, as well as the costars' costars?

Two suggestions for the translation of this paragraph: render "Twenty thousand documents is paltry, and the aggregation is pretty tame" as something like "Any aggregation query over 20,000 documents runs with no pressure at all", and render "as well as the costars' costars?" as "what would the final result of such a query look like?".


You can appreciate how quickly combinatorial expansion can grow, making this
strategy untenable. There is not enough memory in the world to support uncontrolled
combinatorial explosions.
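
Following the same back-of-envelope reasoning as before, and again assuming about 10 actors per movie, a three-level actors/costars/costars-of-costars aggregation would generate on the order of 10^3^ buckets per movie, so 200 million documents would imply roughly 2 × 10^11^ buckets before any pruning takes place.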

Two suggestions for the translation: "you can guess that the number of groups produced by the aggregation will be extremely large" in place of "you can appreciate how quickly combinatorial expansion can grow", and "uncontrolled aggregation queries" in place of "combinatorial explosions in an uncontrolled state".


==== Depth-First Versus Breadth-First

Elasticsearch allows you to change the _collection mode_ of an aggregation, for
exactly this situation. ((("collection mode"))) ((("aggregations", "preventing combinatorial explosions", "depth-first versus breadth-first")))The strategy we outlined previously--building the tree fully
and then pruning--is called _depth-first_ and it is the default. ((("depth-first collection strategy"))) Depth-first
works well for the majority of aggregations, but can fall apart in situations
like our actors and costars example.

The phrase "it is the default aggregation mode" seems to be missing after "and then prunes away the useless nodes" in the translation.

Member Author: That part is already there; the translated sentence reads "The strategy we showed earlier is called depth-first; it is the default setting: it builds the full tree first and then prunes away the useless nodes."


For these special cases, you should use an alternative collection strategy called
_breadth-first_. ((("breadth-first collection strategy")))This strategy works a little differently. It executes the first
layer of aggregations, and _then_ performs a pruning phase before continuing, as illustrated in <<breadth-first-1>> through <<breadth-first-3>>.

In our example, the `actors` aggregation would be executed first. At this
point, we have a single layer in the tree, but we already know who the top 10
actors are! There is no need to keep the other actors since they won't be in
the top 10 anyway.

[[breadth-first-1]]
.Build first level
image::images/300_120_breadth_first_2.svg["Sort first level"]
.Prune first level
image::images/300_120_breadth_first_3.svg["Prune first level"]

Since we already know the top ten actors, we can safely prune away the rest of the
long tail. After pruning, the next layer is populated based on _its_ execution mode,
and the process repeats until the aggregation is done, as illustrated in <<breadth-first-4>>. This prevents the
combinatorial explosion of buckets and drastically reduces memory requirements
for classes of queries that are amenable to breadth-first.

A suggested rendering for the last sentence of this paragraph: in this scenario, breadth-first can save a great deal of memory.


[[breadth-first-4]]
.Populate full depth for remaining nodes
image::images/300_120_breadth_first_4.svg["Step 4: populate full depth for remaining nodes"]

To use breadth-first, simply ((("collect parameter, enabling breadth-first")))enable it via the `collect` parameter:

[source,js]
----
{ ... }
----
<1> Enable `breadth_first` on a per-aggregation basis.
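
The request body for this snippet is collapsed in the diff view. A sketch of what enabling breadth-first on the earlier actors/costars aggregation could look like (index and field names carried over from the sketches above; `collect_mode` is the request parameter Elasticsearch uses for this setting) is:

[source,js]
----
GET /movies/_search
{
  "size": 0,
  "aggs": {
    "actors": {
      "terms": {
        "field":        "actors",
        "size":         10,
        "collect_mode": "breadth_first"    <1>
      },
      "aggs": {
        "costars": {
          "terms": { "field": "actors", "size": 5 }
        }
      }
    }
  }
}
----
<1> The only change from the earlier sketch: collect this `terms` aggregation breadth-first, so the top 10 actors are chosen before the `costars` buckets are built.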

Breadth-first should be used only when you expect more buckets to be generated
than documents landing in the buckets. Breadth-first works by caching
document data at the bucket level, and then replaying those documents to child
aggregations after the pruning phase.

The memory requirement of a breadth-first aggregation is linear to the number
of documents in each bucket prior to pruning. For many aggregations, the
number of documents in each bucket is very large. Think of a histogram with
monthly intervals: you might have thousands or hundreds of thousands of
documents per bucket. This makes breadth-first a bad choice, and is why
depth-first is the default.

But for the actor example--which generates a large number of
buckets, but each bucket has relatively few documents--breadth-first is much
more memory efficient, and allows you to build aggregations that would
otherwise fail.

Would the following rendering be better: breadth-first is only appropriate when the number of documents that fall into each group is far smaller than the total number of groups, because breadth-first caches, for each group kept after pruning, all of the documents that group needs, so that its child aggregations can reuse the parent aggregation's data.


Would the following rendering be better: breadth-first memory usage is linear in the amount of per-group data that is cached after pruning?

For the histogram example, a suggested rendering: imagine a histogram grouped by month; the total number of groups is fixed, because there are only 12 months in a year, but the amount of data under each month can be very large.

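As an illustration of that point (not from the original chapter), a monthly `date_histogram` with a nested `terms` aggregation is the kind of query where the default depth-first mode is normally the right choice: there are only a handful of buckets, but each one can hold an enormous number of documents, so caching documents per bucket the way breadth-first does would be expensive. The index and field names below are invented, and newer Elasticsearch releases spell the interval parameter `calendar_interval`:

[source,js]
----
GET /logs/_search
{
  "size": 0,
  "aggs": {
    "per_month": {
      "date_histogram": {
        "field":    "timestamp",
        "interval": "month"              <1>
      },
      "aggs": {
        "top_users": {
          "terms": { "field": "user", "size": 10 }
        }
      }
    }
  }
}
----
<1> At most 12 buckets per year, each potentially containing a huge number of documents, which is exactly the shape of data that favors depth-first.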

A suggested rendering for the closing paragraph: for the actor example above, as the data volume grows, the default depth-first mode produces a very large total number of groups, while the number of groups expected from the second-level aggregation is much smaller than that total; in this situation breadth-first saves a great deal of memory, and choosing the right collection mode can greatly improve the chance that such aggregation queries succeed.