
Commit 5f8c44f

chenrynmedcl authored and committed
chapter47_part3:/520_Post_Deployment/30_indexing_perf.asciidoc (#56)
* chapter47_part3:/520_Post_Deployment/30_indexing_perf.asciidoc * revised according to review comments
1 parent c654149 commit 5f8c44f

1 file changed: 53 additions & 129 deletions

[[indexing-performance]]
=== Indexing Performance Tips

If you are in an indexing-heavy environment,((("indexing", "performance tips")))((("post-deployment", "indexing performance tips"))) such as indexing infrastructure
logs, you may be willing to sacrifice some search performance for faster indexing
rates. In these scenarios, searches tend to be relatively rare and performed
by people internal to your organization. They are willing to wait several
seconds for a search, as opposed to a consumer-facing search that must
return in milliseconds.

Because of this unique position, certain trade-offs can be made
that will increase your indexing performance.

.These Tips Apply Only to Elasticsearch 1.3+
****
This book is written for the most recent versions of Elasticsearch, although much
of the content works on older versions.

The tips presented in this section, however, are _explicitly_ for version 1.3+. There
have been multiple performance improvements and bugs fixed that directly impact
indexing. In fact, some of these recommendations will _reduce_ performance on
older versions because of the presence of bugs or performance defects.
****

==== Test Performance Scientifically

Performance testing is always difficult, so try to be as scientific as possible
in your approach.((("performance testing")))((("indexing", "performance tips", "performance testing"))) Randomly fiddling with knobs and turning on ingestion is not
a good way to tune performance. If there are too many _causes_, it is impossible
to determine which one had the best _effect_. A reasonable approach to testing is as follows:

1. Test performance on a single node, with a single shard and no replicas.
2. Record performance under 100% default settings so that you have a baseline to
measure against (one way to capture such a baseline is sketched after this list).
3. Make sure performance tests run for a long time (30+ minutes) so you can
evaluate long-term performance, not short-term spikes or latencies. Some events
(such as segment merging and GCs) won't happen right away, so the performance
profile can change over time.
4. Begin making single changes to the baseline defaults. Test these rigorously,
and if the performance improvement is acceptable, keep the setting and move on to the
next one.
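
For step 2, one minimal way to capture a baseline (a sketch, not the only option) is to snapshot the node-level indexing statistics before and after a timed ingestion run:

[source,js]
----
GET /_nodes/stats/indices
----

The difference in `indexing.index_total` and `indexing.index_time_in_millis` between two snapshots gives you a documents-per-second figure to compare against after each single-setting change in step 4.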

==== Using and Sizing Bulk Requests

This should be fairly obvious, but use bulk indexing requests for optimal performance.((("indexing", "performance tips", "bulk requests, using and sizing")))((("bulk API", "using and sizing bulk requests")))
Bulk sizing is dependent on your data, analysis, and cluster configuration, but
a good starting point is 5–15 MB per bulk. Note that this is physical size.
Document count is not a good metric for bulk size. For example, if you are
indexing 1,000 documents per bulk, keep the following in mind:

- 1,000 documents at 1 KB each is 1 MB.
- 1,000 documents at 100 KB each is 100 MB.

Those are drastically different bulk sizes. Bulks need to be loaded into memory
at the coordinating node, so it is the physical size of the bulk that is more
important than the document count.
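
To make the physical-size point concrete, here is a minimal sketch of a bulk request body (the `logs` index, `event` type, and document fields are made up for illustration). The size that matters is the total bytes of all the action and source lines, not how many document pairs there are:

[source,js]
----
POST /_bulk
{ "index" : { "_index" : "logs", "_type" : "event" } }
{ "message" : "a small 1 KB log line", "level" : "INFO" }
{ "index" : { "_index" : "logs", "_type" : "event" } }
{ "message" : "a 100 KB stack trace or payload...", "level" : "ERROR" }
----

Two documents either way, but the second pair contributes roughly a hundred times more to the bulk size that the coordinating node must hold in memory.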

Start with a bulk size around 5–15 MB and slowly increase it until you do not
see performance gains anymore. Then start increasing the concurrency of your
bulk ingestion (multiple threads, and so forth).

Monitor your nodes with Marvel and/or tools such as `iostat`, `top`, and `ps` to see
when resources start to bottleneck. If you start to receive `EsRejectedExecutionException`,
your cluster can no longer keep up: at least one resource has reached capacity. Either reduce concurrency, provide more of the limited resource (such as switching from spinning disks to SSDs), or add more nodes.

[NOTE]
====
When ingesting data, make sure bulk requests are round-robined across all your
data nodes. Do not send all requests to a single node, since that single node
will need to store all the bulks in memory while processing.
====

==== Storage

Disks are usually the bottleneck of any modern server. Elasticsearch heavily uses disks, and the more throughput your disks can handle, the more stable your nodes will be. Here are some tips for optimizing disk I/O:

- Use SSDs. As mentioned elsewhere, ((("storage")))((("indexing", "performance tips", "storage")))they are superior to spinning media.
- Use RAID 0. Striped RAID will increase disk I/O, at the obvious expense of
potential failure if a drive dies. Don't use mirrored or parity RAIDs, since
replicas already provide that functionality.
- Alternatively, use multiple drives and allow Elasticsearch to stripe data across
them via multiple `path.data` directories (see the sketch after this list).
- Do not use remote-mounted storage, such as NFS or SMB/CIFS. The latency introduced
here is antithetical to performance.
- If you are on EC2, beware of EBS. Even the SSD-backed EBS options are often slower
than local instance storage.
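
For the multiple-drives option, a minimal `elasticsearch.yml` sketch might look like the following (the mount points are hypothetical); Elasticsearch will then stripe data across the listed directories:

[source,yaml]
----
path.data: /mnt/ssd1/elasticsearch,/mnt/ssd2/elasticsearch,/mnt/ssd3/elasticsearch
----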

[[segments-and-merging]]
==== Segments and Merging

Segment merging is computationally expensive,((("indexing", "performance tips", "segments and merging")))((("merging segments")))((("segments", "merging"))) and can eat up a lot of disk I/O.
Merges are scheduled to operate in the background because they can take a long
time to finish, especially large segments. This is normally fine, because
large segment merges are relatively rare.

But sometimes merging falls behind the ingestion rate. If this happens, Elasticsearch
will automatically throttle indexing requests to a single thread. This prevents
a _segment explosion_ problem, in which hundreds of segments are generated before
they can be merged. Elasticsearch will log `INFO`-level messages stating `now
throttling indexing` when it detects merging falling behind indexing.

Elasticsearch defaults here are conservative: you don't want search performance
to be impacted by background merging. But sometimes (especially on SSD, or in logging
scenarios), the throttle limit is too low.

The default is 20 MB/s, which is a good setting for spinning disks. If you have
SSDs, you might consider increasing this to 100–200 MB/s. Test to see what works
for your system:

[source,js]
----
PUT /_cluster/settings
{
    "persistent" : {
        "indices.store.throttle.max_bytes_per_sec" : "100mb"
    }
}
----

If you are doing a bulk import and don't care about search at all, you can disable
merge throttling entirely. This will allow indexing to run as fast as your
disks will allow:

[source,js]
----
PUT /_cluster/settings
{
    "transient" : {
        "indices.store.throttle.type" : "none" <1>
    }
}
----
<1> Setting the throttle type to `none` disables merge throttling entirely. When
you are done importing, set it back to `merge` to reenable throttling.

If you are using spinning media instead of SSD, you need to add this to your
`elasticsearch.yml`:

[source,yaml]
----
index.merge.scheduler.max_thread_count: 1
----

Spinning media has a harder time with concurrent I/O, so we need to decrease
the number of threads that can concurrently access the disk per index. This setting
will allow `max_thread_count + 2` threads to operate on the disk at one time,
so a setting of `1` will allow three threads.

For SSDs, you can ignore this setting. The default is
`Math.min(3, Runtime.getRuntime().availableProcessors() / 2)`, which works well
for SSD.

Finally, you can increase `index.translog.flush_threshold_size` from the default
512 MB to something larger, such as 1 GB. This allows larger segments to accumulate
in the translog before a flush occurs. By letting larger segments build, you
flush less often, and the larger segments merge less often. All of this adds up
to less disk I/O overhead and better indexing rates. Of course, you will need
the corresponding amount of heap memory free to accumulate the extra buffering
space, so keep that in mind when adjusting this setting.
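
As a sketch of how that change might be applied (assuming you set it per index through the index settings API; the `logs` index name is illustrative):

[source,js]
----
PUT /logs/_settings
{
    "index.translog.flush_threshold_size" : "1gb"
}
----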

==== Other

Finally, there are some other considerations to keep in mind:

- If you don't need near real-time accuracy on your search results, consider
dropping the `index.refresh_interval` of((("indexing", "performance tips", "other considerations")))((("refresh_interval setting"))) each index to `30s`. If you are doing
a large import, you can disable refreshes by setting this value to `-1` for the
duration of the import. Don't forget to reenable it when you are finished! (A
combined sketch of these import-time settings follows this list.)

- If you are doing a large bulk import, consider disabling replicas by setting
`index.number_of_replicas: 0`.((("replicas, disabling during large bulk imports"))) When documents are replicated, the entire document
is sent to the replica node and the indexing process is repeated verbatim. This
means each replica will perform the analysis, indexing, and potentially merging
process.
+
In contrast, if you index with zero replicas and then enable replicas when ingestion
is finished, the recovery process is essentially a byte-for-byte network transfer.
This is much more efficient than duplicating the indexing process.

- If you don't have a natural ID for each document, use Elasticsearch's auto-ID
functionality.((("id", "auto-ID functionality of Elasticsearch"))) It is optimized to avoid version lookups, since the autogenerated
ID is unique.

- If you are using your own ID, try to pick an ID that is http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html[friendly to Lucene].((("UUIDs (universally unique identifiers)"))) Examples include zero-padded
sequential IDs, UUID-1, and nanotime; these IDs have consistent, sequential
patterns that compress well. In contrast, IDs such as UUID-4 are essentially
random, which offer poor compression and slow down Lucene.
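
Putting the import-related tips together, here is a hedged sketch of the settings change you might make just before a large bulk import (the `my_index` name and the values you later restore are illustrative):

[source,js]
----
PUT /my_index/_settings
{
    "refresh_interval" : "-1",
    "number_of_replicas" : 0
}
----

When the import finishes, issue the same call with `refresh_interval` set back to `30s` (or whatever your steady state needs) and `number_of_replicas` restored to its normal value; the re-added replicas are then recovered with the byte-for-byte transfer described above.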
