[[hardware]]
=== Hardware

If you've been following the normal development path, you've probably been playing((("deployment", "hardware")))((("hardware")))
with Elasticsearch on your laptop or on a small cluster of machines lying around.
But when it comes time to deploy Elasticsearch to production, there are a few
recommendations that you should consider. Nothing is a hard-and-fast rule;
Elasticsearch is used for a wide range of tasks and on a bewildering array of
machines. But these recommendations provide good starting points based on our
experience with production clusters.

==== Memory

If there is one resource that you will run out of first, it will likely be memory.((("hardware", "memory")))((("memory")))
Sorting and aggregations can both be memory hungry, so enough heap space to
accommodate these is important.((("heap"))) Even when the heap is comparatively small,
extra memory can be given to the OS filesystem cache. Because many data structures
used by Lucene are disk-based formats, Elasticsearch leverages the OS cache to
great effect.

A machine with 64 GB of RAM is the ideal sweet spot, but 32 GB and 16 GB machines
are also common. Less than 8 GB tends to be counterproductive (you end up
needing many, many small machines), and greater than 64 GB has problems that we will
discuss in <<heap-sizing>>.
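
To make the arithmetic concrete, here is a small illustrative sketch (our own helper, not an Elasticsearch API) of the rules of thumb that <<heap-sizing>> develops: give half of the machine's RAM to the heap, leave the rest to the filesystem cache, and keep the heap just under the roughly 32 GB threshold:

[source,python]
----
def recommend_heap_gb(total_ram_gb):
    """Rule-of-thumb heap size: give half of RAM to the heap, leave the
    rest for the OS filesystem cache, and stay just under the ~32 GB
    threshold discussed in <<heap-sizing>>."""
    return min(total_ram_gb // 2, 31)

for ram_gb in (8, 16, 32, 64, 128):
    print(ram_gb, "GB RAM ->", recommend_heap_gb(ram_gb), "GB heap")
----

Notice that the suggested heap stops growing once you pass 64 GB of RAM; beyond that point, additional memory can serve only the filesystem cache, which is one reason the larger machines deserve the extra care discussed in <<heap-sizing>>.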

==== CPUs

Most Elasticsearch deployments tend to be rather light on CPU requirements. As
such,((("CPUs (central processing units)")))((("hardware", "CPUs"))) the exact processor setup matters less than the other resources. You should
choose a modern processor with multiple cores. Common clusters utilize two- to
eight-core machines.

If you need to choose between faster CPUs and more cores, choose more cores. The
extra concurrency that multiple cores offer will far outweigh a slightly faster
clock speed.

==== Disks

Disks are important for all clusters,((("disks")))((("hardware", "disks"))) and doubly so for indexing-heavy clusters
(such as those that ingest log data). Disks are the slowest subsystem in a server,
which means that write-heavy clusters can easily saturate their disks, which in
turn become the bottleneck of the cluster.

If you can afford SSDs, they are by far superior to any spinning media. SSD-backed
nodes see boosts in both query and indexing performance. If you can afford it,
SSDs are the way to go.

.Check Your I/O Scheduler
****
If you are using SSDs, make sure your OS I/O scheduler is((("I/O scheduler"))) configured correctly.
When you write data to disk, the I/O scheduler decides when that data is
_actually_ sent to the disk. The default under most *nix distributions is a
scheduler called `cfq` (Completely Fair Queuing).

This scheduler allocates _time slices_ to each process, and then optimizes the
delivery of these various queues to the disk. It is optimized for spinning media:
the nature of rotating platters means it is more efficient to write data to disk
based on physical layout.

This is inefficient for SSDs, however, since there are no spinning platters
involved. Instead, `deadline` or `noop` should be used. The deadline
scheduler optimizes based on how long writes have been pending, while `noop`
is just a simple FIFO queue.

This simple change can have dramatic impacts. We've seen a 500-fold improvement
to write throughput just by using the correct scheduler.
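
On Linux, you can check which scheduler a device is currently using without rebooting. The following sketch (ours, purely illustrative, and assuming a device named `sda`) reads the sysfs entry for the device; the kernel marks the active scheduler with square brackets:

[source,python]
----
from pathlib import Path

def active_scheduler(device="sda"):  # "sda" is an assumed device name
    """Return the I/O scheduler currently active for a block device.
    The kernel brackets the active entry, e.g. 'noop [cfq] deadline'."""
    entries = Path("/sys/block/%s/queue/scheduler" % device).read_text().split()
    return next(entry.strip("[]") for entry in entries if entry.startswith("["))

if active_scheduler() == "cfq":
    print("cfq is active; consider deadline or noop for SSD-backed nodes")
----

To switch schedulers at runtime, write the new name into the same sysfs file as root (for example, `echo deadline > /sys/block/sda/queue/scheduler`), and persist the choice through your boot loader or udev configuration.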
****

If you use spinning media, try to obtain the fastest disks possible (high-performance server disks, 15k RPM drives).

Using RAID 0 is an effective way to increase disk speed, for both spinning disks
and SSDs. There is no need to use mirroring or parity variants of RAID, since
high availability is built into Elasticsearch via replicas.

Finally, avoid network-attached storage (NAS). People routinely claim their
NAS solution is faster and more reliable than local drives. Despite these claims,
we have never seen NAS live up to its hype. NAS is often slower, displays
larger latencies with a wider deviation in average latency, and is a single
point of failure.

==== Network

A fast and reliable network is obviously important to performance in a distributed((("hardware", "network")))((("network")))
system. Low latency helps ensure that nodes can communicate easily, while
high bandwidth helps shard movement and recovery. Modern data-center networking
(1 GbE, 10 GbE) is sufficient for the vast majority of clusters.

Avoid clusters that span multiple data centers, even if the data centers are
colocated in close proximity. Definitely avoid clusters that span large geographic
distances.

Elasticsearch clusters assume that all nodes are equal--not that half the nodes
are actually 150ms distant in another data center. Larger latencies tend to
exacerbate problems in distributed systems and make debugging and resolution
more difficult.
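
If you are unsure how distant your nodes really are, a quick probe can give you a rough number before you commit to a topology. The sketch below (our own, with placeholder hostnames) times TCP connection setup to each node's transport port, 9300 by default; it measures only connection establishment, not real cluster traffic:

[source,python]
----
import socket
import time

def connect_latency_ms(host, port=9300, samples=5):
    """Median time, in milliseconds, to open a TCP connection to host:port."""
    timings = []
    for _ in range(samples):
        start = time.monotonic()
        with socket.create_connection((host, port), timeout=5):
            timings.append((time.monotonic() - start) * 1000)
    return sorted(timings)[samples // 2]

for node in ("es-node-1", "es-node-2"):  # placeholder hostnames
    print(node, "->", "%.1f ms" % connect_latency_ms(node))
----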

Similar to the NAS argument, everyone claims that their pipe between data centers is
robust and low latency. This is true--until it isn't (a network failure will
happen eventually; you can count on it). From our experience, the hassle of
managing cross–data center clusters is simply not worth the cost.

==== General Considerations

It is possible nowadays to obtain truly enormous machines:((("hardware", "general considerations"))) hundreds of gigabytes
of RAM with dozens of CPU cores. Conversely, it is also possible to spin up
thousands of small virtual machines in cloud platforms such as EC2. Which
approach is best?

In general, it is better to prefer medium-to-large boxes. Avoid small machines,
because you don't want to manage a cluster with a thousand nodes, and the overhead
of simply running Elasticsearch is more apparent on such small boxes.

At the same time, avoid the truly enormous machines. They often lead to imbalanced
resource usage (for example, all the memory is being used, but none of the CPU) and can
add logistical complexity if you have to run multiple nodes per machine.