
Commit e72cd41

cluster: use round-robin load balancing
Empirical evidence suggests that OS-level load balancing (that is, having multiple processes listen on a socket and have the operating system wake up one when a connection comes in) produces skewed load distributions on Linux, Solaris and possibly other operating systems. The observed behavior is that a fraction of the listening processes receive the majority of the connections.

From the perspective of the operating system, that somewhat makes sense: a task switch is expensive, to be avoided whenever possible. That's why the operating system likes to give preferential treatment to a few processes, because it reduces the number of switches.

However, that rather subverts the purpose of the cluster module, which is to distribute the load as evenly as possible. That's why this commit adds (and defaults to) round-robin support, meaning that the master process accepts connections and distributes them to the workers in a round-robin fashion, effectively bypassing the operating system.

Round-robin is currently disabled on Windows due to how IOCP is wired up. It works and you can select it manually but it probably results in a heavy performance hit.

Fixes #4435.
1 parent bdc5881 commit e72cd41
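
For illustration, here is a rough sketch, not the actual implementation, of the hand-off the commit message describes: the master process owns the listening socket, accepts every connection itself, and passes the handles to workers in rotation over the IPC channel. The `worker.js` script name, the two-worker count, and port 8000 are placeholders:

    var net = require('net');
    var fork = require('child_process').fork;

    // Placeholder worker script; it would receive sockets via
    // process.on('message', function(msg, socket) { ... }).
    var workers = [fork('worker.js'), fork('worker.js')];
    var next = 0;

    // The master accepts every connection and forwards the handle to
    // the next worker in line, bypassing the OS-level balancing that
    // the commit message describes as skewed.
    net.createServer(function(conn) {
      workers[next].send('connection', conn);
      next = (next + 1) % workers.length;
    }).listen(8000);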

File tree

2 files changed, +313 −51 lines

doc/api/cluster.markdown

+40 −13
@@ -53,14 +53,28 @@ The worker processes are spawned using the `child_process.fork` method,
 so that they can communicate with the parent via IPC and pass server
 handles back and forth.

-When you call `server.listen(...)` in a worker, it serializes the
-arguments and passes the request to the master process. If the master
-process already has a listening server matching the worker's
-requirements, then it passes the handle to the worker. If it does not
-already have a listening server matching that requirement, then it will
-create one, and pass the handle to the child.
+The cluster module supports two methods of distributing incoming
+connections.
+
+The first one (and the default one on all platforms except Windows),
+is the round-robin approach, where the master process listens on a
+port, accepts new connections and distributes them across the workers
+in a round-robin fashion, with some built-in smarts to avoid
+overloading a worker process.
+
+The second approach is where the master process creates the listen
+socket and sends it to interested workers. The workers then accept
+incoming connections directly.
+
+The second approach should, in theory, give the best performance.
+In practice however, distribution tends to be very unbalanced due
+to operating system scheduler vagaries. Loads have been observed
+where over 70% of all connections ended up in just two processes,
+out of a total of eight.

-This causes potentially surprising behavior in three edge cases:
+Because `server.listen()` hands off most of the work to the master
+process, there are three cases where the behavior between a normal
+node.js process and a cluster worker differs:

 1. `server.listen({fd: 7})` Because the message is passed to the master,
    file descriptor 7 **in the parent** will be listened on, and the
@@ -77,12 +91,10 @@ This causes potentially surprising behavior in three edge cases:
    want to listen on a unique port, generate a port number based on the
    cluster worker ID.

-When multiple processes are all `accept()`ing on the same underlying
-resource, the operating system load-balances across them very
-efficiently. There is no routing logic in Node.js, or in your program,
-and no shared state between the workers. Therefore, it is important to
-design your program such that it does not rely too heavily on in-memory
-data objects for things like sessions and login.
+There is no routing logic in Node.js, or in your program, and no shared
+state between the workers. Therefore, it is important to design your
+program such that it does not rely too heavily on in-memory data objects
+for things like sessions and login.

 Because workers are all separate processes, they can be killed or
 re-spawned depending on your program's needs, without affecting other
@@ -91,6 +103,21 @@ continue to accept connections. Node does not automatically manage the
 number of workers for you, however. It is your responsibility to manage
 the worker pool for your application's needs.

+## cluster.schedulingPolicy
+
+The scheduling policy, either `cluster.SCHED_RR` for round-robin or
+`cluster.SCHED_NONE` to leave it to the operating system. This is a
+global setting and effectively frozen once you spawn the first worker
+or call `cluster.setupMaster()`, whatever comes first.
+
+`SCHED_RR` is the default on all operating systems except Windows.
+Windows will change to `SCHED_RR` once libuv is able to effectively
+distribute IOCP handles without incurring a large performance hit.
+
+`cluster.schedulingPolicy` can also be set through the
+`NODE_CLUSTER_SCHED_POLICY` environment variable. Valid
+values are `"rr"` and `"none"`.
+
 ## cluster.settings

 * {Object}
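
To make the new knob concrete, a minimal sketch of selecting the policy documented above, assuming a build that includes this commit; the two-worker count and port 8000 are arbitrary choices:

    var cluster = require('cluster');
    var http = require('http');

    // Must be set before the first fork() (or setupMaster() call);
    // after that the policy is effectively frozen. SCHED_RR is already
    // the default everywhere except Windows, so setting it is only
    // needed to be explicit or to opt out with SCHED_NONE.
    cluster.schedulingPolicy = cluster.SCHED_RR;

    if (cluster.isMaster) {
      cluster.fork();
      cluster.fork();
    } else {
      http.createServer(function(req, res) {
        res.end('handled by worker ' + cluster.worker.id + '\n');
      }).listen(8000);
    }

The same choice can be made without code changes through the environment variable, e.g. `NODE_CLUSTER_SCHED_POLICY=none node app.js`, where `app.js` stands in for your entry script.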
