Commit 6edef1d

benchmark: update docs after refactor

PR-URL: #7094
Reviewed-By: Trevor Norris <[email protected]>
Reviewed-By: Jeremiah Senkpiel <[email protected]>
Reviewed-By: Brian White <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>

1 parent 0c0f34e commit 6edef1d

File tree: 3 files changed, +240 -95 lines

benchmark/README.md (+240 -95)

# Node.js core benchmark

This folder contains benchmarks to measure the performance of Node.js APIs.

## Table of Contents

* [Prerequisites](#prerequisites)
* [Running benchmarks](#running-benchmarks)
  * [Running individual benchmarks](#running-individual-benchmarks)
  * [Running all benchmarks](#running-all-benchmarks)
  * [Comparing node versions](#comparing-node-versions)
  * [Comparing parameters](#comparing-parameters)
* [Creating a benchmark](#creating-a-benchmark)

## Prerequisites

Most of the http benchmarks require [`wrk`][wrk] to be installed. It may be
available through your preferred package manager. If not, `wrk` can be built
[from source][wrk] via `make`.

To analyze the results, `R` should be installed. Check your package manager or
download it from https://www.r-project.org/.

The R packages `ggplot2` and `plyr` are also used and can be installed using
the R REPL.

```R
$ R
install.packages("ggplot2")
install.packages("plyr")
```
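
If you prefer not to open the REPL, the packages can also be installed from
the shell. This is a sketch rather than part of the official instructions; it
assumes `Rscript` (bundled with R) is on your `PATH` and uses the main CRAN
mirror:

```
$ Rscript -e 'install.packages(c("ggplot2", "plyr"), repos = "https://cran.r-project.org")'
```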

[wrk]: https://github.com/wg/wrk

## Running benchmarks

### Running individual benchmarks

This can be useful for debugging a benchmark or doing a quick performance
measure. But it does not provide the statistical information needed to draw
any conclusions about the performance.

Individual benchmarks can be executed by simply running the benchmark script
with node.

```
$ node benchmark/buffers/buffer-tostring.js

buffers/buffer-tostring.js n=10000000 len=0 arg=true: 62710590.393305704
buffers/buffer-tostring.js n=10000000 len=1 arg=true: 9178624.591787899
buffers/buffer-tostring.js n=10000000 len=64 arg=true: 7658962.8891432695
buffers/buffer-tostring.js n=10000000 len=1024 arg=true: 4136904.4060201733
buffers/buffer-tostring.js n=10000000 len=0 arg=false: 22974354.231509723
buffers/buffer-tostring.js n=10000000 len=1 arg=false: 11485945.656765845
buffers/buffer-tostring.js n=10000000 len=64 arg=false: 8718280.70650129
buffers/buffer-tostring.js n=10000000 len=1024 arg=false: 4103857.0726124765
```

Each line represents a single benchmark with parameters specified as
`${variable}=${value}`. Each configuration combination is executed in a
separate process. This ensures that benchmark results aren't affected by the
execution order due to V8 optimizations. **The last number is the rate of
operations measured in ops/sec (higher is better).**

Furthermore, you can specify a subset of the configurations by setting them in
the process arguments:

```
$ node benchmark/buffers/buffer-tostring.js len=1024

buffers/buffer-tostring.js n=10000000 len=1024 arg=true: 3498295.68561504
buffers/buffer-tostring.js n=10000000 len=1024 arg=false: 3783071.1678948295
```

### Running all benchmarks

Similar to running individual benchmarks, a group of benchmarks can be
executed by using the `run.js` tool. Again, this does not provide the
statistical information needed to draw any conclusions.

```
$ node benchmark/run.js arrays

arrays/var-int.js
arrays/var-int.js n=25 type=Array: 71.90148040747789
arrays/var-int.js n=25 type=Buffer: 92.89648382795582
...

arrays/zero-float.js
arrays/zero-float.js n=25 type=Array: 75.46208316171496
arrays/zero-float.js n=25 type=Buffer: 101.62785630273159
...

arrays/zero-int.js
arrays/zero-int.js n=25 type=Array: 72.31023859816062
arrays/zero-int.js n=25 type=Buffer: 90.49906662339653
...
```

It is possible to execute more groups by adding extra process arguments.

```
$ node benchmark/run.js arrays buffers
```

### Comparing node versions

To compare the effect of a new node version use the `compare.js` tool. This
will run each benchmark multiple times, making it possible to calculate
statistics on the performance measures.

As an example of how to check for a possible performance improvement, pull
request [#5134](https://github.com/nodejs/node/pull/5134) will be used. This
pull request _claims_ to improve the performance of the `string_decoder`
module.

First build two versions of node, one from the master branch (here called
`./node-master`) and another with the pull request applied (here called
`./node-pr-5134`).

The `compare.js` tool will then produce a csv file with the benchmark results.

```
$ node benchmark/compare.js --old ./node-master --new ./node-pr-5134 string_decoder > compare-pr-5134.csv
```

For analyzing the benchmark results, use the `compare.R` tool.

```
$ cat compare-pr-5134.csv | Rscript benchmark/compare.R

improvement significant p.value
string_decoder/string-decoder.js n=250000 chunk=1024 inlen=1024 encoding=ascii 12.46 % *** 1.165345e-04
string_decoder/string-decoder.js n=250000 chunk=1024 inlen=1024 encoding=base64-ascii 24.70 % *** 1.820615e-15
string_decoder/string-decoder.js n=250000 chunk=1024 inlen=1024 encoding=base64-utf8 23.60 % *** 2.105625e-12
string_decoder/string-decoder.js n=250000 chunk=1024 inlen=1024 encoding=utf8 14.04 % *** 1.291105e-07
string_decoder/string-decoder.js n=250000 chunk=1024 inlen=128 encoding=ascii 6.70 % * 2.928003e-02
...
```

In the output, _improvement_ is the relative improvement of the new version;
hopefully this is positive. _significant_ tells if there is enough statistical
evidence to validate the _improvement_. If there is enough evidence then there
will be at least one star (`*`); more stars mean stronger evidence. **However,
if there are no stars, then you shouldn't draw any conclusions based on the
_improvement_.** Sometimes this is fine, for example if you are expecting
there to be no improvements, then there shouldn't be any stars.

**A word of caution:** Statistics is not a foolproof tool. If a benchmark
shows a statistically significant difference, there is still a 5% risk that
this difference doesn't actually exist. For a single benchmark this is not an
issue. But when considering 20 benchmarks it's normal that one of them will
show significance when it shouldn't. A possible solution is to instead
consider at least two stars (`**`) as the threshold; in that case the risk is
1%. If three stars (`***`) is the threshold, the risk is 0.1%. However, this
may require more runs to obtain (the number of runs can be set with `--runs`).

_For the statistically minded, the R script performs an [independent/unpaired
2-group t-test][t-test], with the null hypothesis that the performance is the
same for both versions. The significant field will show a star if the p-value
is less than `0.05`._

[t-test]: https://en.wikipedia.org/wiki/Student%27s_t-test#Equal_or_unequal_sample_sizes.2C_unequal_variances
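
For reference, the test statistic of that unequal-variance (Welch) t-test is
shown below. This is the textbook formula from the linked article, included
for orientation only; it is not a guarantee of `compare.R`'s exact internals.
Here `\bar{x}` are the mean rates, `s^2` the sample variances, and `n` the
number of runs for each build.

```latex
t = \frac{\bar{x}_{new} - \bar{x}_{old}}
         {\sqrt{\frac{s_{new}^2}{n_{new}} + \frac{s_{old}^2}{n_{old}}}}
```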

The `compare.R` tool can also produce a box plot by using the `--plot
filename` option. In this case there are 48 different benchmark combinations,
thus you may want to filter the csv file. This can be done while benchmarking
using the `--set` parameter (e.g. `--set encoding=ascii`) or by filtering
results afterwards using tools such as `sed` or `grep`. In the `sed` case be
sure to keep the first line since that contains the header information.

```
$ cat compare-pr-5134.csv | sed '1p;/encoding=ascii/!d' | Rscript benchmark/compare.R --plot compare-plot.png

improvement significant p.value
string_decoder/string-decoder.js n=250000 chunk=1024 inlen=1024 encoding=ascii 12.46 % *** 1.165345e-04
string_decoder/string-decoder.js n=250000 chunk=1024 inlen=128 encoding=ascii 6.70 % * 2.928003e-02
string_decoder/string-decoder.js n=250000 chunk=1024 inlen=32 encoding=ascii 7.47 % *** 5.780583e-04
string_decoder/string-decoder.js n=250000 chunk=16 inlen=1024 encoding=ascii 8.94 % *** 1.788579e-04
string_decoder/string-decoder.js n=250000 chunk=16 inlen=128 encoding=ascii 10.54 % *** 4.016172e-05
...
```
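
If you prefer `grep`, an equivalent filter is sketched below (this variant is
not from the original docs). Since `grep` alone would drop the header line,
re-attach it with `head`:

```
$ (head -n1 compare-pr-5134.csv; grep 'encoding=ascii' compare-pr-5134.csv) | Rscript benchmark/compare.R --plot compare-plot.png
```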

![compare tool boxplot](doc_img/compare-boxplot.png)

### Comparing parameters

It can be useful to compare the performance for different parameters, for
example to analyze the time complexity.

To do this use the `scatter.js` tool; it will run a benchmark multiple times
and generate a csv with the results.

```
$ node benchmark/scatter.js benchmark/string_decoder/string-decoder.js > scatter.csv
```

After generating the csv, a comparison table can be created using the
`scatter.R` tool. Even more usefully, it creates an actual scatter plot when
using the `--plot filename` option.

```
$ cat scatter.csv | Rscript benchmark/scatter.R --xaxis chunk --category encoding --plot scatter-plot.png --log

aggregating variable: inlen

chunk encoding mean confidence.interval
16 ascii 1111933.3 221502.48
16 base64-ascii 167508.4 33116.09
16 base64-utf8 122666.6 25037.65
16 utf8 783254.8 159601.79
64 ascii 2623462.9 399791.36
64 base64-ascii 462008.3 85369.45
64 base64-utf8 420108.4 85612.05
64 utf8 1358327.5 235152.03
256 ascii 3730343.4 371530.47
256 base64-ascii 663281.2 80302.73
256 base64-utf8 632911.7 81393.07
256 utf8 1554216.9 236066.53
1024 ascii 4399282.0 186436.46
1024 base64-ascii 730426.6 63806.12
1024 base64-utf8 680954.3 68076.33
1024 utf8 1554832.5 237532.07
```

Because the scatter plot can only show two variables (in this case _chunk_ and
_encoding_) the rest is aggregated. Sometimes aggregating is a problem; this
can be solved by filtering. This can be done while benchmarking using the
`--set` parameter (e.g. `--set encoding=ascii`) or by filtering results
afterwards using tools such as `sed` or `grep`. In the `sed` case be sure to
keep the first line since that contains the header information.

```
$ cat scatter.csv | sed -E '1p;/([^,]+, ){3}128,/!d' | Rscript benchmark/scatter.R --xaxis chunk --category encoding --plot scatter-plot.png --log

chunk encoding mean confidence.interval
16 ascii 701285.96 21233.982
16 base64-ascii 107719.07 3339.439
16 base64-utf8 72966.95 2438.448
16 utf8 475340.84 17685.450
64 ascii 2554105.08 87067.132
64 base64-ascii 330120.32 8551.707
64 base64-utf8 249693.19 8990.493
64 utf8 1128671.90 48433.862
256 ascii 4841070.04 181620.768
256 base64-ascii 849545.53 29931.656
256 base64-utf8 809629.89 33773.496
256 utf8 1489525.15 49616.334
1024 ascii 4931512.12 165402.805
1024 base64-ascii 863933.22 27766.982
1024 base64-utf8 827093.97 24376.522
1024 utf8 1487176.43 50128.721
```

![scatter plot](doc_img/scatter-plot.png)

## Creating a benchmark

All benchmarks use the `require('../common.js')` module. This contains the
`createBenchmark(main, configs)` method which will set up your benchmark.

The first argument, `main`, is the benchmark function; the second argument
specifies the benchmark parameters. `createBenchmark` will run all possible
combinations of these parameters, unless specified otherwise. Note that the
configuration values can only be strings or numbers.

`createBenchmark` also creates a `bench` object, which is used for timing the
runtime of the benchmark. Run `bench.start()` after the initialization and
`bench.end(n)` when the benchmark is done, where `n` is the number of
operations performed in the benchmark.

```js
'use strict';
const common = require('../common.js');
const SlowBuffer = require('buffer').SlowBuffer;

const bench = common.createBenchmark(main, {
  n: [1024],
  type: ['fast', 'slow'],
  size: [16, 128, 1024]
});

function main(conf) {
  bench.start();

  // Select the Buffer variant based on the `type` parameter.
  const BufferConstructor = conf.type === 'fast' ? Buffer : SlowBuffer;

  for (let i = 0; i < conf.n; i++) {
    new BufferConstructor(conf.size);
  }
  // Report how many operations ran between start() and end().
  bench.end(conf.n);
}
```
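
Such a benchmark is run like any other individual benchmark, optionally with a
parameter subset (see [Running individual benchmarks](#running-individual-benchmarks)).
The path below is hypothetical; save the file wherever it fits, e.g. under
`benchmark/buffers/`:

```
$ node benchmark/buffers/buffer-creation.js type=fast size=1024
```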

benchmark/doc_img/compare-boxplot.png (260 KB)

benchmark/doc_img/scatter-plot.png (178 KB)
