
Commit ecf72d8

joyeecheung authored and italoacasas committed
benchmark: use "confidence" in output of compare.R
Use the word "confidence" to indicate the confidence level of the p value so it's easier to understand. With this change more stars in the output of compare.R means higher confidence level (lower significance level).

PR-URL: #10737
Refs: #10439
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Andreas Madsen <[email protected]>
1 parent 8b02b4e commit ecf72d8

2 files changed

+10
-10
lines changed


benchmark/README.md

+4 -4
@@ -161,7 +161,7 @@ For analysing the benchmark results use the `compare.R` tool.
 ```console
 $ cat compare-pr-5134.csv | Rscript benchmark/compare.R
 
-improvement significant p.value
+improvement confidence p.value
 string_decoder/string-decoder.js n=250000 chunk=1024 inlen=1024 encoding=ascii 12.46 % *** 1.165345e-04
 string_decoder/string-decoder.js n=250000 chunk=1024 inlen=1024 encoding=base64-ascii 24.70 % *** 1.820615e-15
 string_decoder/string-decoder.js n=250000 chunk=1024 inlen=1024 encoding=base64-utf8 23.60 % *** 2.105625e-12
@@ -171,7 +171,7 @@ string_decoder/string-decoder.js n=250000 chunk=1024 inlen=128 encoding=ascii
 ```
 
 In the output, _improvement_ is the relative improvement of the new version,
-hopefully this is positive. _significant_ tells if there is enough
+hopefully this is positive. _confidence_ tells if there is enough
 statistical evidence to validate the _improvement_. If there is enough evidence
 then there will be at least one star (`*`), more stars is just better. **However
 if there are no stars, then you shouldn't make any conclusions based on the
@@ -189,7 +189,7 @@ may require more runs to obtain (can be set with `--runs`).
 
 _For the statistically minded, the R script performs an [independent/unpaired
 2-group t-test][t-test], with the null hypothesis that the performance is the
-same for both versions. The significant field will show a star if the p-value
+same for both versions. The confidence field will show a star if the p-value
 is less than `0.05`._
 
 The `compare.R` tool can also produce a box plot by using the `--plot filename`
@@ -202,7 +202,7 @@ keep the first line since that contains the header information.
 ```console
 $ cat compare-pr-5134.csv | sed '1p;/encoding=ascii/!d' | Rscript benchmark/compare.R --plot compare-plot.png
 
-improvement significant p.value
+improvement confidence p.value
 string_decoder/string-decoder.js n=250000 chunk=1024 inlen=1024 encoding=ascii 12.46 % *** 1.165345e-04
 string_decoder/string-decoder.js n=250000 chunk=1024 inlen=128 encoding=ascii 6.70 % * 2.928003e-02
 string_decoder/string-decoder.js n=250000 chunk=1024 inlen=32 encoding=ascii 7.47 % *** 5.780583e-04

benchmark/compare.R

+6 -6
@@ -46,7 +46,7 @@ statistics = ddply(dat, "name", function(subdat) {
   improvement = sprintf("%.2f %%", ((new.mu - old.mu) / old.mu * 100));
 
   p.value = NA;
-  significant = 'NA';
+  confidence = 'NA';
   # Check if there is enough data to calulate the calculate the p-value
   if (length(old.rate) > 1 && length(new.rate) > 1) {
     # Perform a statistics test to see of there actually is a difference in
@@ -56,19 +56,19 @@ statistics = ddply(dat, "name", function(subdat) {
 
     # Add user friendly stars to the table. There should be at least one star
     # before you can say that there is an improvement.
-    significant = '';
+    confidence = '';
     if (p.value < 0.001) {
-      significant = '***';
+      confidence = '***';
     } else if (p.value < 0.01) {
-      significant = '**';
+      confidence = '**';
     } else if (p.value < 0.05) {
-      significant = '*';
+      confidence = '*';
     }
   }
 
   r = list(
     improvement = improvement,
-    significant = significant,
+    confidence = confidence,
     p.value = p.value
   );
   return(data.frame(r));
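
For readers skimming the diff, the star thresholds in the `compare.R` hunk can be restated as a small sketch. It is written in Python purely for illustration and is not part of this commit. The function name `confidence_stars` is hypothetical, and `None` stands in for R's `NA` (the case where there are too few samples to run the t-test).

```python
def confidence_stars(p_value):
    """Map a p-value to the string shown in the `confidence` column.

    Illustrative restatement of the thresholds in compare.R; `None` is an
    assumed stand-in for R's NA (not enough data to compute a p-value).
    """
    if p_value is None:
        return "NA"
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # no stars: not enough evidence to conclude anything


if __name__ == "__main__":
    # p-values taken from the sample output in benchmark/README.md
    for p in (1.165345e-04, 2.928003e-02, 5.780583e-04):
        print(p, confidence_stars(p))
```

Applied to the p-values in the README sample output, this mapping gives the same stars shown there: a smaller p-value yields more stars, which the renamed column now labels as higher confidence.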
