Merge pull request #1607 from Kobzol/document-db
Update database and glossary documentation
2 parents 14e9091 + 3d13993 commit 077870c

2 files changed

+221
-77
lines changed

database/schema.md

Lines changed: 187 additions & 62 deletions
@@ -2,18 +2,19 @@
22

33
Below is an explanation of the current database schema. This schema is duplicated across the (currently) two database backends we support: sqlite and postgres.
44

5-
65
## Overview
76

87
In general, the database is used to track three groups of things:
9-
* Performance run statistics (e.g., instruction count) on a per benchmark, profile, and cache-state basis.
8+
* Performance run statistics (e.g., instruction count) for compile-time benchmarks on a per benchmark, profile, and scenario basis.
9+
* Performance run statistics (e.g., instruction count) for runtime benchmarks on a per benchmark basis.
1010
* Self profile data gathered with `-Zself-profile`.
11-
* State when running GitHub bots and the performance runs (e.g., how long it took for a performance suite to run, errors encountered a long the way, etc.)
11+
* State when running GitHub bots and the performance runs (e.g., how long it took for a performance suite to run, errors encountered along the way, etc.)
1212

1313
Below are some diagrams showing the basic layout of the database schema for these three uses:
1414

1515
### Performance run statistics
1616

17+
Here is the diagram for compile-time benchmarks:
1718
```
1819
┌────────────┐ ┌───────────────┐ ┌────────────┐
1920
│ benchmark │ │ collection │ │ artifact │
@@ -36,132 +37,256 @@ Below are some diagrams showing the basic layout of the database schema for thes
3637
└───────────────┘ └──────────┘
3738
```
3839

39-
### Self profile data
40-
41-
**TODO**
42-
43-
### Miscellaneous State
44-
45-
**TODO**
40+
For runtime benchmarks, the schema is very similar, but the table names differ:
41+
- `benchmark` => `runtime_benchmark`
42+
- `pstat` => `runtime_pstat`
43+
- `pstat_series` => `runtime_pstat_series`
44+
  - Its attributes differ from those of `pstat_series`: it has `benchmark` and `metric`.
4645

4746
## Tables
4847

49-
### benchmark
50-
51-
The different types of benchmarks that are run.
52-
53-
The table stores the name of the benchmark as well as whether it is capable of being run using the stable compiler. The benchmark name is used as a foreign key in many of the other tables.
54-
55-
```
56-
sqlite> select * from benchmark limit 1;
57-
name stabilized
58-
---------- ----------
59-
helloworld 0
60-
```
61-
6248
### artifact
6349

64-
A description of a rustc compiler artifact being benchmarked.
50+
A description of a rustc compiler artifact being benchmarked.
6551

6652
This description includes:
6753
* name: usually a commit sha or a tag like "1.51.0" but is free-form text so can be anything.
68-
* date: the date associated with this compiler artifact (usually only when the name is a commit)
54+
* date: the date associated with this compiler artifact (usually only when the name is a commit)
6955
* type: currently one of "master" (i.e., we're testing a merge commit), "try" (someone is testing a PR), and "release" (usually a release candidate - though local compilers also get labeled like this).
7056

7157
```
7258
sqlite> select * from artifact limit 1;
73-
id name date type
74-
---------- ---------- ---------- ----------
75-
1 LOCAL_TEST release
59+
id name date type
60+
---------- ---------- ---------- -------
61+
1 LOCAL_TEST release
7662
```
7763

7864
### collection
7965

8066
A "collection" of benchmark results that differ only in the statistic collected.
8167

82-
This is a way to collect statistics together signifying that they belong to the same logical benchmark run.
68+
This corresponds to a [`test result`](../docs/glossary.md#testing).
8369

84-
Currently the collection also marks the git sha of the currently running collector binary.
70+
This is a way to collect statistics together signifying that they belong to the same logical benchmark run.
71+
72+
Currently, the collection also marks the git sha of the currently running collector binary.
8573

8674
```
8775
sqlite> select * from collection limit 1;
88-
id perf_commit
89-
---------- -----------------------------------------
76+
id perf_commit
77+
---------- ----------------------------------------
9078
1 d9fd96f409a15429757030f225b082744a72516c
9179
```
9280

81+
### collector_progress
82+
83+
Keeps track of the collector's start and finish time as well as which step it's currently on.
84+
85+
```
86+
sqlite> select * from collector_progress limit 1;
87+
aid step start end
88+
---------- ---------- ---------- ----------
89+
1 helloworld 1625829961 1625829965
90+
```
91+
92+
### artifact_collection_duration
93+
94+
Records how long benchmarking takes in seconds.
95+
96+
```
97+
sqlite> select * from artifact_collection_duration limit 1;
98+
aid date_recorded duration
99+
---------- ------------- ----------
100+
1 1625829965 4
101+
```
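The two tables above can be cross-checked against each other. The following is a minimal sketch using an in-memory SQLite database; the column types, the primary keys, and the reading of `start`/`end` as Unix timestamps in seconds are assumptions, and only the column names and sample values come from the output shown above.

```python
import sqlite3

# Hypothetical, simplified schema: column names follow the sample output,
# types and keys are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collector_progress (
        aid INTEGER, step TEXT, start INTEGER, "end" INTEGER,
        PRIMARY KEY (aid, step)
    );
    CREATE TABLE artifact_collection_duration (
        aid INTEGER PRIMARY KEY, date_recorded INTEGER, duration INTEGER
    );
    INSERT INTO collector_progress VALUES (1, 'helloworld', 1625829961, 1625829965);
    INSERT INTO artifact_collection_duration VALUES (1, 1625829965, 4);
""")

# Time spent in each step next to the total duration recorded for the artifact.
rows = conn.execute("""
    SELECT p.aid, p.step, p."end" - p.start AS step_seconds, d.duration
    FROM collector_progress AS p
    JOIN artifact_collection_duration AS d ON d.aid = p.aid
""").fetchall()
print(rows)
```

`end` is quoted because it is an SQL keyword. With only one step in the sample data, the step time and the total duration coincide; with several steps per artifact they would not.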
102+
103+
### benchmark
104+
105+
The different types of compile-time benchmarks that are run.
106+
107+
The table stores the name of the benchmark, whether it is capable of being run using the stable compiler,
108+
and its category. The benchmark name is used as a foreign key in many of the other tables.
109+
110+
Category is either `primary` (real-world benchmark) or `secondary` (stress test).
111+
Stable benchmarks have `category` set to `primary` and `stabilized` set to `1`.
112+
113+
```
114+
sqlite> select * from benchmark limit 1;
115+
name stabilized category
116+
---------- ---------- ----------
117+
helloworld 0 primary
118+
```
119+
93120
### pstat_series
94121

95-
A unique collection of crate, profile, cache and statistic.
122+
Describes the parametrization of a compile-time benchmark. Contains a unique combination
123+
of a crate, profile, scenario and the metric being collected.
96124

97-
* crate: the benchmarked crate which might be a crate from crates.io or a crate made specifically to stress some part of the compiler.
125+
* crate (aka `benchmark`): the benchmarked crate which might be a crate from crates.io or a crate made specifically to stress some part of the compiler.
98126
* profile: what type of compilation is happening - check build, optimized build (a.k.a. release build), debug build, or doc build.
99-
* cache: how much of the incremental cache is full. An empty incremental cache means that the compiler must do a full build.
100-
* statistic: the type of stat being collected
127+
* cache (aka `scenario`): describes how much of the incremental cache is full. An empty incremental cache means that the compiler must do a full build.
128+
* statistic (aka `metric`): the type of metric being collected
129+
130+
This corresponds to a [`statistic description`](../docs/glossary.md).
131+
132+
There is a separate table for this collection to avoid duplicating crates, profiles, scenarios etc.
133+
many times in the `pstat` table.
101134

102135
```
103136
sqlite> select * from pstat_series limit 1;
104-
id crate profile cache statistic
137+
id crate profile cache statistic
105138
---------- ---------- ---------- ---------- ------------
106139
1 helloworld check full task-clock:u
107140
```
108141

109142
### pstat
110143

111-
A statistic that is unique to a pstat_series, artifact and collection.
144+
A measured value of a compile-time metric that is unique to a `pstat_series`, `artifact` and a `collection`.
112145

113-
This stat is unique across a benchmarked crate, profile, cache state, statistic, rustc artifact, and benchmarks "collection".
146+
Each measured combination of a collection, rustc artifact, benchmarked crate, profile, scenario and a metric
147+
has its own unique entry in this table.
114148

115149
```
116150
sqlite> select * from pstat limit 1;
117-
series aid cid value
151+
series aid cid value
118152
---------- ---------- ---------- ----------
119-
1 1 1 24.93
153+
1 1 1 24.93
120154
```
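To read a `pstat` value in context, its three foreign keys have to be resolved. The sketch below shows one way to do that with an in-memory SQLite database; the column types and constraints are assumptions, and only the column names and sample rows are taken from the outputs shown in this document.

```python
import sqlite3

# Hypothetical, simplified schema: column names follow the sample output,
# types and constraints are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artifact (id INTEGER PRIMARY KEY, name TEXT, date INTEGER, type TEXT);
    CREATE TABLE collection (id INTEGER PRIMARY KEY, perf_commit TEXT);
    CREATE TABLE pstat_series (
        id INTEGER PRIMARY KEY, crate TEXT, profile TEXT, cache TEXT, statistic TEXT,
        UNIQUE (crate, profile, cache, statistic)
    );
    CREATE TABLE pstat (
        series INTEGER REFERENCES pstat_series(id),
        aid INTEGER REFERENCES artifact(id),
        cid INTEGER REFERENCES collection(id),
        value REAL,
        PRIMARY KEY (series, aid, cid)
    );
    INSERT INTO artifact VALUES (1, 'LOCAL_TEST', NULL, 'release');
    INSERT INTO collection VALUES (1, 'd9fd96f409a15429757030f225b082744a72516c');
    INSERT INTO pstat_series VALUES (1, 'helloworld', 'check', 'full', 'task-clock:u');
    INSERT INTO pstat VALUES (1, 1, 1, 24.93);
""")

# Resolve series, artifact and collection to get a human-readable measurement.
rows = conn.execute("""
    SELECT a.name, s.crate, s.profile, s.cache, s.statistic, p.value
    FROM pstat AS p
    JOIN pstat_series AS s ON s.id = p.series
    JOIN artifact AS a ON a.id = p.aid
    JOIN collection AS c ON c.id = p.cid
""").fetchall()
print(rows)
```

The composite primary key on `(series, aid, cid)` is how this sketch models the uniqueness described above.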
121155

156+
### runtime_benchmark
122157

123-
### self_profile_query_series
158+
The different types of runtime benchmarks that are run.
124159

125-
**TODO**
160+
The table currently stores only the name of the benchmark.
126161

127-
### self_profile_query
162+
```
163+
sqlite> select * from runtime_benchmark limit 1;
164+
name
165+
---------
166+
nbody-10k
167+
```
128168

129-
**TODO**
169+
### runtime_pstat_series
130170

131-
### pull_request_build
171+
Describes the parametrization of a runtime benchmark. Contains a unique combination
172+
of a benchmark and the metric being collected.
132173

133-
**TODO**
174+
This table exists to avoid duplicating benchmarks and metrics many times in the `runtime_pstat` table.
134175

135-
### artifact_collection_duration
176+
```
177+
sqlite> select * from runtime_pstat_series limit 1;
178+
id benchmark metric
179+
---------- --------- --------------
180+
1 nbody-10k instructions:u
181+
```
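The deduplication role of a series table can be illustrated with a "get or create" lookup. This is only a sketch: `series_id` and `record` are hypothetical helpers, the column types and the `UNIQUE` constraint are assumptions, and the second measured value is made up for illustration.

```python
import sqlite3

# Hypothetical, simplified schema; only the column names come from the document.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runtime_pstat_series (
        id INTEGER PRIMARY KEY,
        benchmark TEXT NOT NULL,
        metric TEXT NOT NULL,
        UNIQUE (benchmark, metric)
    );
    CREATE TABLE runtime_pstat (
        series INTEGER REFERENCES runtime_pstat_series(id),
        aid INTEGER, cid INTEGER, value REAL,
        PRIMARY KEY (series, aid, cid)
    );
""")

def series_id(benchmark, metric):
    """Hypothetical helper: return the series id, creating the row if needed."""
    conn.execute(
        "INSERT OR IGNORE INTO runtime_pstat_series (benchmark, metric) VALUES (?, ?)",
        (benchmark, metric),
    )
    row = conn.execute(
        "SELECT id FROM runtime_pstat_series WHERE benchmark = ? AND metric = ?",
        (benchmark, metric),
    ).fetchone()
    return row[0]

def record(benchmark, metric, aid, cid, value):
    """Hypothetical helper: store one measured value under its series."""
    conn.execute(
        "INSERT INTO runtime_pstat VALUES (?, ?, ?, ?)",
        (series_id(benchmark, metric), aid, cid, value),
    )

# Two artifacts measured for the same benchmark and metric share one series row.
record("nbody-10k", "instructions:u", 1, 1, 24.93)
record("nbody-10k", "instructions:u", 2, 1, 25.10)

series_count = conn.execute("SELECT COUNT(*) FROM runtime_pstat_series").fetchone()[0]
stat_count = conn.execute("SELECT COUNT(*) FROM runtime_pstat").fetchone()[0]
print(series_count, stat_count)
```

The benchmark name and metric are stored once, while each measurement row carries only integer keys and the value.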
136182

137-
Records how long benchmarking takes in seconds.
183+
### runtime_pstat
184+
185+
A measured value of a runtime metric that is unique to a `runtime_pstat_series`, `artifact` and a `collection`.
186+
187+
Each measured combination of a collection, rustc artifact, benchmark and a metric
188+
has its own unique entry in this table.
138189

139190
```
140-
sqlite> select * from artifact_collection_duration limit 1;
141-
aid date_recorded duration
142-
---------- ------------- ----------
143-
1 1625829965 4
191+
sqlite> select * from runtime_pstat limit 1;
192+
series aid cid value
193+
---------- ---------- ---------- ----------
194+
1 1 1 24.93
144195
```
145196

146-
### collector_progress
197+
### self_profile_query_series
147198

148-
Keeps track of the collector's start and finish time as well as which step it's currently on.
199+
Describes a parametrization of a self-profile query. Contains a unique combination
200+
of a benchmark, profile, scenario and a `rustc` self-profile query.
201+
202+
This table exists to avoid duplicating benchmarks, profiles, scenarios etc. many times in the `self_profile_query` table.
149203

150204
```
151-
sqlite> select * from collector_progress limit 1;
152-
aid step start end
153-
---------- ---------- ---------- ----------
154-
1 helloworld 1625829961 1625829965
205+
sqlite> select * from self_profile_query_series limit 1;
206+
id crate profile cache query
207+
-- ----- ------- ---------- -----
208+
1 hello-world debug full hir_crate
209+
```
210+
211+
### self_profile_query
212+
213+
A measured value of a single `rustc` self-profile query that is unique to a `self_profile_query_series`, `artifact` and a `collection`.
214+
215+
```
216+
sqlite> select * from self_profile_query limit 1;
217+
series aid cid self_time blocked_time incremental_load_time number_of_cache_hits invocation_count
218+
-- ----- --- --------- ------------ --------------------- -------------------- ----------------
219+
1 42 58 11.8 10.2 8.4 224 408
155220
```
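A typical use of these two tables is ranking queries by self time for one benchmark run. The sketch below assumes column types and keys that are not stated in this document, and the `typeck` row is invented sample data added so that there is something to sort.

```python
import sqlite3

# Hypothetical, simplified schema: column names follow the sample output,
# types and keys are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE self_profile_query_series (
        id INTEGER PRIMARY KEY, crate TEXT, profile TEXT, cache TEXT, "query" TEXT,
        UNIQUE (crate, profile, cache, "query")
    );
    CREATE TABLE self_profile_query (
        series INTEGER REFERENCES self_profile_query_series(id),
        aid INTEGER, cid INTEGER,
        self_time REAL, blocked_time REAL, incremental_load_time REAL,
        number_of_cache_hits INTEGER, invocation_count INTEGER,
        PRIMARY KEY (series, aid, cid)
    );
    INSERT INTO self_profile_query_series VALUES
        (1, 'hello-world', 'debug', 'full', 'hir_crate'),
        (2, 'hello-world', 'debug', 'full', 'typeck');
    INSERT INTO self_profile_query VALUES
        (1, 42, 58, 11.8, 10.2, 8.4, 224, 408),
        (2, 42, 58, 30.5, 0.0, 1.1, 10, 20);
""")

# Queries of one artifact/collection pair, most expensive first.
slowest = conn.execute("""
    SELECT s."query", q.self_time
    FROM self_profile_query AS q
    JOIN self_profile_query_series AS s ON s.id = q.series
    WHERE q.aid = 42 AND q.cid = 58 AND s.crate = 'hello-world'
    ORDER BY q.self_time DESC
""").fetchall()
print(slowest)
```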
156221

157222
### rustc_compilation
158223

159-
**TODO**
224+
Records the duration of compiling a `rustc` crate for a given artifact and collection.
225+
226+
```
227+
sqlite> select * from rustc_compilation limit 1;
228+
aid cid crate duration
229+
--- --- ---------- --------
230+
1 42 rustc_mir_transform 28.096
231+
```
232+
233+
### raw_self_profile
234+
235+
Records that a given combination of artifact, collection, benchmark, profile and scenario
236+
has a self profile archive available. This profile is then downloaded through an endpoint -
237+
it is not stored in the database directly.
238+
239+
```
240+
sqlite> select * from raw_self_profile limit 1;
241+
aid cid crate profile cache
242+
--- --- ----- ------- -----
243+
1 42 hello-world debug full
244+
```
245+
246+
### pull_request_build
247+
248+
Records a pull request commit that is waiting in a queue to be benchmarked.
249+
250+
First a merge commit is queued, then its artifacts are built by bors, and once the commit
251+
is attached to the entry in this table, it can be benchmarked.
252+
253+
* bors_sha: SHA of the commit that should be benchmarked
254+
* pr: number of the PR
255+
* parent_sha: SHA of the parent commit, against which the PR will be compared
256+
* complete: boolean specifying whether this commit has already been benchmarked
257+
* requested: when the commit was queued
258+
* include: which benchmarks should be included (corresponds to the `--include` benchmark parameter)
259+
* exclude: which benchmarks should be excluded (corresponds to the `--exclude` benchmark parameter)
260+
* runs: how many iterations should be used by default for the benchmark run
261+
* commit_date: when the commit was created
262+
263+
```
264+
sqlite> select * from pull_request_build limit 1;
265+
bors_sha pr parent_sha complete requested include exclude runs commit_date
266+
---------- -- ---------- -------- --------- ------- ------- ---- -----------
267+
1w0p83... 42 fq24xq... true <timestamp> 3 <timestamp>
268+
```
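The queue semantics described above could be queried roughly as follows. This is a sketch under two assumptions that the description does not state explicitly: `complete` is stored as 0/1, and a row whose `bors_sha` is still `NULL` is waiting for its artifacts to be built. All SHAs and timestamps in the sample rows are placeholders.

```python
import sqlite3

# Hypothetical, simplified schema: column names follow the list above,
# types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pull_request_build (
        bors_sha TEXT, pr INTEGER NOT NULL, parent_sha TEXT,
        complete INTEGER NOT NULL, requested INTEGER,
        "include" TEXT, "exclude" TEXT, runs INTEGER, commit_date INTEGER
    );
    INSERT INTO pull_request_build VALUES
        ('aaa111', 40, 'p0', 1, 100, NULL, NULL, 3, 90),
        (NULL,     41, NULL, 0, 110, NULL, NULL, NULL, NULL),
        ('ccc333', 42, 'p1', 0, 120, 'helloworld', NULL, 3, 115);
""")

# Entries that have a commit attached but have not been benchmarked yet,
# oldest request first.
ready = conn.execute("""
    SELECT pr, bors_sha FROM pull_request_build
    WHERE complete = 0 AND bors_sha IS NOT NULL
    ORDER BY requested
""").fetchall()
print(ready)
```

PR 40 is skipped because it is already complete, and PR 41 because no commit has been attached to it yet.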
160269

161270
### error_series
162271

163-
**TODO**
272+
Records a compile-time benchmark that caused an error.
273+
274+
This table exists to avoid duplicating benchmarks many times in the `error` table.
275+
276+
```
277+
sqlite> select * from error_series limit 1;
278+
id crate
279+
---------- -----------
280+
1 hello-world
281+
```
164282

165283
### error
166284

167-
**TODO**
285+
Records a compilation error for an artifact and an entry in `error_series`.
286+
287+
```
288+
sqlite> select * from error limit 1;
289+
series aid error
290+
---------- --- -----
291+
1 42 Failed to compile...
292+
```
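Looking up which benchmarks failed for a given artifact means joining `error` with `error_series`. A minimal sketch, assuming column types and keys that are not given here; the second `error` row is invented so that the filter has something to exclude.

```python
import sqlite3

# Hypothetical, simplified schema: column names follow the sample output,
# types and keys are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE error_series (id INTEGER PRIMARY KEY, crate TEXT UNIQUE);
    CREATE TABLE error (
        series INTEGER REFERENCES error_series(id),
        aid INTEGER,
        error TEXT,
        PRIMARY KEY (series, aid)
    );
    INSERT INTO error_series VALUES (1, 'hello-world');
    INSERT INTO error VALUES (1, 42, 'Failed to compile...');
    INSERT INTO error VALUES (1, 43, 'Failed to compile...');
""")

# All errors recorded for artifact 42, with the benchmark name resolved.
failures = conn.execute("""
    SELECT s.crate, e.aid, e.error
    FROM error AS e
    JOIN error_series AS s ON s.id = e.series
    WHERE e.aid = ?
""", (42,)).fetchall()
print(failures)
```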
