Commit 05de5d6

Change the primary CGU merging algorithm.
Instead of repeatedly merging the two smallest CGUs, we now use a merging algorithm that aims to minimize the duplication of inlined functions.

`exa-0.10.1` was one benchmark that saw particularly good results. The old CGU stats:
```
INTERNALIZE
- unique items: 2774 (1216 root + 1558 inlined), unique size: 122065 (77219 root + 44846 inlined)
- placed items: 3834 (1216 root + 2618 inlined), placed size: 154552 (77219 root + 77333 inlined)
- placed/unique items ratio: 1.38, placed/unique size ratio: 1.27
- CGUs: 16, mean size: 9659.5, sizes: [11791, 11634, 11173, 10987, 10939, 10507, 9992, 9813, 9593, 9580, 9030, 8447, 7975, 7961, 7876, 7254]
```
The new CGU stats:
```
INTERNALIZE
- unique items: 2774 (1216 root + 1558 inlined), unique size: 122065 (77219 root + 44846 inlined)
- placed items: 3626 (1216 root + 2410 inlined), placed size: 147201 (77219 root + 69982 inlined)
- placed/unique items ratio: 1.31, placed/unique size ratio: 1.21
- CGUs: 16, mean size: 9200.1, sizes: [11634, 10939, 10227, 9555, 9178, 9167, 8879, 8804, 8604, 8603 (x3), 8602 (x2), 8601, 8600]
```
The difference is in the number of inlined items. There are 1558 unique inlined items. With the old algorithm these were placed 2618 times, resulting in 1060 duplicates. With the new algorithm these were placed 2410 times, resulting in 852 duplicates.

Also, the mean CGU size dropped from 9659.5 to 9200.1, and the CGU size distribution tightened, with the biggest one a little smaller and the smallest ones a little bigger.
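
The duplicate counts and size ratios quoted in the commit message follow directly from the printed stats. As a small check of that arithmetic, here is a sketch in Rust; `PlacementStats`, `old_stats`, and `new_stats` are hypothetical names invented for this illustration and are not part of rustc.

```rust
/// Hypothetical container for the handful of numbers used below; the field
/// names are illustrative and are not rustc types.
struct PlacementStats {
    unique_inlined: u32,
    placed_inlined: u32,
    unique_size: u32,
    placed_size: u32,
}

impl PlacementStats {
    /// Extra copies of inlined items beyond one copy of each.
    fn duplicates(&self) -> u32 {
        self.placed_inlined - self.unique_inlined
    }

    /// Placed size divided by unique size (1.0 would mean no duplication).
    fn size_ratio(&self) -> f64 {
        f64::from(self.placed_size) / f64::from(self.unique_size)
    }
}

/// Numbers copied from the "old CGU stats" in the commit message.
fn old_stats() -> PlacementStats {
    PlacementStats {
        unique_inlined: 1558,
        placed_inlined: 2618,
        unique_size: 122065,
        placed_size: 154552,
    }
}

/// Numbers copied from the "new CGU stats" in the commit message.
fn new_stats() -> PlacementStats {
    PlacementStats {
        unique_inlined: 1558,
        placed_inlined: 2410,
        unique_size: 122065,
        placed_size: 147201,
    }
}

fn main() {
    for (label, stats) in [("old", old_stats()), ("new", new_stats())] {
        println!(
            "{}: {} duplicates, size ratio {:.2}",
            label,
            stats.duplicates(),
            stats.size_ratio()
        );
    }
}
```

The mean CGU size is the placed size divided by the 16 CGUs, which is why it falls along with the duplicate count.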
1 parent 017c0b5 commit 05de5d6

1 file changed, +66 -14 lines changed

compiler/rustc_monomorphize/src/partitioning.rs

@@ -318,25 +318,58 @@ fn merge_codegen_units<'tcx>(
     let mut cgu_contents: FxHashMap<Symbol, Vec<Symbol>> =
         codegen_units.iter().map(|cgu| (cgu.name(), vec![cgu.name()])).collect();
 
-    // Repeatedly merge the two smallest codegen units as long as we have more
-    // CGUs than the upper limit.
-    while codegen_units.len() > cx.tcx.sess.codegen_units().as_usize() {
-        // Sort small cgus to the back.
+    // If N is the maximum number of CGUs, and the CGUs are sorted from largest
+    // to smallest, we repeatedly find which CGU in codegen_units[N..] has the
+    // greatest overlap of inlined items with codegen_units[N-1], merge that
+    // CGU into codegen_units[N-1], then re-sort by size and repeat.
+    //
+    // We use inlined item overlap to guide this merging because it minimizes
+    // duplication of inlined items, which makes LLVM be faster and generate
+    // better and smaller machine code.
+    //
+    // Why merge into codegen_units[N-1]? We want CGUs to have similar sizes,
+    // which means we don't want codegen_units[0..N] (the already big ones)
+    // getting any bigger, if we can avoid it. When we have more than N CGUs
+    // then at least one of the biggest N will have to grow. codegen_units[N-1]
+    // is the smallest of those, and so has the most room to grow.
+    let max_codegen_units = cx.tcx.sess.codegen_units().as_usize();
+    while codegen_units.len() > max_codegen_units {
+        // Sort small CGUs to the back.
         codegen_units.sort_by_key(|cgu| cmp::Reverse(cgu.size_estimate()));
 
-        let mut smallest = codegen_units.pop().unwrap();
-        let second_smallest = codegen_units.last_mut().unwrap();
+        let cgu_dst = &codegen_units[max_codegen_units - 1];
+
+        // Find the CGU that overlaps the most with `cgu_dst`. In the case of a
+        // tie, favour the earlier (bigger) CGU.
+        let mut max_overlap = 0;
+        let mut max_overlap_i = max_codegen_units;
+        for (i, cgu_src) in codegen_units.iter().enumerate().skip(max_codegen_units) {
+            if cgu_src.size_estimate() <= max_overlap {
+                // None of the remaining overlaps can exceed `max_overlap`, so
+                // stop looking.
+                break;
+            }
 
-        // Move the items from `smallest` to `second_smallest`. Some of them
-        // may be duplicate inlined items, in which case the destination CGU is
+            let overlap = compute_inlined_overlap(cgu_dst, cgu_src);
+            if overlap > max_overlap {
+                max_overlap = overlap;
+                max_overlap_i = i;
+            }
+        }
+
+        let mut cgu_src = codegen_units.swap_remove(max_overlap_i);
+        let cgu_dst = &mut codegen_units[max_codegen_units - 1];
+
+        // Move the items from `cgu_src` to `cgu_dst`. Some of them may be
+        // duplicate inlined items, in which case the destination CGU is
         // unaffected. Recalculate size estimates afterwards.
-        second_smallest.items_mut().extend(smallest.items_mut().drain());
-        second_smallest.compute_size_estimate();
+        cgu_dst.items_mut().extend(cgu_src.items_mut().drain());
+        cgu_dst.compute_size_estimate();
 
-        // Record that `second_smallest` now contains all the stuff that was
-        // in `smallest` before.
-        let mut consumed_cgu_names = cgu_contents.remove(&smallest.name()).unwrap();
-        cgu_contents.get_mut(&second_smallest.name()).unwrap().append(&mut consumed_cgu_names);
+        // Record that `cgu_dst` now contains all the stuff that was in
+        // `cgu_src` before.
+        let mut consumed_cgu_names = cgu_contents.remove(&cgu_src.name()).unwrap();
+        cgu_contents.get_mut(&cgu_dst.name()).unwrap().append(&mut consumed_cgu_names);
     }
 
     // Having multiple CGUs can drastically speed up compilation. But for
@@ -451,6 +484,25 @@ fn merge_codegen_units<'tcx>(
     }
 }
 
+/// Compute the combined size of all inlined items that appear in both `cgu1`
+/// and `cgu2`.
+fn compute_inlined_overlap<'tcx>(cgu1: &CodegenUnit<'tcx>, cgu2: &CodegenUnit<'tcx>) -> usize {
+    // Either order works. We pick the one that involves iterating over fewer
+    // items.
+    let (src_cgu, dst_cgu) =
+        if cgu1.items().len() <= cgu2.items().len() { (cgu1, cgu2) } else { (cgu2, cgu1) };
+
+    let mut overlap = 0;
+    for (item, data) in src_cgu.items().iter() {
+        if data.inlined {
+            if dst_cgu.items().contains_key(item) {
+                overlap += data.size_estimate;
+            }
+        }
+    }
+    overlap
+}
+
 fn internalize_symbols<'tcx>(
     cx: &PartitioningCx<'_, 'tcx>,
     codegen_units: &mut [CodegenUnit<'tcx>],
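
The merging loop in this diff can be tried outside the compiler with a toy model. The sketch below is an illustration only: `ToyCgu`, `inlined_overlap`, and `merge_cgus` are hypothetical stand-ins written for this example, items are plain string names with a size and an `inlined` flag, and the `cgu_contents` bookkeeping from the real code is omitted. It keeps the same three ideas: sort largest first, pick the candidate beyond index N-1 with the greatest inlined overlap with `cgus[N-1]` (ties go to the bigger candidate), and stop scanning once a candidate's total size cannot beat the best overlap found so far.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

/// Hypothetical stand-in for rustc's `CodegenUnit`.
/// Maps an item name to (size estimate, is it an inlined item?).
#[derive(Debug, Clone)]
struct ToyCgu {
    name: &'static str,
    items: HashMap<&'static str, (usize, bool)>,
}

impl ToyCgu {
    fn new(name: &'static str, items: &[(&'static str, usize, bool)]) -> Self {
        let items = items.iter().map(|&(n, size, inlined)| (n, (size, inlined))).collect();
        ToyCgu { name, items }
    }

    /// Sum of the sizes of all items placed in this CGU.
    fn size(&self) -> usize {
        self.items.values().map(|&(size, _)| size).sum()
    }
}

/// Combined size of the inlined items present in both CGUs.
/// Iterates over whichever CGU has fewer items.
fn inlined_overlap(a: &ToyCgu, b: &ToyCgu) -> usize {
    let (small, large) = if a.items.len() <= b.items.len() { (a, b) } else { (b, a) };
    small
        .items
        .iter()
        .filter(|&(name, &(_, inlined))| inlined && large.items.contains_key(name))
        .map(|(_, &(size, _))| size)
        .sum()
}

/// Merge CGUs until at most `max_cgus` remain, always merging into
/// `cgus[max_cgus - 1]` after sorting largest first.
fn merge_cgus(mut cgus: Vec<ToyCgu>, max_cgus: usize) -> Vec<ToyCgu> {
    assert!(max_cgus >= 1);
    while cgus.len() > max_cgus {
        cgus.sort_by_key(|cgu| Reverse(cgu.size()));
        let dst = max_cgus - 1;

        let mut max_overlap = 0;
        let mut max_overlap_i = max_cgus;
        for i in max_cgus..cgus.len() {
            // An overlap is at most the candidate's own size, and later
            // candidates are no bigger, so nothing further can win.
            if cgus[i].size() <= max_overlap {
                break;
            }
            let overlap = inlined_overlap(&cgus[dst], &cgus[i]);
            if overlap > max_overlap {
                max_overlap = overlap;
                max_overlap_i = i;
            }
        }

        // `max_overlap_i > dst`, so `swap_remove` never moves `cgus[dst]`.
        let src = cgus.swap_remove(max_overlap_i);
        // Items already present in the destination are not duplicated.
        cgus[dst].items.extend(src.items);
    }
    cgus
}

/// Four toy CGUs; "b" and "c" share the inlined item `inl_y`.
fn example_cgus() -> Vec<ToyCgu> {
    vec![
        ToyCgu::new("a", &[("root_a", 100, false), ("inl_x", 10, true)]),
        ToyCgu::new("b", &[("root_b", 50, false), ("inl_y", 20, true)]),
        ToyCgu::new("c", &[("root_c", 30, false), ("inl_y", 20, true)]),
        ToyCgu::new("d", &[("root_d", 15, false)]),
    ]
}

fn main() {
    for cgu in merge_cgus(example_cgus(), 2) {
        println!("{}: size {}, {} items", cgu.name, cgu.size(), cgu.items.len());
    }
}
```

With a limit of two CGUs, the first iteration merges "c" into "b" because they share `inl_y`, so that item is placed once instead of twice; "d" is never examined in that iteration because its size (15) is below the best overlap already found (20).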
