[AP] Tuned the AP Flow #2961

AlexandreSinger · 2025-04-02T12:42:18Z

The AP flow has many tunable knobs that trade off quality and run time. I went through each knob to find a good combination.

Updates to the partial legalizer:

Reversed the order that unplaced large blocks are inserted into partitions.
Increased the bin cluster gap from 1 to 2

On the largest VTR benchmarks, this decreased the number of overfilled bins after legalization by 15% and the average overfill of each of those bins by 40%.
On Titan, the number of overfilled bins decreased by 32% and the average overfill decreased by 2.5%.
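
For readers unfamiliar with the two legalizer metrics quoted above, here is a minimal sketch of how they could be computed. It assumes each bin has a scalar utilization and capacity, and that "average overfill" means the mean excess over the overfilled bins only; the function name and the scalar model are illustrative and are not taken from the VPR code.

```python
def overfill_stats(utilization, capacity):
    """Return (number of overfilled bins, mean overfill across those bins).

    Assumption: a bin is overfilled when its utilization exceeds its
    capacity, and its overfill is the excess amount.
    """
    overfills = [u - c for u, c in zip(utilization, capacity) if u > c]
    if not overfills:
        return 0, 0.0
    return len(overfills), sum(overfills) / len(overfills)


# Four bins of capacity 4; the second and fourth are overfilled.
num_overfilled, avg_overfill = overfill_stats(
    [3.0, 5.0, 2.0, 8.0], [4.0, 4.0, 4.0, 4.0]
)
print(num_overfilled, avg_overfill)
```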

Updates to the analytical solver and global placer:

Allowed the B2B solver to stop early if it seems to be converging.
Changed the anchor weights from a linearized term to a quadratic term.
Decreased the distance epsilon from 0.5 to 0.01.
Increased the max number of B2B solver iterations from 6 to 24
Decreased the CG iteration cap from 200 to 150.
The global placer saves the best legalized placement it has seen and returns it as its final result.

On the largest VTR benchmarks, this decreased the post GP HPWL by 22% and decreased the GP run time by 17%.
On Titan, the post GP HPWL decreased by 25%, and the GP run time decreased by 19%.
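
The solver changes above can be sketched roughly as follows. This is not the VPR implementation: the weight formula is the standard bound-to-bound weight with the distance clamped by an epsilon, while the convergence test (relative change in cost), the use of HPWL to rank legalized placements, and all function names are assumptions made for illustration.

```python
def b2b_weight(num_pins, distance, epsilon=0.01):
    """Bound-to-bound connection weight: 2 / ((P - 1) * max(|d|, epsilon)).

    The epsilon clamp keeps the weight finite when two pins coincide.
    """
    if num_pins < 2:
        raise ValueError("a net needs at least two pins")
    return 2.0 / ((num_pins - 1) * max(abs(distance), epsilon))


def iterate_until_converged(step, max_iters=24, rel_tol=1e-3):
    """Call step() up to max_iters times; stop early once the cost stalls.

    step is a hypothetical callback that runs one solver iteration and
    returns its cost. Returns the number of iterations actually run.
    """
    prev_cost = None
    iters_run = 0
    for _ in range(max_iters):
        cost = step()
        iters_run += 1
        if prev_cost is not None and abs(prev_cost - cost) <= rel_tol * abs(prev_cost):
            break
        prev_cost = cost
    return iters_run


def keep_best(legalized_results):
    """Return the (hpwl, placement) pair with the lowest HPWL seen so far."""
    best = None
    for hpwl, placement in legalized_results:
        if best is None or hpwl < best[0]:
            best = (hpwl, placement)
    return best
```

A smaller epsilon lets the weights of very short connections grow much larger before the clamp takes effect, which is one plausible reason the change interacts with the iteration caps.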

Updates to APPack:

Decreased the max candidate distance from 0.5 (W + H) to 0.1 (W + H) for logical blocks.
Decreased the max candidate distance for all other blocks to 0.35 (W + H)
Lowered the attenuation distance threshold from 2.0 to 1.75.
Decreased the attenuation value at the distance threshold to 0.35.
Increased the max unrelated clustering distance from 1 to 5.
Increased the max number of unrelated clustering attempts from 2 to 10.
Turned off all APPack optimization for RAM blocks.

On the largest VTR benchmarks, this decreased the wirelength by 2% over the un-tuned AP flow, with a 2.8% decrease in pack time. On Titan, the post FL wirelength decreased by 6% and the post routing wirelength decreased by 2.6%, with a 0.7% decrease in pack time.
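
To make the max candidate distance settings concrete, here is a small sketch. It assumes W and H are the device grid width and height and that distance is measured as Manhattan distance between flat placement positions; both are assumptions, and the function names are hypothetical rather than APPack's.

```python
def max_candidate_distance(grid_width, grid_height, is_logic_block):
    """Tuned limits from this PR: 0.1 (W + H) for logic blocks, 0.35 (W + H) otherwise."""
    scale = 0.1 if is_logic_block else 0.35
    return scale * (grid_width + grid_height)


def nearby_candidates(seed_pos, candidates, limit):
    """Keep candidates whose Manhattan distance to the seed is within limit."""
    sx, sy = seed_pos
    return [
        name
        for name, (x, y) in candidates.items()
        if abs(x - sx) + abs(y - sy) <= limit
    ]


# On a hypothetical 100 x 100 grid, a logic block looks about 20 units away.
limit = max_candidate_distance(100, 100, is_logic_block=True)
print(nearby_candidates((10, 10), {"a": (12, 11), "b": (40, 40)}, limit))
```

A tighter limit means fewer distant molecules are considered for a cluster, which is consistent with the reduced pack time reported above.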

Updates to initial placement:

Fixed oversight with how the centroid was being calculated.
Increased the range limit when searching for nearby locations when the location a cluster wants is already taken, from 15 to 60.

This further improved the post routing wirelength of Titan to 4.4% better than the un-tuned AP flow.
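
The effect of the range limit can be illustrated with a simple outward search. This is only a sketch under assumptions: it treats the range limit as a Manhattan radius around the desired location and scans ring by ring, which may differ from how initial_placement.cpp actually searches; the function and parameter names are hypothetical.

```python
def find_nearby_free(target, occupied, width, height, range_limit=60):
    """Return the closest free (x, y) within range_limit of target, or None.

    Scans locations in order of increasing Manhattan distance from target.
    """
    tx, ty = target
    for dist in range(range_limit + 1):
        for dx in range(-dist, dist + 1):
            dy_abs = dist - abs(dx)
            for dy in sorted({dy_abs, -dy_abs}):
                x, y = tx + dx, ty + dy
                if 0 <= x < width and 0 <= y < height and (x, y) not in occupied:
                    return (x, y)
    return None


# The desired location is taken, so the nearest free neighbour is returned.
print(find_nearby_free((5, 5), {(5, 5)}, width=10, height=10))
```

With a small limit the search can fail in congested regions and fall back to a less suitable location, so a larger limit trades some search time for placements closer to where the cluster wants to be.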

I found many issues with the initial placement that may be blocking further gains. I will be investigating the initial placement code soon.

Results on Titan compared to the default AP flow in Master right now (qp-hybrid solver is used in master by default):

| Metric | Improvement (ratio vs. master; below 1 is a reduction) |
| --- | --- |
| Post-GP HPWL | 0.24 |
| Post-FL HPWL | 0.96 |
| Post-Route WL | 0.95 |
| Num overfilled bins | 0.51 |
| Average bin overfill | 0.90 |
| Num cluster errors | 0.92 |
| Num atom errors | 0.94 |
| Average atom displacement | 0.90 |
| Max atom displacement | 0.92 |
| GP run time | 1.33 |
| FL run time | 1.02 |
| DP run time | 1.05 |
| Total run time | 1.09 |

These changes improve wirelength by 5% on Titan, roughly halve the number of overfilled bins in the mass-legalized solution, and reduce atom/cluster errors as well as atom displacement. Run time took a hit, mainly in GP, because the bound2bound model is slower than the quadratic model; however, this led to only a 9% increase in overall run time.
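
The table values read as ratios of the tuned flow to the master baseline, which is an interpretation inferred from the summary (0.95 for a 5% wirelength improvement, 1.09 for a 9% run time increase). Under that assumption, the percentages can be recovered like this:

```python
def percent_change(ratio):
    """Convert a tuned/baseline ratio into a signed percent change."""
    return round((ratio - 1.0) * 100.0, 1)


# A few of the Titan ratios from the table above.
titan_ratios = {
    "Post-Route WL": 0.95,
    "Num overfilled bins": 0.51,
    "GP run time": 1.33,
    "Total run time": 1.09,
}

for metric, ratio in titan_ratios.items():
    print(f"{metric}: {percent_change(ratio):+.1f}%")
```

Negative values are reductions, so wirelength and overfilled bins improve while the run time entries are regressions.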

AlexandreSinger · 2025-04-02T12:45:45Z

@amin1377 This is the tuning PR I have been hyping up for the last 2 weeks. The changes are actually smaller than they appear; most of the line changes come from updating CI tests. Please review when you have a moment!

amin1377

Thanks, Alex! Overall, it looks great. I just added a few minor comments.

vpr/src/analytical_place/analytical_solver.cpp

vpr/src/analytical_place/analytical_solver.h

vpr/src/analytical_place/analytical_solver.cpp

vpr/src/place/initial_placement.cpp


AlexandreSinger requested a review from amin1377 April 2, 2025 12:44

github-actions bot added the VPR (VPR FPGA Placement & Routing Tool), docs (Documentation), and lang-cpp (C/C++ code) labels Apr 2, 2025

AlexandreSinger force-pushed the feature-ap-tuning branch from df0723f to 47cbb16 Compare April 2, 2025 16:16

amin1377 requested changes Apr 3, 2025


AlexandreSinger force-pushed the feature-ap-tuning branch from 47cbb16 to ddd81e6 Compare April 3, 2025 15:18

amin1377 approved these changes Apr 3, 2025


amin1377 merged commit 6634f57 into verilog-to-routing:master Apr 3, 2025
36 checks passed

AlexandreSinger deleted the feature-ap-tuning branch April 3, 2025 23:08
