[AP] Tuned the AP Flow #2961

Merged

@AlexandreSinger (Contributor) commented Apr 2, 2025

The AP flow has many tunable knobs which trade off quality and run time. I went through each of the knobs to find a good combination.

Updates to the partial legalizer:

  • Reversed the order that unplaced large blocks are inserted into partitions.
  • Increased the bin cluster gap from 1 to 2.

On the largest VTR benchmarks, this decreased the number of overfilled bins after legalization by 15% and the average overfill of each of those bins by 40%.
On Titan, the number of overfilled bins decreased by 32% and the average overfill decreased by 2.5%.
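
For readers unfamiliar with the two legalization metrics quoted above, here is a minimal sketch of how they could be computed. The `Bin` struct and its scalar capacity/utilization fields are hypothetical simplifications for illustration, not the actual VPR data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bin model (an assumption, not VPR's type): capacity and
// utilization are scalars in the same abstract "mass" units.
struct Bin {
    double capacity;
    double utilization;
};

struct OverfillStats {
    std::size_t num_overfilled_bins; // bins whose utilization exceeds capacity
    double avg_overfill;             // mean excess, over the overfilled bins only
};

OverfillStats compute_overfill_stats(const std::vector<Bin>& bins) {
    std::size_t count = 0;
    double total_excess = 0.0;
    for (const Bin& bin : bins) {
        assert(bin.capacity >= 0.0 && bin.utilization >= 0.0);
        const double excess = bin.utilization - bin.capacity;
        if (excess > 0.0) {
            ++count;
            total_excess += excess;
        }
    }
    // Avoid dividing by zero when the solution is already fully legal.
    const double avg = (count == 0) ? 0.0 : total_excess / static_cast<double>(count);
    return {count, avg};
}
```

Under this definition the average only covers bins that are actually overfilled, so the two numbers can move independently: fewer overfilled bins does not by itself imply a lower average overfill.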

Updates to the analytical solver and global placer:

  • Allowed the B2B solver to stop early if it seems to be converging.
  • Changed the anchor weights from a linearized term to a quadratic term.
  • Decreased the distance epsilon from 0.5 to 0.01.
  • Increased the max number of B2B solver iterations from 6 to 24.
  • Decreased the CG iteration cap from 200 to 150.
  • The global placer saves the best legalized placement it has seen and returns it as its final result.
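
A rough sketch of two of the control-flow changes listed above: stopping the B2B iterations early, and keeping the best legalized placement seen. The callback, the relative-change convergence test, the `rel_tol` default, and the tracker class are all assumptions made for illustration; only the iteration cap of 24 comes from this PR.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>

// Runs up to max_iters B2B-style re-solves and returns how many were run.
// solve_once(i) is a hypothetical callback that rebuilds the net model,
// solves it, and returns the resulting HPWL. The stop criterion (relative
// HPWL change below rel_tol) is an assumed stand-in for "seems to be converging".
int run_b2b_iterations(const std::function<double(int)>& solve_once,
                       int max_iters = 24,
                       double rel_tol = 0.01) {
    assert(max_iters > 0);
    double prev_hpwl = std::numeric_limits<double>::infinity();
    for (int i = 0; i < max_iters; ++i) {
        const double hpwl = solve_once(i);
        if (std::isfinite(prev_hpwl) && prev_hpwl > 0.0) {
            const double rel_change = std::fabs(prev_hpwl - hpwl) / prev_hpwl;
            if (rel_change < rel_tol) return i + 1;
        }
        prev_hpwl = hpwl;
    }
    return max_iters;
}

// Remembers the lowest-cost legalized placement observed so far.
// Placement must be default-constructible and copy-assignable.
template <typename Placement>
class BestPlacementTracker {
  public:
    void observe(const Placement& placement, double legalized_cost) {
        if (legalized_cost < best_cost_) {
            best_cost_ = legalized_cost;
            best_ = placement;
            has_best_ = true;
        }
    }
    bool has_best() const { return has_best_; }
    const Placement& best() const {
        assert(has_best_);
        return best_;
    }
    double best_cost() const { return best_cost_; }

  private:
    Placement best_{};
    double best_cost_ = std::numeric_limits<double>::infinity();
    bool has_best_ = false;
};
```

The point of the tracker is that later global placement iterations are not guaranteed to legalize better than earlier ones, so returning the last result can throw away a better solution.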

On the largest VTR benchmarks, this decreased the post GP HPWL by 22% and decreased the GP run time by 17%.
On Titan, the post GP HPWL decreased by 25%, and the GP run time decreased by 19%.
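
For context on the distance epsilon, here is a sketch of the textbook Bound2Bound connection weight with the pin distance clamped from below. This follows the commonly published B2B formulation; whether the VPR solver applies the epsilon in exactly this way is an assumption, and `b2b_weight` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Weight of one Bound2Bound connection between pins at 1D coordinates a and b,
// for a net with num_pins pins. With a quadratic cost of 0.5 * w * (a - b)^2
// summed over the B2B connections, this weight reproduces the net's HPWL
// at the current positions. The epsilon clamp keeps the weight finite when
// the two pins (nearly) coincide.
double b2b_weight(double a, double b, std::size_t num_pins, double epsilon = 0.01) {
    assert(num_pins >= 2);
    assert(epsilon > 0.0);
    const double dist = std::max(std::fabs(a - b), epsilon);
    return 2.0 / (static_cast<double>(num_pins - 1) * dist);
}
```

With this form, a 2-pin net whose pins coincide gets a weight of 4 at an epsilon of 0.5 and 200 at an epsilon of 0.01. A smaller epsilon therefore tracks HPWL more closely for short connections, at the cost of a wider spread of weights in the linear system.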

Updates to APPack:

  • Decreased the max candidate distance from 0.5 (W + H) to 0.1 (W + H) for logical blocks.
  • Decreased the max candidate distance for all other blocks to 0.35 (W + H).
  • Lowered the attenuation distance threshold from 2.0 to 1.75.
  • Decreased the attenuation value at the distance threshold to 0.35.
  • Increased the max unrelated clustering distance from 1 to 5.
  • Increased the max number of unrelated clustering attempts from 2 to 10.
  • Turned off all APPack optimization for RAM blocks.

On the largest VTR benchmarks, this decreased the wirelength by 2% over the un-tuned AP flow, with a 2.8% decrease in pack time. On Titan, the post FL wirelength decreased by 6% and the post routing wirelength decreased by 2.6%, with a 0.7% decrease in pack time.
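
To make the candidate distance numbers concrete, here is a small sketch of the thresholds. It assumes W and H are the device grid width and height and that distance is measured as Manhattan distance; both are assumptions, and `BlockKind`, `max_candidate_distance`, and `is_candidate_in_range` are hypothetical names rather than APPack's API.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical two-way split; the PR distinguishes logical blocks from all others.
enum class BlockKind { Logic, Other };

// Max distance at which a candidate is still considered for a cluster,
// using the tuned scales from this PR: 0.1 (W + H) for logical blocks
// and 0.35 (W + H) for everything else.
double max_candidate_distance(BlockKind kind, int device_width, int device_height) {
    assert(device_width > 0 && device_height > 0);
    const double scale = (kind == BlockKind::Logic) ? 0.1 : 0.35;
    return scale * static_cast<double>(device_width + device_height);
}

// Assumed Manhattan-distance check between a cluster position and a candidate position.
bool is_candidate_in_range(BlockKind kind, int device_width, int device_height,
                           double cluster_x, double cluster_y,
                           double cand_x, double cand_y) {
    const double dist = std::fabs(cluster_x - cand_x) + std::fabs(cluster_y - cand_y);
    return dist <= max_candidate_distance(kind, device_width, device_height);
}
```

On a 100 x 100 grid this gives a cutoff of about 20 for logical blocks, down from 100 at the old 0.5 (W + H) scale, and about 70 for other blocks.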

Updates to initial placement:

  • Fixed oversight with how the centroid was being calculated.
  • Increased the range limit when searching for nearby locations when the location a cluster wants is taken from 15 to 60.

This further improved the post routing wirelength of Titan to 4.4% better than the un-tuned AP flow.
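
As an illustration of the two initial placement pieces mentioned above, here is a sketch of a centroid computation and a range-limited search for a free location. The brute-force scan, the Manhattan-distance window, and all type and function names are assumptions for illustration; the actual initial placer works differently in its details.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <optional>
#include <vector>

struct Point { double x; double y; };
struct GridLoc { int x; int y; };

// Centroid of the positions of the atoms in a cluster (unweighted mean).
Point compute_centroid(const std::vector<Point>& atom_positions) {
    assert(!atom_positions.empty());
    double sum_x = 0.0;
    double sum_y = 0.0;
    for (const Point& p : atom_positions) {
        sum_x += p.x;
        sum_y += p.y;
    }
    const double n = static_cast<double>(atom_positions.size());
    return {sum_x / n, sum_y / n};
}

// Returns the closest unoccupied location within range_limit (Manhattan
// distance, an assumption) of target, or nullopt if there is none.
// occupied is indexed as occupied[x][y].
std::optional<GridLoc> find_nearest_free(const std::vector<std::vector<bool>>& occupied,
                                         GridLoc target, int range_limit = 60) {
    const int width = static_cast<int>(occupied.size());
    const int height = width > 0 ? static_cast<int>(occupied[0].size()) : 0;
    std::optional<GridLoc> best;
    int best_dist = range_limit + 1;
    const int x_lo = std::max(0, target.x - range_limit);
    const int x_hi = std::min(width - 1, target.x + range_limit);
    for (int x = x_lo; x <= x_hi; ++x) {
        // Remaining distance budget in y once the x offset is spent.
        const int y_span = range_limit - std::abs(x - target.x);
        const int y_lo = std::max(0, target.y - y_span);
        const int y_hi = std::min(height - 1, target.y + y_span);
        for (int y = y_lo; y <= y_hi; ++y) {
            if (occupied[x][y]) continue;
            const int dist = std::abs(x - target.x) + std::abs(y - target.y);
            if (dist < best_dist) {
                best_dist = dist;
                best = GridLoc{x, y};
            }
        }
    }
    return best;
}
```

A larger range limit matters most on congested devices, where every location near the desired one may already be taken and a small window would force a fallback placement far from the centroid.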

I found that there are a lot of issues with the initial placement which may be blocking a large amount of gains. I will be investigating the initial placement code soon.

Results on Titan compared to the default AP flow in master right now (the qp-hybrid solver is used in master by default):

| Metric | Improvement (ratio to master; lower is better) |
| --- | --- |
| Post-GP HPWL | 0.24 |
| Post-FL HPWL | 0.96 |
| Post-Route WL | 0.95 |
| Num overfilled bins | 0.51 |
| Average bin overfill | 0.90 |
| Num cluster errors | 0.92 |
| Num atom errors | 0.94 |
| Average atom displacement | 0.90 |
| Max atom displacement | 0.92 |
| GP run time | 1.33 |
| FL run time | 1.02 |
| DP run time | 1.05 |
| Total run time | 1.09 |

These changes improve wirelength by 5% on Titan, improve the mass-legalized solution by nearly 2x (about half the number of overfilled bins), and reduce atom/cluster errors as well as atom displacement. Run time took a hit, mainly in GP due to the bound2bound model being slower than the quadratic model; however, this only led to a 9% increase in overall run time.
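
The percentages in this summary follow from the table values if those are read as tuned-over-master ratios, which is an assumption based on how the numbers line up with the text. A tiny sketch of the conversion, with hypothetical helper names:

```cpp
#include <cassert>
#include <cmath>

// Signed percent change implied by a tuned/master ratio.
// Negative means the metric went down relative to master.
double percent_change(double ratio) {
    assert(ratio > 0.0);
    return (ratio - 1.0) * 100.0;
}

// Reduction factor implied by a ratio, e.g. 0.5 corresponds to a 2x reduction.
double reduction_factor(double ratio) {
    assert(ratio > 0.0);
    return 1.0 / ratio;
}
```

For example, the Post-Route WL ratio of 0.95 is a 5% reduction, the total run time ratio of 1.09 is a 9% increase, and the overfilled-bin ratio of 0.51 is a reduction factor of about 1.96.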

@AlexandreSinger AlexandreSinger requested a review from amin1377 April 2, 2025 12:44
@github-actions github-actions bot added the VPR (VPR FPGA Placement & Routing Tool), docs (Documentation), and lang-cpp (C/C++ code) labels Apr 2, 2025
@AlexandreSinger (Contributor, Author) commented:

@amin1377 This is the tuning PR I have been hyping up for the last 2 weeks. The changes are actually smaller than they appear; most of the line changes come from updating CI tests. Please review when you have a moment!

@amin1377 (Contributor) left a comment:

Thanks, Alex! Overall, it looks great. I just added a few minor comments.

@amin1377 amin1377 merged commit 6634f57 into verilog-to-routing:master Apr 3, 2025
36 checks passed
@AlexandreSinger AlexandreSinger deleted the feature-ap-tuning branch April 3, 2025 23:08