[AP] Tuned the AP Flow #2961
@amin1377 This is the tuning PR I have been hyping up for the last two weeks. The changes are smaller than they appear; most of the line changes come from updating CI tests. Please review when you have a moment!
Thanks, Alex! Overall, it looks great. I just added a few minor comments.
The AP flow has many tunable knobs that trade off quality and run time. Went through each of the knobs to find a good combination.

Updates to the partial legalizer:
- Reversed the order in which unplaced large blocks are inserted into partitions.
- Increased the bin cluster gap from 1 to 2.

On the largest VTR benchmarks, this decreased the number of overfilled bins after legalization by 15% and the average overfill of each of those bins by 40%. On Titan, the number of overfilled bins decreased by 32% and the average overfill decreased by 2.5%.

Updates to the analytical solver and global placer:
- Allowed the B2B solver to stop early if it seems to be converging.
- Changed the anchor weights from a linearized term to a quadratic term.
- Decreased the distance epsilon from 0.5 to 0.01.
- Increased the max number of B2B solver iterations from 6 to 24.
- Decreased the CG iteration cap from 200 to 150.
- The global placer saves the best legalized placement it has seen and returns it as its final result.

On the largest VTR benchmarks, this decreased the post GP HPWL by 22% and decreased the GP run time by 17%. On Titan, the post GP HPWL decreased by 25%, and the GP run time decreased by 19%.

Updates to APPack:
- Decreased the max candidate distance from 0.5 (W + H) to 0.1 (W + H) for logical blocks.
- Decreased the max candidate distance for all other blocks to 0.35 (W + H).
- Lowered the attenuation distance threshold from 2.0 to 1.75.
- Decreased the attenuation value at the distance threshold to 0.35.
- Increased the max unrelated clustering distance from 1 to 5.
- Increased the max number of unrelated clustering attempts from 2 to 10.
- Turned off all APPack optimization for RAM blocks.

On the largest VTR benchmarks, this decreased the wirelength by 2% over the un-tuned AP flow, with a 2.8% decrease in pack time. On Titan, the post FL wirelength decreased by 6% and the post routing wirelength decreased by 2.6%, with a 0.7% decrease in pack time.
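To make two of the solver and global placer changes above concrete, here is a minimal Python sketch (not the VTR C++ code). The function names, the relative-change convergence test, and the use of HPWL as the "best placement" metric are all assumptions made for illustration; the commit message only says that the B2B solver may stop early when it appears to converge, and that the global placer keeps the best legalized placement it has seen.

```python
def solve_b2b(step, x0, max_iters=24, rel_tol=1e-3):
    """Iterate a hypothetical B2B update `step`, stopping early on convergence.

    The relative-change test is an assumed criterion, not VTR's actual one.
    Returns the final solution and the number of iterations performed.
    """
    x = x0
    iters_done = 0
    for iters_done in range(1, max_iters + 1):
        x_new = step(x)
        # Relative change between successive solutions.
        change = abs(x_new - x) / max(abs(x), 1e-12)
        x = x_new
        if change < rel_tol:
            break  # seems to be converging, so stop before the cap
    return x, iters_done


def global_place(solve, legalize, hpwl, num_iters):
    """Run solve/legalize rounds and return the best legalized placement seen.

    `solve`, `legalize`, and `hpwl` are hypothetical callables supplied by
    the caller; quality is judged by HPWL here as an assumption.
    """
    best_placement, best_cost = None, float("inf")
    anchors = None
    for i in range(num_iters):
        solved = solve(anchors, i)
        legal = legalize(solved)
        cost = hpwl(legal)
        if cost < best_cost:
            # Remember the best legalized result instead of the last one.
            best_placement, best_cost = legal, cost
        anchors = legal  # the next solve is anchored to this legalization
    return best_placement, best_cost
```

With this structure, raising the iteration cap (6 to 24 in this PR) need not cost a proportional amount of run time, since well-behaved solves exit before reaching the cap, and a later round that degrades the placement cannot worsen the returned result.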
Updates to initial placement:
- Fixed an oversight in how the centroid was being calculated.
- Increased the range limit, used when searching for nearby locations because the location a cluster wants is taken, from 15 to 60.

This further improved the post routing wirelength of Titan to 4.4% better than the un-tuned AP flow.

I found that there are a lot of issues with the initial placement which may be blocking a large amount of gains. Will be investigating the initial placement code soon.
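As an illustration of why the range limit matters, the following Python sketch searches outward from a cluster's desired location for the nearest free one. It is not the VTR implementation: the function name, the `is_free` callback, and the use of Manhattan distance rings are assumptions for the example.

```python
def find_nearby_free_location(target, is_free, grid_w, grid_h, range_limit=60):
    """Return the closest free (x, y) within `range_limit` of `target`, or None.

    Locations are visited in rings of increasing Manhattan distance
    (an assumed metric), so the first hit is one of the nearest.
    """
    tx, ty = target
    for dist in range(range_limit + 1):
        for dx in range(-dist, dist + 1):
            dy_abs = dist - abs(dx)
            # Visit +dy and -dy, avoiding a double visit when dy is 0.
            dys = (0,) if dy_abs == 0 else (dy_abs, -dy_abs)
            for dy in dys:
                x, y = tx + dx, ty + dy
                if 0 <= x < grid_w and 0 <= y < grid_h and is_free(x, y):
                    return (x, y)
    return None  # nothing free within the range limit
```

For example, on a 10 x 10 grid where only (0, 0) is free, a cluster that wants (9, 9) finds nothing with a range limit of 15 but finds (0, 0) with a limit of 60, so a larger limit lets more clusters land near their desired location rather than falling back to some other strategy.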
Results on Titan compared to the default AP flow in Master right now (qp-hybrid solver is used in master by default):
These changes improve wirelength by 5% on Titan, improve the mass-legalized solution by more than 2x (fewer than half as many overfilled bins), and reduce atom/cluster errors as well as atom displacement. Run time took a hit, mainly in GP, because the bound2bound model is slower than the quadratic model; however, this led to only a 9% increase in overall run time.