
[AP][GlobalPlacement] Added Bound2Bound Solver #2949


Merged


AlexandreSinger
Contributor

The Bound2Bound net model is a method to solve for the linear HPWL objective by iteratively solving a quadratic objective function.
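As a rough illustration of the idea, here is a one-dimensional sketch of how a B2B-style model could turn one net into weighted two-pin connections. It assumes the weight 1 / ((p - 1) * |x_a - x_b|), where p is the number of pins. That constant was chosen here so that the weighted quadratic wirelength equals the linear HPWL at the current pin positions, and VPR's actual constants and epsilon handling may differ. Because the weights depend on the positions, they have to be recomputed and the quadratic system solved again, which is why the method is iterative. The names `B2BEdge`, `b2b_edges`, `quadratic_wl`, and `hpwl_1d` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// A two-pin connection produced by the B2B model: pins a and b with weight w.
struct B2BEdge {
    std::size_t a;
    std::size_t b;
    double w;
};

// Builds the 1-D Bound2Bound edges of one net from the current pin x
// positions. The two bound pins (leftmost and rightmost) are connected to
// each other, and every inner pin is connected to both bound pins.
// Assumed weight: w = 1 / ((p - 1) * |x_a - x_b|). eps guards against
// division by zero when two pins coincide.
std::vector<B2BEdge> b2b_edges(const std::vector<double>& x, double eps = 1e-6) {
    const std::size_t p = x.size();
    assert(p >= 2);
    std::size_t lo = static_cast<std::size_t>(std::min_element(x.begin(), x.end()) - x.begin());
    std::size_t hi = static_cast<std::size_t>(std::max_element(x.begin(), x.end()) - x.begin());
    if (lo == hi) hi = (lo == 0) ? 1 : 0;  // all pins coincide: pick any other pin
    auto weight = [&](std::size_t a, std::size_t b) {
        return 1.0 / (static_cast<double>(p - 1) * std::max(std::abs(x[a] - x[b]), eps));
    };
    std::vector<B2BEdge> edges;
    edges.push_back({lo, hi, weight(lo, hi)});
    for (std::size_t k = 0; k < p; ++k) {
        if (k == lo || k == hi) continue;
        edges.push_back({k, lo, weight(k, lo)});
        edges.push_back({k, hi, weight(k, hi)});
    }
    return edges;
}

// Quadratic wirelength: sum over edges of w * (x_a - x_b)^2.
double quadratic_wl(const std::vector<B2BEdge>& edges, const std::vector<double>& x) {
    double total = 0.0;
    for (const B2BEdge& e : edges) {
        const double d = x[e.a] - x[e.b];
        total += e.w * d * d;
    }
    return total;
}

// Linear half-perimeter wirelength of the net in one dimension.
double hpwl_1d(const std::vector<double>& x) {
    const auto [mn, mx] = std::minmax_element(x.begin(), x.end());
    return *mx - *mn;
}
```

For pins at x = {3, 1, 7, 4}, the model produces five edges whose weighted quadratic sum is 6, which matches the HPWL of 7 - 1. Once the solver moves the pins, the two values drift apart again, and that drift is what the next B2B iteration corrects.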

This method obtains a better-quality flat placement after global placement, at the expense of higher computational cost.

I found that this solver also has numerical stability issues. These may prevent the CG solver from converging, in which case it runs until it hits its iteration limit of 2 * the number of moveable blocks. This makes the algorithm quadratic in the number of blocks in the netlist. To resolve this, I set a custom iteration limit. This seems to work well on our benchmarks but may need to be revisited in the future.
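The quadratic behaviour can be seen from a rough cost model. It assumes that each CG iteration costs time proportional to the number of nonzeros in the system matrix A, and that this count grows linearly with the number of moveable blocks N (that is, bounded average net fanout):

```latex
T_{\text{CG}} \approx k_{\max} \cdot c \cdot \operatorname{nnz}(A),
\qquad k_{\max} = 2N,\quad \operatorname{nnz}(A) = O(N)
\;\Longrightarrow\; T_{\text{CG}} = O(N^{2})
```

Under the same model, a custom cap on k_max that does not grow with N brings the cost of each solve back to O(N).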

I made B2B the default for the AP flow since I found that, although it takes more time, it achieves better quality of results than the hybrid net model.

I slightly tuned APPack to account for the improved global placement.

Quick results on Titan:

  • WL improved by 1% compared to the old AP flow
  • Global placement runtime doubled compared to the old AP flow.
  • Post-GP HPWL was 3.5x lower
  • Runtime increased by 27% compared to the old AP flow.

Although the 1% WL improvement does not seem like much, the 3.5x lower post-GP HPWL suggests that with some more tuning of APPack we can achieve even better QoR!

@github-actions bot added the VPR (VPR FPGA Placement & Routing Tool), docs (Documentation), and lang-cpp (C/C++ code) labels on Mar 22, 2025
// Get the anchor weight based on the iteration number. We want the anchor
// weights to get stronger as we get later in global placement. Found that
// an exponential weight term worked well for this.
double coeff_pseudo_anchor = anchor_weight_mult_ * std::exp((double)iteration / anchor_weight_exp_fac_);
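The schedule in the line above can be evaluated on its own to show how it behaves. The parameter values used in the usage note are hypothetical placeholders, not VPR's tuned values. The point is that the weight grows by a factor of e every `anchor_weight_exp_fac` iterations. The function names are also hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Exponential pseudo-anchor weight, mirroring the expression above:
// weight = mult * exp(iteration / exp_fac).
double pseudo_anchor_weight(unsigned iteration, double mult, double exp_fac) {
    assert(exp_fac > 0.0);
    return mult * std::exp(static_cast<double>(iteration) / exp_fac);
}

// Anchor weights for the first num_iters global placement iterations.
std::vector<double> anchor_schedule(unsigned num_iters, double mult, double exp_fac) {
    std::vector<double> weights;
    weights.reserve(num_iters);
    for (unsigned i = 0; i < num_iters; ++i) {
        weights.push_back(pseudo_anchor_weight(i, mult, exp_fac));
    }
    return weights;
}
```

For example, `anchor_schedule(10, 0.01, 5.0)` starts at 0.01, and the weight at iteration 5 is e times larger. A smaller `exp_fac` makes the anchors take over sooner.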
Contributor


Is this something commonly done across analytical placers? Also, do you print this number in the output when showing information for each iteration?

Contributor Author


This is one of the two main techniques used by quadratic global placers (the other is to apply forces to moveable blocks directly). I do not print this information since it is specific to this type of analytical solver. It would be a good thing to print for debugging in the future.

Contributor


I agree; it would definitely be useful for debugging. I guess my question wasn’t so much about adding the link to the anchor, but more out of curiosity: is increasing the anchor weight a common practice in other analytical placers as well?

Contributor Author


I believe increasing the weight over the GP iterations is necessary. Without increasing the weight, GP may never converge on a solution, because the forces pulling the blocks together could remain stronger than the forces spreading them apart.

Another benefit of increasing the anchor weights is that it guarantees convergence, at least in theory: as the anchor weights approach infinity, the solved solution approaches the mass-legalized solution, so the gap between them must go to zero. In practice, I have yet to find a circuit that did not converge.
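The convergence argument can be checked on the smallest possible case. Consider one moveable block in one dimension, pulled toward some fixed pins and toward its anchor (the mass-legalized position). This is a hypothetical toy model, not the solver's code. In the full system, each row has the same weighted-average structure, but it is also coupled to other moveable blocks.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Optimal 1-D position of a single moveable block connected to fixed pins at
// pins[i] with weights conn_w[i], and to a pseudo-anchor at anchor_pos with
// weight anchor_w. Minimizing
//   sum_i conn_w[i] * (x - pins[i])^2 + anchor_w * (x - anchor_pos)^2
// gives the weighted average computed below.
double anchored_position(const std::vector<double>& pins,
                         const std::vector<double>& conn_w,
                         double anchor_pos, double anchor_w) {
    assert(pins.size() == conn_w.size());
    double num = anchor_w * anchor_pos;
    double den = anchor_w;
    for (std::size_t i = 0; i < pins.size(); ++i) {
        num += conn_w[i] * pins[i];
        den += conn_w[i];
    }
    assert(den > 0.0);
    return num / den;
}
```

Take fixed pins at 0 and 10 with weight 1 each and an anchor at 8. The solved position is (10 + 8w) / (2 + w), so the gap to the anchor is 6 / (2 + w). That is 3 at w = 0 and 6e-6 at w = 10^6, and it shrinks toward zero as the anchor weight grows.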

Contributor


Makes sense! Thanks!

}
}

void B2BSolver::init_linear_system(PartialPlacement& p_placement) {
Contributor


You mentioned before that building the matrices is pretty fast. Still, I think it would be useful to measure the time it takes to construct the matrices and include it as one of the columns in the AP output table.

Contributor Author


I think this would be too much information to show at every iteration of global placement, but I can include the total time spent constructing matrices in the print_statistics method. I think it would be useful to know!
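One way to collect that total is a scoped timer that adds to a running sum each time the linear system is built. This is a hypothetical sketch using `std::chrono`, not the code that landed in VPR. The names `MatrixBuildStats`, `ScopedBuildTimer`, and `print_matrix_build_stats` are made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Hypothetical accumulator for time spent building the linear system
// across all global placement iterations.
struct MatrixBuildStats {
    std::chrono::duration<double> total{0.0};  // seconds
    unsigned num_builds = 0;
};

// RAII timer: measures the enclosing scope and adds it to the stats when
// the scope ends.
class ScopedBuildTimer {
public:
    explicit ScopedBuildTimer(MatrixBuildStats& stats)
        : stats_(stats), start_(std::chrono::steady_clock::now()) {}
    ~ScopedBuildTimer() {
        stats_.total += std::chrono::steady_clock::now() - start_;
        stats_.num_builds++;
    }
    ScopedBuildTimer(const ScopedBuildTimer&) = delete;
    ScopedBuildTimer& operator=(const ScopedBuildTimer&) = delete;

private:
    MatrixBuildStats& stats_;
    std::chrono::steady_clock::time_point start_;
};

// Printed once at the end, in the spirit of a print_statistics method.
void print_matrix_build_stats(const MatrixBuildStats& stats) {
    std::printf("Matrix construction: %u builds, %.3f s total\n",
                stats.num_builds, stats.total.count());
}
```

Placing a `ScopedBuildTimer` at the top of the matrix-construction function keeps the per-iteration output unchanged while still reporting the total at the end.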

for (size_t row_id_idx = 0; row_id_idx < num_moveable_blocks_; row_id_idx++) {
// Since we are capping the number of iterations, it is likely that
// the solver will overstep and give a negative number here. Just
// clamp them to zero.
Contributor


I think the comment is not accurate. The negative values are not caused by capping the iterations or "overstepping." In CG, negative values may appear because the solver doesn't enforce constraints, and clamping the output to zero isn't typically part of the standard CG flow. Also, I have a question: should we be checking to ensure the values don't exceed the width/height limits as well?

Contributor Author


This is caused by capping the number of iterations. CG takes steps to move from a guess solution towards a better solution.
The solution MUST be somewhere within the bounds of the device, since the fixed points are within the bounds of the device.
The guess is always within the chip by construction (it's between 0 and the W/H of the device). If CG returns a negative number, it means that the solution was very close to zero (imagine all fixed blocks happen to be at 0) and CG took a step towards 0 and overstepped. Given enough iterations, it will step sufficiently close to 0 (from either the negative or the positive direction); the cap causes it to stop early on the negative side.

You are correct that CG does not enforce constraints, but there are implicit constraints on the solution imposed by the locations of the fixed blocks (the solution will be within the bounding box of all fixed blocks by problem construction). The CG solver may return a point outside of this bounding box if it did not fully converge.
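The behaviour described here can be explored with a hand-written CG on a toy problem. The setup is a hypothetical chain of n moveable blocks, each tied with weight 1 to its neighbours, with fixed pins at both ends. The exact solution spaces the blocks evenly between the pins, so it lies inside the pins' bounding box. A run stopped early returns whatever intermediate iterate it has reached. This is only an illustration. The PR calls an existing CG solver rather than implementing one, and the names below are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// y = A * x for the chain matrix: 2 on the diagonal, -1 off it. Interior
// blocks are tied to two neighbours; each end block is tied to one
// neighbour and one fixed pin.
Vec chain_matvec(const Vec& x) {
    const std::size_t n = x.size();
    Vec y(n);
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = 2.0 * x[i];
        if (i > 0) y[i] -= x[i - 1];
        if (i + 1 < n) y[i] -= x[i + 1];
    }
    return y;
}

// Right-hand side: only the two end blocks see a fixed pin.
Vec chain_rhs(std::size_t n, double left_pin, double right_pin) {
    assert(n >= 2);
    Vec b(n, 0.0);
    b.front() += left_pin;
    b.back() += right_pin;
    return b;
}

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Plain conjugate gradient, stopped after max_iters iterations or once the
// residual norm drops below tol. Returns the last iterate either way.
Vec capped_cg(const Vec& b, Vec x, int max_iters, double tol) {
    Vec r = b;
    const Vec Ax = chain_matvec(x);
    for (std::size_t i = 0; i < r.size(); ++i) r[i] -= Ax[i];
    Vec p = r;
    double rr = dot(r, r);
    for (int k = 0; k < max_iters && std::sqrt(rr) > tol; ++k) {
        const Vec Ap = chain_matvec(p);
        const double alpha = rr / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rr_new = dot(r, r);
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + (rr_new / rr) * p[i];
        rr = rr_new;
    }
    return x;
}
```

With four blocks, pins at 0 and 10, and a starting guess of 5 for every block, one iteration returns (2.5, 5, 5, 7.5). Running to convergence gives the exact (2, 4, 6, 8). This particular toy case stays inside [0, 10], so it does not by itself reproduce the negative values discussed in this thread.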

Regarding checking the width / height, technically these should be checked; however, it should not be as big of a problem. The valid locations range from [0, W) for the x dimension and [0, H) for the y dimension (technically it's [0, W - 1] and [0, H - 1], but we round down any number in (W - 1, W), for example). If CG does not converge and oversteps a solution that happens to be at W - 1, it is unlikely to overstep all the way to W + epsilon and cause a problem.

Long story short, the TODO still holds: this needs to be handled better. It may be possible to mathematically bound how far the solution can step into the negative direction, given the iteration at which we stopped it. I am worried that this clamping may allow bugs to slip through.
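One possible direction for the TODO is to clamp both dimensions and also report the size of the largest correction. That way, an unexpectedly large jump outside the device can be logged or asserted on instead of being silently hidden. This is a hypothetical helper, not code from the PR. Clamping to [0, W - 1] rather than [0, W) is an assumption that matches the round-down behaviour described above, since any value in (W - 1, W) maps to tile W - 1 anyway.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Clamps solved coordinates along one dimension into [0, dim - 1] and returns
// the largest distance any coordinate was moved, so the caller can flag
// suspicious solver output.
double clamp_to_device(std::vector<double>& coords, double dim) {
    assert(dim >= 1.0);
    double max_correction = 0.0;
    for (double& c : coords) {
        const double clamped = std::clamp(c, 0.0, dim - 1.0);
        max_correction = std::max(max_correction, std::abs(clamped - c));
        c = clamped;
    }
    return max_correction;
}
```

The caller would run this once for x with W and once for y with H, and could compare the returned corrections against a tolerance to catch solver output that is far outside the device.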

Contributor Author


Overstepping may not be the best word to use here since we do not do the solving step ourselves in our code base. I will clean up the comment.

Contributor Author


Screenshot from 2025-03-31 15-50-27

Contributor


Thanks a lot for the explanation!

"Controls which Analytical Solver the Global Placer will use in the AP Flow.\n"
" * qp-hybrid: Solves for a placement that minimizes the quadratic HPWL of the flat placement using a hybrid clique/star net model.\n"
" * lp-b2b: Solves for a placement that minimizes the linear HPWL of the flat placement using the Bound2Bound net model.")
.default_value("lp-b2b")
Contributor


Given the current tradeoff between runtime and QoR (similar QoR but significantly worse runtime), having B2B as the default option feels a bit odd to me. I’d suggest keeping qp-hybrid as the default until the tuning for lp-b2b is finalized.

Contributor Author


Good point. I will keep qp-hybrid as the default for now and make lp-b2b the default again along with my tunings.

@AlexandreSinger
Contributor Author

@amin1377 Thank you so much for the comments. I have resolved them and I am running CI now. Do you have any further comments? After this is merged I plan to merge in my tunings.

@amin1377
Contributor

amin1377 commented Apr 2, 2025

@AlexandreSinger: Looks good to me. I don’t have any further comments. Feel free to merge it if you’ve made all the changes you wanted.
By the way, I'm not sure if you can or want to publish your paper on arXiv right now, but once it's up, don't forget to add it to the AP comments!

@AlexandreSinger AlexandreSinger merged commit 64ab163 into verilog-to-routing:master Apr 2, 2025
36 checks passed
@AlexandreSinger AlexandreSinger deleted the feature-ap-solver branch April 2, 2025 12:28
@AlexandreSinger
Contributor Author

Thanks @amin1377, I agree that the comments will need to be updated when that paper comes out! But I am not sure about arXiv; we'll see!
