Skip to content

[AP] Analytical Solver #2756

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

AlexandreSinger
Copy link
Contributor

This introduces the Analytical Solver class to the AP flow. This is an integral part of the Global Placement stage and what gives Analytical Placement its name.

This PR introduces the QP_HYBRID analytical solver which uses a hybrid Clique and Star net model to optimize the quadratic HPWL objective.

The code is designed to allow for other analytical solvers to be implemented and interchanged without issue. A B2B solver will be coming in a future PR.

@github-actions github-actions bot added VPR VPR FPGA Placement & Routing Tool lang-cpp C/C++ code labels Oct 2, 2024
@AlexandreSinger
Copy link
Contributor Author

AlexandreSinger commented Oct 2, 2024

@vaughnbetz The code is functional and ready for review. The tests are failing because Eigen is not currently included in the CI for the regular tests. Do you think it is appropriate to always include Eigen in the CI tests? and to that end, should we make Eigen a required package for VTR? Let me know what you think. One idea is I can make the analytical solver class crash if Eigen is not installed (but everything else can theoretically work) using IFDEFs. It would just be a bit messy.

@github-actions github-actions bot added infra Project Infrastructure lang-make CMake/Make code labels Oct 4, 2024
@AlexandreSinger AlexandreSinger force-pushed the feature-ap-solver-upstreaming branch 3 times, most recently from a455786 to d305478 Compare October 4, 2024 15:09
Copy link
Contributor

@vaughnbetz vaughnbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! A few thoughts and feedbacks.

// Count the number of star nodes that the netlist will have.
size_t num_star_nodes = 0;
for (APNetId net_id : netlist_.nets()) {
if (netlist_.net_pins(net_id).size() > star_num_pins_threshold)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an earlier PR you had some conditions under which a net would be ignored; you may want to make and call a function AP::is_ignored_net or some such and use it here to avoid creating variables for star nodes on such nets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think you removed all uninteresting nets from the AP netlist already, so ignore that comment ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct. I realized that every single one of these for loops that iterate over the nets in the APNetlist were always ignoring the nets, which I felt was an accident waiting to happen. I completely removed them from the netlist. If we ever need all the nets, we can always go back to the Atom netlist, but I really do not expect to need those nets.

// to the target with the given weight. The src_row_id may represent a star
// node (so it does not represent an APBlock) or a moveable APBlock. The
// target_blk_id may be a fixed or moveable block.
auto add_connection_to_system = [&](size_t src_row_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this, but isn't the A matrix generally square and hence would have columns for star nodes too? (Maybe I'm wrong on this though ....)

In any case, I'd comment the high-level logic of why matrix elements are set this way (can include a ref to a paper)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is big enough that it seems more logical to make it a normal function than a lambda (I am not a big fan of large lambdas since their code is embedded in another function and you have to read their closure notation carefully to see that you just used this as an unnamed inline function). Using a named function also lets you make a proper doxygen comment for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. The A matrix is square and symmetric, so the row and column are literally synonymous (it does not matter which one we use). I chose row because col just does not have the same ring to it. I added a comment to make this clear.

I agree about not making it a lambda, I just didnt like how many arguments it needed and I needed it to be very fast. I opted to just bite the bullet on the number of arguments.

// Create a clique connection where every block in a net connects
// exactly once to every other block in the net.
// Using the weight from FastPlace
double w = 1.0 / static_cast<double>(num_pins - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might give too much weight to high fanout nets (n^2 edges * 1/n weight --> n weight for an n-pin net). ASIC placers may run with more controlled fanout (generally I'd expect some buffer insertion was done before running an ASIC placer) so the scaling of this to the very high fanouts in FPGA nets may not work out well. I'd add a note / TODO for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the star node weight. For a net with P pins, there will be P edges created or the star net model. This weight factor is a correction term to account for these P extra pins. Hence why it is a linear correct. This is from the FastPlace paper. We can explore other weight terms, but I am not sure if they approach the correct solution. I will add a TODO to investigate.

* Where coeff_pseudo_anchor grows with each iteration.
*
* This is basically a fast way of adding a connection between a moveable block
* and a fixed block.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great comment!

Would like to see something like this for the net coefficient code ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this function be shorter (need fewer inputs) if it was part of the analytical solver class?
Should comment the inputs (names are very good, so short comments probably OK but I still think it's good to have a full doxygen comment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly no. It would only save 2 arguments. It also does not make sense to use this method outside of this file's context.


// Write the results back into the partial placement object.
for (size_t row_id_idx = 0; row_id_idx < num_moveable_blocks_; row_id_idx++) {
APRowId row_id = APRowId(row_id_idx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code assumes that the first rows are all for movable blocks. I think you commented that elsewhere so it's not the end of the world, but you could rewrite this code to go through all blocks and get their mapping to rows, so it wouldn't have any such assumption.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was actually done for performance. I think iterating linearly from 0 to num_moveable_blocks would be more efficient than iterating over all blocks and skipping the fixed blocks. I left a comment explaining that the first rows must represent the moveable blocks and added debug asserts to enforce this.

#include "ap_netlist.h"

double PartialPlacement::get_hpwl(const APNetlist& netlist) const {
double hpwl = 0.0;
for (APNetId net_id : netlist.nets()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the ignored nets removed from the ap netlist so there's no need to check for them here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct. This HPWL is not meant to be accurate, its meant to track how the Global Placer is converging. If we used a more accurate HPWL which knew more information about the problem than the solver, we may never converge I think.

This introduces the Analytical Solver class to the AP flow. This is an
integral part of the Global Placement stage and what gives Analytical
Placement its name.

This PR introduces the QP_HYBRID analytical solver which uses a hybrid
Clique and Star net model to optimize the quadratic HPWL objective.

The code is designed to allow for other analytical solvers to be
implemented and interchanged without issue. A B2B solver will be coming
in a future PR.
@AlexandreSinger AlexandreSinger force-pushed the feature-ap-solver-upstreaming branch from d305478 to eabb6e3 Compare October 5, 2024 18:36
@AlexandreSinger AlexandreSinger merged commit 3b74db5 into verilog-to-routing:master Oct 7, 2024
37 of 53 checks passed
@AlexandreSinger AlexandreSinger deleted the feature-ap-solver-upstreaming branch October 7, 2024 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
infra Project Infrastructure lang-cpp C/C++ code lang-make CMake/Make code VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants