High Fanout Net Thresholding in AP to Speed Up Solver #3137

Merged
merged 17 commits into from
Jun 12, 2025
Changes from 15 commits
8 changes: 8 additions & 0 deletions doc/src/vpr/command_line_usage.rst
@@ -1324,6 +1324,14 @@ Analytical Placement is generally split into three stages:

**Default:** ``auto``

.. option:: --ap_high_fanout_threshold <int>

Defines the threshold for high fanout nets within AP flow.

Ignores the nets that have higher fanouts than the threshold for the analytical solver.

**Default:** ``256``

.. option:: --ap_verbosity <int>

Controls the verbosity of the AP flow output.
5 changes: 3 additions & 2 deletions vpr/src/analytical_place/analytical_placement_flow.cpp
@@ -170,6 +170,7 @@ void run_analytical_placement_flow(t_vpr_setup& vpr_setup) {
const AtomNetlist& atom_nlist = g_vpr_ctx.atom().netlist();
const DeviceContext& device_ctx = g_vpr_ctx.device();
const UserPlaceConstraints& constraints = g_vpr_ctx.floorplanning().constraints;
const t_ap_opts& ap_opts = vpr_setup.APOpts;

// Run the prepacker
const Prepacker prepacker(atom_nlist, device_ctx.arch->models, device_ctx.logical_block_types);
@@ -178,7 +179,8 @@ void run_analytical_placement_flow(t_vpr_setup& vpr_setup) {
// prepacker.
APNetlist ap_netlist = gen_ap_netlist_from_atoms(atom_nlist,
prepacker,
constraints);
constraints,
ap_opts.ap_high_fanout_threshold);
print_ap_netlist_stats(ap_netlist);

// Pre-compute the pre-clustering timing delays. This object will be passed
@@ -208,7 +210,6 @@ void run_analytical_placement_flow(t_vpr_setup& vpr_setup) {
}

// Run the Global Placer.
const t_ap_opts& ap_opts = vpr_setup.APOpts;
PartialPlacement p_placement = run_global_placer(ap_opts,
atom_nlist,
ap_netlist,
11 changes: 10 additions & 1 deletion vpr/src/analytical_place/gen_ap_netlist_from_atoms.cpp
@@ -24,7 +24,8 @@

APNetlist gen_ap_netlist_from_atoms(const AtomNetlist& atom_netlist,
const Prepacker& prepacker,
const UserPlaceConstraints& constraints) {
const UserPlaceConstraints& constraints,
const int& high_fanout_threshold) {
// Create a scoped timer for reading the atom netlist.
vtr::ScopedStartFinishTimer timer("Read Atom Netlist to AP Netlist");

@@ -115,6 +116,7 @@ APNetlist gen_ap_netlist_from_atoms(const AtomNetlist& atom_netlist,
// - a global net
// - connected to 1 or fewer unique blocks
// - connected to only fixed blocks
// - having fanout higher than threshold
for (APNetId ap_net_id : ap_netlist.nets()) {
// Is the net ignored for placement, if so mark as ignored for AP.
const std::string& net_name = ap_netlist.net_name(ap_net_id);
@@ -153,6 +155,13 @@ APNetlist gen_ap_netlist_from_atoms(const AtomNetlist& atom_netlist,
ap_netlist.set_net_is_ignored(ap_net_id, true);
continue;
}
// If fanout number of the net is higher than the threshold, mark as ignored for AP.
size_t num_pins = ap_netlist.net_pins(ap_net_id).size();
VTR_ASSERT_DEBUG(num_pins > 1);
if (num_pins - 1 > static_cast<size_t>(high_fanout_threshold)) {
ap_netlist.set_net_is_ignored(ap_net_id, true);
Contributor

Heads up about setting the net as ignored. This does ignore the net during global placement; however, it also ignores the net when calculating the HPWL during Global Placement!

double PartialPlacement::get_hpwl(const APNetlist& netlist) const {
    double hpwl = 0.0;
    for (APNetId net_id : netlist.nets()) {
        if (netlist.net_is_ignored(net_id))
            continue;
        double min_x = std::numeric_limits<double>::max();
        double max_x = std::numeric_limits<double>::lowest();
        double min_y = std::numeric_limits<double>::max();
        double max_y = std::numeric_limits<double>::lowest();
        for (APPinId pin_id : netlist.net_pins(net_id)) {
            APBlockId blk_id = netlist.pin_block(pin_id);
            min_x = std::min(min_x, block_x_locs[blk_id]);
            max_x = std::max(max_x, block_x_locs[blk_id]);
            min_y = std::min(min_y, block_y_locs[blk_id]);
            max_y = std::max(max_y, block_y_locs[blk_id]);
        }
        VTR_ASSERT_SAFE(max_x >= min_x && max_y >= min_y);
        hpwl += max_x - min_x + max_y - min_y;
    }
    return hpwl;
}

This only affects the HPWL calculated during GP; the other HPWL calculations (such as post-FL and post-DP) are not affected. I originally did this for debugging (since we do not really care if nets that we are ignoring are getting longer); however, this may be a bit confusing now that this is becoming more mature. I honestly have no idea how to resolve this issue in practice. Should we even be ignoring nets when computing HPWL?

@vaughnbetz I guess this is more of a question for you. Do you see any issue with ignoring nets during the HPWL estimation? We use this estimate for debugging and as part of the algorithm to estimate the quality of the placement. My gut feeling is to ignore the nets when computing HPWL since it's just an estimate anyway. What do you think?

Contributor

I think we should have a separate flag (or just inline code) to control what goes in the solver. The rest of the flow only ignores nets that are assumed to be perfectly routed on a global network. Ignoring some algorithmically selected nets in HPWL calculations is going to be confusing, as it doesn't match the rest of the flow.
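
A minimal sketch of the separation suggested here, under stated assumptions: `NetInfo`, `fanout`, `skip_in_solver`, `skip_in_hpwl`, and `hpwl` are hypothetical names for illustration and are not part of the VPR code base. The solver-side predicate skips global nets and nets above the fanout threshold, while the reporting-side predicate skips only global nets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a net (not the real APNetlist API).
struct NetInfo {
    bool is_global = false;     // assumed to be routed on a global network
    std::vector<double> pin_x;  // x location of the block behind each pin
    std::vector<double> pin_y;  // y location of the block behind each pin
};

// Fanout is the number of pins minus the driver; an empty net has fanout 0.
inline std::size_t fanout(const NetInfo& net) {
    return net.pin_x.empty() ? 0 : net.pin_x.size() - 1;
}

// Solver-side rule: skip global nets and nets above the fanout threshold.
inline bool skip_in_solver(const NetInfo& net, int high_fanout_threshold) {
    assert(high_fanout_threshold > 1);  // mirrors the check added in CheckSetup
    return net.is_global
           || fanout(net) > static_cast<std::size_t>(high_fanout_threshold);
}

// Reporting-side rule: skip only global nets.
inline bool skip_in_hpwl(const NetInfo& net) {
    return net.is_global;
}

// Half-perimeter wirelength over every net that the reporting rule keeps.
inline double hpwl(const std::vector<NetInfo>& nets) {
    double total = 0.0;
    for (const NetInfo& net : nets) {
        if (skip_in_hpwl(net) || net.pin_x.empty() || net.pin_y.empty())
            continue;
        const auto x_range = std::minmax_element(net.pin_x.begin(), net.pin_x.end());
        const auto y_range = std::minmax_element(net.pin_y.begin(), net.pin_y.end());
        total += (*x_range.second - *x_range.first) + (*y_range.second - *y_range.first);
    }
    return total;
}
```

With two predicates, a high-fanout net still contributes to the reported HPWL even though the solver never sees it, which keeps the reported number consistent with the rest of the flow.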

Contributor

@AlexandreSinger Jun 12, 2025

That's a great idea, Vaughn! In the solver we can ignore the nets (since it would speed up computing HPWL each iteration and it would make the value more accurate to what we are optimizing), and then when we report the final HPWL we can ignore only the nets marked as global!

@haydar-c Let's not gate your change on this! This is something that I can add after your PR is merged.

Contributor Author

Thanks, sounds great!

continue;
}
}
ap_netlist.compress();

13 changes: 8 additions & 5 deletions vpr/src/analytical_place/gen_ap_netlist_from_atoms.h
@@ -16,13 +16,16 @@ class UserPlaceConstraints;
/**
* @brief Use the results from prepacking the atom netlist to generate an APNetlist.
*
* @param atom_netlist The atom netlist for the input design.
* @param prepacker The prepacker, initialized on the provided atom netlist.
* @param constraints The placement constraints on the Atom blocks, provided
* by the user.
* @param atom_netlist The atom netlist for the input design.
* @param prepacker The prepacker, initialized on the provided atom netlist.
* @param constraints The placement constraints on the Atom blocks, provided
* by the user.
* @param high_fanout_threshold The threshold above which nets with higher fanout will
* be ignored.
*
* @return An APNetlist object, generated from the prepacker results.
*/
APNetlist gen_ap_netlist_from_atoms(const AtomNetlist& atom_netlist,
const Prepacker& prepacker,
const UserPlaceConstraints& constraints);
const UserPlaceConstraints& constraints,
const int& high_fanout_threshold);
Contributor

Nit: an int does not need to be passed by reference. In fact, passing an int by reference may actually be slower. Please change this to pass by value.

Contributor

For simple types, like integers, you can just pass them directly:

int high_fanout_threshold);

We pass the other arguments to this function by const reference to prevent deep copies, which are very expensive for these types.

@vaughnb-cerebras Jun 12, 2025

@haydar-c General rule: if it is no larger than a pointer (64 bits) and you don't need to update it in the callee, pass by value. This avoids a pointer access to read it and hence will be faster (unless the compiler is clever enough to optimize out the pointer access, but I wouldn't count on that).

If it is bigger than 64 bits, pass by const ref if you don't need to modify it. That is faster than a large copy.

If you need to modify it, no option but to pass by (non-const) reference.
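
The three cases above can be sketched as follows. This is only an illustration: `BigTable`, `exceeds_threshold`, `count_names`, and `add_name` are hypothetical names, not code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical large type, standing in for something like a netlist.
struct BigTable {
    std::vector<std::string> names;
};

// Small, read-only argument (no larger than a pointer): pass by value.
inline bool exceeds_threshold(std::size_t fanout, int threshold) {
    assert(threshold >= 0);  // guard the signed-to-unsigned cast below
    return fanout > static_cast<std::size_t>(threshold);
}

// Large, read-only argument: pass by const reference to avoid a deep copy.
inline std::size_t count_names(const BigTable& table) {
    return table.names.size();
}

// Argument that the callee must modify: pass by non-const reference.
inline void add_name(BigTable& table, const std::string& name) {
    table.names.push_back(name);
}

// Holds on common platforms; documents why an int qualifies for pass by value.
static_assert(sizeof(int) <= sizeof(void*), "int is no larger than a pointer");
```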

Contributor Author

Thanks Vaughn!

6 changes: 6 additions & 0 deletions vpr/src/base/CheckSetup.cpp
@@ -88,6 +88,12 @@ void CheckSetup(const t_packer_opts& packer_opts,
"ap_timing_tradeoff expects a value between 0.0 and 1.0");
}

// Make sure that the high fanout threshold for solver is valid.
if (ap_opts.ap_high_fanout_threshold <= 1) {
VPR_FATAL_ERROR(VPR_ERROR_OTHER,
"ap_high_fanout_threshold should be greater than 1");
}

// TODO: Should we enforce that the size of the device is fixed. This
// goes with ensuring that some blocks are fixed.
}
1 change: 1 addition & 0 deletions vpr/src/base/SetupVPR.cpp
@@ -558,6 +558,7 @@ void SetupAPOpts(const t_options& options,
apOpts.full_legalizer_type = options.ap_full_legalizer.value();
apOpts.detailed_placer_type = options.ap_detailed_placer.value();
apOpts.ap_timing_tradeoff = options.ap_timing_tradeoff.value();
apOpts.ap_high_fanout_threshold = options.ap_high_fanout_threshold.value();
apOpts.appack_max_dist_th = options.appack_max_dist_th.value();
apOpts.num_threads = options.num_workers.value();
apOpts.log_verbosity = options.ap_verbosity.value();
1 change: 1 addition & 0 deletions vpr/src/base/ShowSetup.cpp
@@ -656,6 +656,7 @@ static void ShowAnalyticalPlacerOpts(const t_ap_opts& APOpts) {
}

VTR_LOG("AnalyticalPlacerOpts.ap_timing_tradeoff: %f\n", APOpts.ap_timing_tradeoff);
VTR_LOG("AnalyticalPlacerOpts.ap_high_fanout_threshold: %d\n", APOpts.ap_high_fanout_threshold);
VTR_LOG("AnalyticalPlacerOpts.log_verbosity: %d\n", APOpts.log_verbosity);
}

7 changes: 7 additions & 0 deletions vpr/src/base/read_options.cpp
@@ -1950,6 +1950,13 @@ argparse::ArgumentParser create_arg_parser(const std::string& prog_name, t_optio
.default_value("0.5")
.show_in(argparse::ShowIn::HELP_ONLY);

ap_grp.add_argument<int>(args.ap_high_fanout_threshold, "--ap_high_fanout_threshold")
.help(
"Defines the threshold for high fanout nets within AP flow.\n"
"Ignores the nets that have higher fanouts than the threshold for the analytical solver.")
.default_value("256")
.show_in(argparse::ShowIn::HELP_ONLY);

ap_grp.add_argument(args.appack_max_dist_th, "--appack_max_dist_th")
.help(
"Sets the maximum candidate distance thresholds for the logical block types"
1 change: 1 addition & 0 deletions vpr/src/base/read_options.h
@@ -106,6 +106,7 @@ struct t_options {
argparse::ArgValue<std::vector<std::string>> appack_max_dist_th;
argparse::ArgValue<int> ap_verbosity;
argparse::ArgValue<float> ap_timing_tradeoff;
argparse::ArgValue<int> ap_high_fanout_threshold;
argparse::ArgValue<bool> ap_generate_mass_report;

/* Clustering options */
5 changes: 5 additions & 0 deletions vpr/src/base/vpr_types.h
@@ -1102,6 +1102,9 @@ struct t_placer_opts {
* @param ap_timing_tradeoff
* A trade-off parameter used to decide how focused the AP flow
* should be on optimizing timing over wirelength.
* @param ap_high_fanout_threshold
* The threshold to ignore nets with higher fanout than that
* value while constructing the solver.
* @param appack_max_dist_th
* Array of string passed by the user to configure the max candidate
* distance thresholds.
@@ -1126,6 +1129,8 @@ struct t_ap_opts {

float ap_timing_tradeoff;

int ap_high_fanout_threshold;

std::vector<std::string> appack_max_dist_th;

unsigned num_threads;