High Fanout Net Thresholding in AP to Speed Up Solver #3137

Merged
merged 17 commits into from
Jun 12, 2025

Conversation

@haydar-c haydar-c commented Jun 12, 2025

Description

This PR filters out nets whose fanout exceeds a specified threshold from the APNetlist. The aim is to make the solver faster without hurting the quality of results. This PR also adds a command-line option to set this threshold. The default value of 256 was chosen by experimenting on the Titan, Koios, and VTR Chain benchmarks and is a safe value. The command-line option allows further tuning if needed, depending on the solver configuration.
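As a rough illustration of the filtering described above, here is a minimal, self-contained sketch. `SketchNet` and `ignore_high_fanout_nets` are hypothetical names invented for this example and are not part of VPR's `APNetlist` API; the only detail taken from the PR is that fanout is treated as the number of pins minus one and compared against the threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a net in the AP netlist (not VPR's real type).
struct SketchNet {
    std::size_t num_pins; // driver pin plus all sink pins
    bool is_ignored;      // true if the solver should skip this net
};

// Marks every net whose fanout (pins minus the driver) exceeds the threshold
// as ignored. Returns the number of nets newly marked by this call.
std::size_t ignore_high_fanout_nets(std::vector<SketchNet>& nets,
                                    int high_fanout_threshold) {
    assert(high_fanout_threshold >= 0);
    std::size_t num_marked = 0;
    for (SketchNet& net : nets) {
        // Skip nets with no sinks and nets that are already ignored.
        if (net.num_pins < 2 || net.is_ignored)
            continue;
        std::size_t fanout = net.num_pins - 1;
        if (fanout > static_cast<std::size_t>(high_fanout_threshold)) {
            net.is_ignored = true;
            ++num_marked;
        }
    }
    return num_marked;
}
```

With a threshold of 256, a net with 257 pins (fanout 256) is kept, while a net with 258 pins (fanout 257) is marked as ignored.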

Related Issue

This PR solves the issue mentioned in #2974.

Motivation and Context

The motivation is to make the solver faster by ignoring the high fanout nets without degrading the quality of results.

How Has This Been Tested?

After selecting the default value of 256, all three benchmark suites (Titan, Koios, and VTR Chain) were run both with this default value and on master (without the changes introduced in this PR).

In each run, the following build command was used:

make CMAKE_PARAMS="-DVTR_IPO_BUILD=on -DVTR_ASSERT_LEVEL=1"

All metrics below are normalized to the corresponding master run.

| Benchmark Suite | Post GP HPWL | Post FL HPWL | Routed Wirelength | GP Solver Runtime | GP Runtime | AP Runtime | Total Runtime | Num LABs/CLBs | Total CG Iters |
|---|---|---|---|---|---|---|---|---|---|
| Titan | 0.9837 | 1.0108 | 1.0006 | 0.7706 | 0.7957 | 0.9249 | 0.9384 | 0.9993 | 0.9881 |
| Koios | 0.9476 | 0.9868 | 0.9967 | 0.6993 | 0.7389 | 0.9334 | 0.9520 | 0.9990 | 0.9956 |
| VTR Chain | 0.9958 | 0.9749 | 0.9903 | 0.7650 | 0.7868 | 0.9300 | 0.9396 | 1.0018 | 0.9775 |

All in all, applying the high fanout net thresholding reduces GP solver runtime by 23–30%, translating to an overall total runtime improvement of 5–6% across Titan, Koios, and VTR Chain benchmarks. Quality metrics such as routed wirelength and LAB count are preserved, with observed differences assumed to be noise.
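For readers who want to check the percentages against the normalized table, the conversion is simply one minus the normalized value. The helper below is a small sketch written for this purpose; `percent_reduction` is a made-up name and is not part of any VTR script.

```cpp
#include <cassert>

// Converts a metric normalized to master (1.0 means "same as master")
// into a percent reduction relative to master.
double percent_reduction(double normalized_to_master) {
    assert(normalized_to_master >= 0.0);
    return (1.0 - normalized_to_master) * 100.0;
}
```

For example, the Titan GP solver runtime of 0.7706 corresponds to a reduction of about 22.9%, and the Koios value of 0.6993 to about 30.1%.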

@github-actions github-actions bot added VPR VPR FPGA Placement & Routing Tool docs Documentation lang-cpp C/C++ code labels Jun 12, 2025
@haydar-c haydar-c marked this pull request as ready for review June 12, 2025 14:21
size_t num_pins = ap_netlist.net_pins(ap_net_id).size();
VTR_ASSERT_DEBUG(num_pins > 1);
if (num_pins - 1 > static_cast<size_t>(high_fanout_threshold)) {
    ap_netlist.set_net_is_ignored(ap_net_id, true);

Heads up about setting the net as ignored. This does ignore the net during global placement; however, it also ignores the net when calculating the HPWL during Global Placement!

double PartialPlacement::get_hpwl(const APNetlist& netlist) const {
    double hpwl = 0.0;
    for (APNetId net_id : netlist.nets()) {
        if (netlist.net_is_ignored(net_id))
            continue;
        double min_x = std::numeric_limits<double>::max();
        double max_x = std::numeric_limits<double>::lowest();
        double min_y = std::numeric_limits<double>::max();
        double max_y = std::numeric_limits<double>::lowest();
        for (APPinId pin_id : netlist.net_pins(net_id)) {
            APBlockId blk_id = netlist.pin_block(pin_id);
            min_x = std::min(min_x, block_x_locs[blk_id]);
            max_x = std::max(max_x, block_x_locs[blk_id]);
            min_y = std::min(min_y, block_y_locs[blk_id]);
            max_y = std::max(max_y, block_y_locs[blk_id]);
        }
        VTR_ASSERT_SAFE(max_x >= min_x && max_y >= min_y);
        hpwl += max_x - min_x + max_y - min_y;
    }
    return hpwl;
}

This only affects the HPWL calculated during GP; the other HPWL calculations (such as post-FL and post-DP) are not affected. I originally did this for debugging (since we do not really care if nets that we are ignoring get longer); however, this may be a bit confusing now that this is becoming more mature. I honestly have no idea how to resolve this issue in practice. Should we even be ignoring nets when computing HPWL?

@vaughnbetz I guess this is more of a question for you. Do you see any issue with ignoring nets during the HPWL estimation? We use this estimate for debugging as well as within the algorithm to estimate the quality of the placement. My gut feeling is to ignore the nets when computing HPWL since it's just an estimate anyway. What do you think?

I think we should have a separate flag (or just inline code) to control what goes in the solver. The rest of the flow only ignores nets that are assumed to be perfectly routed on a global network. Ignoring some algorithmically selected nets in HPWL calculations is going to be confusing, as it doesn't match the rest of the flow.

@AlexandreSinger AlexandreSinger Jun 12, 2025

That's a great idea, Vaughn! In the solver we can ignore the nets (since it would speed up computing HPWL each iteration and it would make the value closer to what we are actually optimizing), and then when we report the final HPWL we can ignore only the nets marked as global!

@haydar-c Let's not gate your change on this! This is something that I can add after your PR is merged.
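One possible shape for the separate-flag idea discussed above is sketched below. This is only an illustration under stated assumptions: `FlaggedNet`, `HpwlMode`, and `compute_hpwl` are hypothetical names and do not reflect how VPR's `APNetlist` or `PartialPlacement` are structured. The sketch keeps one flag for nets assumed to be routed on a global network and another for nets dropped from the solver, and only the solver-side HPWL skips the latter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical net record with two independent flags (not VPR's real type).
struct FlaggedNet {
    std::vector<std::size_t> blocks; // indices of the blocks this net connects
    bool is_global;                  // assumed routed on a global network
    bool is_ignored_by_solver;       // e.g. dropped by the fanout threshold
};

enum class HpwlMode { Solver, Reported };

// Global nets are always skipped; solver-ignored nets are skipped only
// when computing the HPWL that the solver itself uses.
double compute_hpwl(const std::vector<FlaggedNet>& nets,
                    const std::vector<double>& block_x,
                    const std::vector<double>& block_y,
                    HpwlMode mode) {
    assert(block_x.size() == block_y.size());
    double hpwl = 0.0;
    for (const FlaggedNet& net : nets) {
        if (net.is_global || net.blocks.empty())
            continue;
        if (mode == HpwlMode::Solver && net.is_ignored_by_solver)
            continue;
        double min_x = std::numeric_limits<double>::max();
        double max_x = std::numeric_limits<double>::lowest();
        double min_y = std::numeric_limits<double>::max();
        double max_y = std::numeric_limits<double>::lowest();
        for (std::size_t blk : net.blocks) {
            assert(blk < block_x.size());
            min_x = std::min(min_x, block_x[blk]);
            max_x = std::max(max_x, block_x[blk]);
            min_y = std::min(min_y, block_y[blk]);
            max_y = std::max(max_y, block_y[blk]);
        }
        hpwl += (max_x - min_x) + (max_y - min_y);
    }
    return hpwl;
}
```

Calling `compute_hpwl` with `HpwlMode::Solver` gives the value the solver optimizes, while `HpwlMode::Reported` matches the rest of the flow by excluding only global nets.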

Thanks, sounds great!

@AlexandreSinger AlexandreSinger left a comment

Looks fantastic to me! Very well done @haydar-c

One minor comment and I think this is good to go!

  *
  * @return An APNetlist object, generated from the prepacker results.
  */
 APNetlist gen_ap_netlist_from_atoms(const AtomNetlist& atom_netlist,
                                     const Prepacker& prepacker,
-                                    const UserPlaceConstraints& constraints);
+                                    const UserPlaceConstraints& constraints,
+                                    const int& high_fanout_threshold);

Nit: an int does not need to be passed by reference. In fact, it may actually be slower to pass an int by reference. Please change this to pass by value.

For simple types, like integers, you can just pass them directly:

int high_fanout_threshold);

We pass the other arguments to this function by const reference to prevent deep copies, which are very expensive for these types.

@vaughnb-cerebras vaughnb-cerebras Jun 12, 2025

@haydar-c General rule: if it is smaller than or equal to the size of a pointer (64 bits) and you don't need to update it in the callee, pass by value. This avoids a pointer access to get it and hence will be faster (unless the compiler is clever enough to optimize out the pointer access, but I wouldn't count on that).

If it is bigger than 64 bits, then pass by const ref if you don't need to modify it. That is faster than a large copy.

If you need to modify it, no option but to pass by (non-const) reference.
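The three cases in the rule above can be sketched as follows. The function names are invented for illustration and do not exist in VPR; the sketch only shows the three parameter-passing styles side by side.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The rule of thumb assumes small scalars fit in a pointer-sized slot.
static_assert(sizeof(int) <= sizeof(void*), "int is expected to be pointer-sized or smaller");

// Small and read-only: pass by value.
bool exceeds_threshold(std::size_t fanout, int high_fanout_threshold) {
    assert(high_fanout_threshold >= 0);
    return fanout > static_cast<std::size_t>(high_fanout_threshold);
}

// Large and read-only: pass by const reference to avoid a deep copy.
std::size_t count_high_fanout(const std::vector<std::size_t>& fanouts,
                              int high_fanout_threshold) {
    std::size_t count = 0;
    for (std::size_t fanout : fanouts) {
        if (exceeds_threshold(fanout, high_fanout_threshold))
            ++count;
    }
    return count;
}

// Must be modified by the callee: pass by non-const reference.
void clamp_fanouts(std::vector<std::size_t>& fanouts, int high_fanout_threshold) {
    for (std::size_t& fanout : fanouts) {
        if (exceeds_threshold(fanout, high_fanout_threshold))
            fanout = static_cast<std::size_t>(high_fanout_threshold);
    }
}
```

Here the threshold is always passed by value, the vector is passed by const reference when it is only read, and by non-const reference only in the function that changes it.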

Thanks Vaughn!

@haydar-c

Thank you for the detailed explanation, @AlexandreSinger! I've updated the PR.

@AlexandreSinger AlexandreSinger left a comment

LGTM

@AlexandreSinger AlexandreSinger merged commit e425616 into master Jun 12, 2025
33 checks passed
@AlexandreSinger AlexandreSinger deleted the ap_high_fanout_net_thresholding branch June 12, 2025 20:06