Skip to content

Speed up Router's calculation of OveruseInfo #1376

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 30, 2020

Conversation

Bill-hbrhbr
Copy link
Contributor

@Bill-hbrhbr Bill-hbrhbr commented Jun 23, 2020

Motivation and Context

Trying to speed up the VPR router by detecting its various performance bottlenecks and remove them by optimizations. This PR should not change any of the VPR router's functionalities.

This PR is still undergoing changes. I will update more commits if I manage to optimize other areas of the code.

What Has Already Been Optimized?

  1. Optimized overused info calculation. Noticeable speed up when running very large titan benchmarks, where the number of nets is large and many router iterations are performed.
  • Tested on the LU8+Flagship and Stereovision+Tian benchmarks. No significant speedup.
  • Tested on the Bitcoin+Titan benchmark. Appreciable speedup.

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Optimization (change which speeds up code)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

… been rerouted, since they should not contribute to the total number of overused nodes
@Bill-hbrhbr Bill-hbrhbr self-assigned this Jun 23, 2020
@probot-autolabeler probot-autolabeler bot added lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool labels Jun 23, 2020
@vaughnbetz
Copy link
Contributor

vaughnbetz commented Jun 26, 2020

Thanks Bill. The code looks good.

  1. Please attach QoR data on the vtr and Titan benchmarks (see the developer guide in the documentation for how to get that, and ask Mohamed for anything that's unclear.)

(Idea that won't work below; looked closer and pres_cost is also updated, which will need to be done for all nodes. To avoid that we'd have to compute pres_cost on the fly (from occupancy and pres_fac, so probably not worthwhile as the possible gains are reduced and the code churn is increased).
2. The same technique might be possible (with extensions) in

void pathfinder_update_cost(float pres_fac, float acc_fac) {
(pathfinder_update_cost)
Right now that routine goes through all routing resource nodes. If only a few nets (or even better, connections) are rerouted, you could instead go through only the nets that were rerouted, but you'd need to update the costs for all nodes in either the old routing of those nets or the new routings. If more than a few nets are rerouted then just going through all rr_nodes is probably faster though.

@vaughnbetz
Copy link
Contributor

I was thinking more about #2 above and I think it is worth investigating. It would require:

  • remove pres_cost from rr_node_route_inf. Instead compute it on the fly using the same code as in pathfinder_update_cost
  • Now pres_cost will always be up to date based on occupancy.
  • Change pathfinder_update_cost to only go through the congested nets and update acc_cost

Nowadays bringing more data in from storage is usually more expensive than doing a small amount of computation, so shrinking rr_node_route_inf with this on-the-fly computation may itself be a good idea (or at least not a very bad idea). And that refactoring allows pathfinder_update_cost to become incremental in the same way as you made the overused_info computation.

This should all be done in any issue / PR though -- once you have the QoR data for this PR you can merge and close it.

@kmurray
Copy link
Contributor

kmurray commented Jun 29, 2020

This should all be done in any issue / PR though -- once you have the QoR data for this PR you can merge and close it.

I agree, getting QoR data for each change individually, and merging them as separate PRs seems like the right approach.

@Bill-hbrhbr
Copy link
Contributor Author

Bill-hbrhbr commented Jun 29, 2020

shrinking rr_node_route_inf with this on-the-fly computation may itself be a good idea

@vaughnbetz After some investigations, I found out that pres_cost is referenced in many other places, so it might not be possible to eliminate this field entirely (probably requires a lot of refactorizations). But it does seem to me that there is an abundance of updates to this field, and some of them are probably unnecessary. What I think I'll try to do is to pick out all the seemingly unnecessary ones and comment them out to see how that will affect the results.

@Bill-hbrhbr
Copy link
Contributor Author

@Bill-hbrhbr Bill-hbrhbr changed the title Optimize VPR Router Performance Speed up Router's calculation of OveruseInfo Jun 30, 2020
@Bill-hbrhbr
Copy link
Contributor Author

Bill-hbrhbr commented Jun 30, 2020

@vaughnbetz
Copy link
Contributor

QoR data is looking good so far; looks like a 3% speedup or so.
For pres_cost: yes, you'd need to refactor. I'd write an inline function (well inline is a hint, but may help) in a header file to create pres_cost on the fly. Then every use of pres_cost would become a call to it, and every update of pres_cost should be deleted.

@Bill-hbrhbr Bill-hbrhbr merged commit f537df2 into verilog-to-routing:master Jun 30, 2020
@Bill-hbrhbr Bill-hbrhbr deleted the router_optimize branch June 30, 2020 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants