Noc cost normalization #2485
Initial NoC placement PR, which is still open, has changed NoC code. To avoid conflicts in the future, this merge was required.
I moved NocDeltaCost declaration from noc_place_utils.h to place_util.h to resolve a cyclic dependency. Forward declaration of NocDeltaCost and t_placer_costs did not solve the problem as the compiler complained about GridTileLookup.
…, and free_noc_placement_structs() for NoC congestion costs
Added some comments to noc_link.h to explain what each method does.
Some NoC tests were failing due to newly added code for congestion modeling. This commit hopefully fixes them.
…it_noc_costs, test_recompute_noc_costs to check congestion
…routes after revert
When I pass rr graph and router lookahead files to VPR, it throws an error. capnproto uses mmap to open these files. In principle, multiple processes can access a single file using mmap, but I cannot rely on capnproto handling this correctly. The changes in this commit enhance the vtr task syntax by allowing arbitrary files to be copied to the temporary directory. This way, I can copy the rr graph file and prevent multiple processes from accessing the same file.
The previous commits did not work. It seems that capnproto uses the PWD environment variable instead of calling getcwd(). The popen method changes the working directory but does not update PWD, so I update it manually.
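The PWD issue described above can be reproduced and worked around with a few lines of Python. This is a minimal sketch, not the code from the commit: `run_in_dir` is a hypothetical helper name, and it assumes the child process is launched through Python's `subprocess` module, whose `cwd=` argument changes the child's working directory but leaves the inherited `PWD` variable untouched.

```python
import os
import subprocess
import sys
import tempfile


def run_in_dir(cmd, run_dir):
    """Run cmd in run_dir and make the PWD environment variable agree with it.

    Hypothetical helper for illustration. Passing cwd= alone changes the
    working directory of the child, but the child still inherits the
    parent's PWD value unless it is overridden explicitly.
    """
    env = os.environ.copy()
    env["PWD"] = os.path.abspath(run_dir)
    return subprocess.run(
        cmd, cwd=run_dir, env=env, capture_output=True, text=True, check=True
    )


if __name__ == "__main__":
    # The child prints the PWD it sees and the directory it is actually in.
    child = "import os; print(os.environ['PWD']); print(os.getcwd())"
    with tempfile.TemporaryDirectory() as tmp:
        print(run_in_dir([sys.executable, "-c", child], tmp).stdout)
```

Any tool that reads `PWD` rather than calling `getcwd()` would then resolve relative paths against the run directory.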
This reverts commit 2d3e642.
I ran a parameter sweep over NoC weighting factors for aggregate latency, latency overrun, and aggregate bandwidth. The sum of the weighting factors is always equal to 1. In the following table, the aggregate bandwidth weighting factor can be determined by subtracting the latency and latency overrun factors from 1. The parameter sweep tables show the results with the NoC placement weighting factor set to 5. I also ran experiments with this factor set to 10 and 1, but did not include them here because the trend was similar across different NoC placement weighting factors.
Number of met latency constraints
The table above shows that when the latency constraint weighting factor is zero, the placer does not care about latency constraints. However, setting this factor to a very large number does not increase the number of met latency constraints. The latency constraint weighting factor should be greater than the aggregate latency factor so that the placer prioritizes meeting latency constraints over minimizing the aggregate latency. Another interesting observation is that the latency constraint factor does not need to be significantly larger than the aggregate latency factor. In the master branch, the latency constraint factor is 20x greater than the aggregate latency factor. This large difference is needed because non-normalized latency cost terms were added together, and latency overrun was sometimes much smaller than the aggregate latency. Therefore, to prioritize meeting latency constraints over reducing aggregate latency, the latency constraint factor had to be set to a much larger value. In this PR, aggregate latency and latency overrun are normalized separately. As a result, the latency constraint factor does not need to be considerably larger than the aggregate latency factor to prioritize meeting constraints over minimizing the aggregate latency.
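The effect of separate normalization can be illustrated with a small numeric sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the normalization factor is taken to be the reciprocal of the current cost value, and the numbers are invented rather than measured. It is not the VPR implementation.

```python
EPS = 1e-12  # guards against division by zero when a cost term is 0


def combined_delta(d_lat, d_overrun, w_lat, w_overrun, lat, overrun):
    """Sketch of the old scheme: sum the weighted raw terms, normalize once."""
    norm = 1.0 / max(w_lat * lat + w_overrun * overrun, EPS)
    return norm * (w_lat * d_lat + w_overrun * d_overrun)


def separate_delta(d_lat, d_overrun, w_lat, w_overrun, lat, overrun):
    """Sketch of the new scheme: normalize each term by its own magnitude."""
    return (w_lat * d_lat / max(lat, EPS)
            + w_overrun * d_overrun / max(overrun, EPS))


if __name__ == "__main__":
    # Invented numbers: aggregate latency 1000, latency overrun 10.
    # A swap halves the overrun (-5) but adds 2% to the latency (+20).
    args = dict(d_lat=20.0, d_overrun=-5.0, lat=1000.0, overrun=10.0)
    print(combined_delta(w_lat=0.5, w_overrun=0.5, **args))   # positive
    print(separate_delta(w_lat=0.5, w_overrun=0.5, **args))   # negative
    print(combined_delta(w_lat=1.0, w_overrun=20.0, **args))  # negative
```

With equal weights, the combined scheme scores the swap as a cost increase because the raw latency term dominates, while the separately normalized scheme scores it as an improvement. The combined scheme only agrees once the overrun weight is made much larger than the latency weight.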
Aggregate Latency
As the table above shows, aggregate latency is not sensitive to the aggregate latency weighting factor as long as the aggregate bandwidth factor is non-zero. This is because minimizing the aggregate bandwidth requires placing traffic flow endpoints close to each other, which indirectly optimizes the aggregate latency at the same time.
Aggregate Bandwidth
Although optimizing the aggregate bandwidth minimizes aggregate latency, reducing latency does not minimize the aggregate bandwidth. As can be seen in the table above, the aggregate bandwidth grows large when the aggregate bandwidth weighting factor is zero. Increasing the aggregate latency weighting factor cannot improve the aggregate bandwidth when its corresponding weighting factor is set to zero. This is because traffic flows may have different bandwidths. For example, assume there are 9 logical routers in a netlist where a central router sends data to the other 8 routers, and 4 of the traffic flows have higher bandwidth than the others. The aggregate latency can be minimized by placing the central router in the middle and surrounding it with the other routers. When the aggregate bandwidth weighting factor is zero, the placer neglects traffic flow bandwidths, and traffic flows with higher bandwidths may travel multiple hops. When the aggregate bandwidth weighting factor is non-zero, routers that are connected through high-bandwidth traffic flows are placed closer together.
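The 9-router example can be worked through numerically. The sketch below assumes a 3x3 mesh, uses hop count as a stand-in for latency, and models aggregate bandwidth as the sum of flow bandwidth times hops travelled. The router names and bandwidth values are invented for illustration.

```python
CENTER = (1, 1)
NEIGHBORS = [(0, 1), (1, 0), (1, 2), (2, 1)]  # one hop from the center
CORNERS = [(0, 0), (0, 2), (2, 0), (2, 2)]    # two hops from the center


def hops(a, b):
    """Manhattan distance between two mesh locations."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def aggregate_costs(placement, bandwidths):
    """Return (total hops, total bandwidth * hops) for flows from CENTER."""
    total_hops = sum(hops(CENTER, loc) for loc in placement.values())
    total_bw = sum(bandwidths[r] * hops(CENTER, loc)
                   for r, loc in placement.items())
    return total_hops, total_bw


# Invented bandwidths: r0..r3 receive high-bandwidth flows, r4..r7 low.
bandwidths = {f"r{i}": (10 if i < 4 else 1) for i in range(8)}
high = [f"r{i}" for i in range(4)]
low = [f"r{i}" for i in range(4, 8)]

# High-bandwidth destinations adjacent to the center.
bandwidth_aware = dict(zip(high + low, NEIGHBORS + CORNERS))
# High-bandwidth destinations pushed to the corners.
bandwidth_blind = dict(zip(high + low, CORNERS + NEIGHBORS))

if __name__ == "__main__":
    print(aggregate_costs(bandwidth_aware, bandwidths))  # (12, 48)
    print(aggregate_costs(bandwidth_blind, bandwidths))  # (12, 84)
```

Both placements have the same total hop count, so a latency-only cost cannot tell them apart, but the bandwidth-weighted cost differs by almost a factor of two.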
Comparison with master
This PR computes and keeps track of NoC congestion cost. The increase in runtime can be partly attributed to more complex cost computation for NoC swaps.
QoR looks good.
Looks good, a few commenting changes and examination of one O(Nlinks) loop requested.
@vaughnbetz
Thanks. It looks good and I'm merging it. @soheilshahrouz: I think the documentation (.rst) file for the command line options also needs an update to document the new / changed command line options. The description is pretty much what is in the help for the arg parser. Can you make that update in another PR?
Description
This PR separates NoC cost term renormalization from computation. Before this PR, a weighted average of aggregate latency and latency overrun was normalized. This PR separates all cost terms and computes one normalization factor for each one.
This PR also includes code to compute NoC congestion, but currently sets its cost to 0 (don't optimize yet).
Related Issue
Motivation and Context
To keep the way NoC normalization factors are computed consistent with bb and timing cost normalization factors.
How Has This Been Tested?
A parameter sweep was run to find the best combination of weighting factors. The results are compared with the master branch.
Types of changes
Checklist: