-
Notifications
You must be signed in to change notification settings - Fork 415
RR graph fix up #1448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RR graph fix up #1448
Conversation
…ave unique rr_nodes even when they have multiple locations on different sides
…de allocation for each pin
Hi @kmurray @vaughnbetz |
Hi all,
neuron_stratixiv_arch_timing.blif.tar.gz openCV_stratixiv_arch_timing.blif.tar.gz I also attached the vpr log files of the other 3 failures benchmarks here. |
Thanks a lot Xifan. I took a look through the logs, and it is interesting and encouraging. circuit overused_rr re-routed nets re-routed conn routing iteration The 3 new failures are clearly highly localized congestion. I imagine input or output pins (maybe OPINs, given the number of nets getting ripped up and rerouted).
|
There is more discussion of the rr-graph changes and their impact in #1355 |
Ideas: The DSP and RAM block model is believed to be overly restrictive right now; just was easier and never caused problems before. |
Hi @vaughnbetz
Here is the XML codes of the titan arch. related to the LAB output connection in the current master. It is a complete crossbar.
Adding the input crossbar will improve the routability but should we recalculate the DSP block area? These crossbars are very area consuming. Similar question for the RAM blocks. Regarding to the methodology, do you expect to apply all the modifications on my branch (in this pull request) or in separated branches and then merged together? |
@Bill-hbrhbr Adding Bill. Xifan, lets get Bill to make the overused rr-node print out feature. I don't want to overload you too much!
Based on the results of those, we could decide which architecture changes to commit (or to combine some and do an experiment 4). Methodology: Xifan, if you can make the architectures for exp 1 - 3 and run them on your branch, and also pass them to Helen to run on the master (or if it is easier you can also run them on the master) that would be great. I'd like to get QoR data for all those items and then we can decide what to change in the architecture. Helen has some new circuits that she's getting into the Titan flow, and at least one of those showed some strange routing to RAM blocks on the master so I think the lack of input pin equivalence on RAM blocks may already be an issue (maybe the current Titan designs don't have that many RAM blocks?). |
Also, experiment 0.5 (before 1 above) is to play with various options for the cluster input and output pin utilization limits.
|
Filed the reporting overused rr-node issue to Bill: #1453 |
Hi @vaughnbetz
|
Hi @vaughnbetz A spreadsheet that compares current QoR to the golden on master |
Hi @vaughnbetz
|
This looks great. full instance equivalence definitely looks like a good thing to turn on. |
Just looked through the 0.8 input and 0.8 output pin utilization target experiment. It didn't change much: +2% cluster count, +2% placer estimated critical path, +5% routed wirelength, +9% routed critical path delay, +446% route time (probably really really +300% since machine load looks higher due to place time). All comparisons to the master. |
I've created a spreadsheet that summarizes the 4 runs so far. LAB output equivalence with the fixed rr-graph is very close to quality parity with the master (with no rr-graph fix). Some metrics are slightly worse, some slightly better. I'll keep this spreadsheet up to date with future results so we can track combinations might be promising, but output equivalence definitely looks good. |
Hi @vaughnbetz I am trying to re-run the titan tests using the latest master with the rr_graph fix-up (no arch. modification, only rr_graph modification), so that we can report overused nodes. But I detect a bug in the flow-run script, where the max_route_iterations is reduced from 400 to 150. An issued has been created at #1458
|
…ue to rr_graph fix-up
…e Travis/Kokoro shows different QoR than the local machine. Should discuss if the patch is needed or not
@litghost I have fine-tuned the architecture
FYI, I attached the grid layout (from VPR GUI) after architecture modification here. |
Hi Xifan, Mixing clbs and RAMs in a column like this is hard to lay out, so it's not normally done in commercial FPGAs. I don't object to testing it in this architecture, but I think you should put a comment at the top of the file explaining what this architecture does that is different and what it is intended to test. (Nobody did that for the dedicated input/output pads that it appears to have, so that's good to list in that comment too). I don't think this architecture is used much in regtests and not in any automated QoR tests (which is good in my opinion, as with these changes it will be a bit more of an unusual architecture). Is that right? |
…shift from fix-up
….xml to clarify its unusual architecture choices.
@vaughnbetz I agree on this. It will challenge the physical design a lot. This is indeed rare in commercial FPGAs as well as any tape-out we have done. Usually, we properly size the grids so that there are no such EMPTY holes in the DSP/RAM columns. These holes are seen only during the architecture exploration. Therefore, I have added comments to the head of the architecture XML in order to clarify these points. If I miss anything, feel free to let me know. |
Thanks Xifan; the new comment is very clear. |
Hi all, I have address all the comments so far. The regression tests now passed. Please let me know if it is good to merge. |
Arch change looks fine to me |
Looks good. We can merge this now, but
I suspect we can merge in advance of 1 & 2 though but should do them ASAP after the merge. @tangxifan does that make sense? |
It will take ~1 day to finish and I think we can do it before merging. |
…e choice of pin equivalence
Thanks Xifan. I think it would be good to upload a spreadsheet of Titan QoR first, and then if all looks good, update the golden and merge. |
Hi @vaughnbetz |
Thanks Xifan! This looks great, and I don't think we need to do an update golden as everything was close enough anyway. I think this is ready to merge. Let me know if you disagree, otherwise I'll merge it. |
@vaughnbetz Thanks for the advice. I am nothing against it. |
Excellent -- glad to land this one! |
Fix-up the bug in the Routing Resource Graph which occurs when an input/output pin locates on multiple sides.
Description
Modify the rr_graph builder to
Related Issue
This pull request is a splitted pull request from #1355 to address the bug found in rr_graph
Motivation and Context
Current rr_graph builder allocates individual rr_node for the input/output pins locating on multiple sides.
This causes that router can map different nets to these rr_nodes which corresponds to the same pin.
In such case, the routing results are illegal.
This bug is observed mainly in the Stratrix4 architecture but not limited to it.
How Has This Been Tested?
Passed regression test with QoR failure. Expect to fix the architecture XML to meet QoR
Types of changes
Checklist: