Clock Modeling: Added two Stage router #928
Conversation
- Stage 1: route each clock net to a virtual sink. The virtual sink is connected to all clock network sources.
- Stage 2: route all physical sinks of the clock net.

Note: after stage 1 we remove the virtual sink from the route tree and traceback.
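As a rough illustration of this flow (a sketch only: `RouteTree` and the helper names below are hypothetical stand-ins for VPR's route tree and traceback, not the PR's actual code):

```cpp
#include <vector>

// Stand-ins for VPR internals: the real router operates on t_rt_node
// trees and a traceback; these placeholders only model the flow.
struct RouteTree { std::vector<int> nodes; };

bool route_to_node(RouteTree& tree, int target_rr) {  // placeholder router call
    tree.nodes.push_back(target_rr);
    return true;
}

void remove_node(RouteTree& tree, int rr) {           // placeholder prune
    for (auto it = tree.nodes.begin(); it != tree.nodes.end(); ++it) {
        if (*it == rr) { tree.nodes.erase(it); break; }
    }
}

bool two_stage_route_clock_net(RouteTree& tree,
                               int virtual_sink_rr,
                               const std::vector<int>& physical_sink_rrs) {
    // Stage 1: route to the virtual sink. Every clock network source feeds
    // it, so ordinary wave expansion effectively picks a clock root.
    if (!route_to_node(tree, virtual_sink_rr)) return false;

    // The virtual sink is bookkeeping only: remove it from the route tree
    // (and, in VPR, the traceback) so the tree ends at the chosen root.
    remove_node(tree, virtual_sink_rr);

    // Stage 2: route every physical sink from the partially built tree.
    for (int sink_rr : physical_sink_rrs) {
        if (!route_to_node(tree, sink_rr)) return false;
    }
    return true;
}
```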
Thanks @mustafabbas, it's great to see this code almost ready to land!
I've provided detailed comments below but I've summarized the high level comments here:
- Naming & Comments:
  There are a variety of places where I think the names of functions/variables could be updated to clarify their purpose/meaning, which I've included in the detailed comments below. Unless something is for a very specific purpose it should have a correspondingly precise name, so that no one thinks it can be used in general.
- Router Refactoring:
  It looks like you've refactored the router to extract a new function `timing_driven_route_sink_common()`, so it can be used in both the new `timing_driven_pre_route_sink()` and the existing `timing_driven_route_sink()`.
  Since pre-routing is not routing an actual logical sink of the net, but just initializing the route tree to include the global network root, I think it would be better to have `timing_driven_pre_route()` not be 'net' aware, but just act to appropriately initialize the route tree. That is, `timing_driven_pre_route()` should call one of the lower-level `timing_driven_route_connection_from*()` functions, which are 'net un-aware' (they only consider route trees and target RR nodes). After `timing_driven_pre_route()` is called, the resulting route tree would then be all set up to perform the actual sink routing for the net. A sketch of this shape follows.
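  A minimal sketch of that shape, assuming a lower-level primitive similar to `timing_driven_route_connection_from_route_tree()` (the simplified signatures and stub bodies here are hypothetical):

  ```cpp
  struct RouteTree { /* partial routing of one net */ };

  // Net-unaware primitive: expand from the current tree to one target RR
  // node (stub standing in for the actual wave expansion).
  bool route_connection_from_tree(RouteTree& /*tree*/, int /*target_rr_node*/) {
      return true;
  }

  // Pre-route knows nothing about the net's logical sinks: it only grows
  // the route tree to the clock network root, leaving the tree initialized
  // for the real per-sink routing that follows.
  bool timing_driven_pre_route(RouteTree& tree, int clock_root_rr_node) {
      return route_connection_from_tree(tree, clock_root_rr_node);
  }
  ```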
- Single Global Network / Global Network Allocation:
  The code seems to support only a single global network (despite VPR supporting the construction of multiple global networks), with only the root location of the last global network being remembered.
  It also seems that if there is more than one global net in the netlist, the code will cause unresolvable congestion, since all global nets will try to use the single recorded global network root.
  I think we need to (see the sketch after this list):
  a. Have a data structure which stores information about the set of global/clock network root nodes (so we can track the multiple root locations).
  b. Have some kind of allocation stage in the router which tracks which networks are used by which global nets in the netlist (so multiple nets don't try to use the same root).
  c. Gracefully degrade in the case where we have more global nets in the netlist than global networks (the nets in excess should just be routed over the normal routing network).
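  A sketch of (a)-(c) under assumptions; the type and function names here are illustrative, not VPR's:

  ```cpp
  #include <cstddef>
  #include <optional>
  #include <unordered_map>
  #include <utility>
  #include <vector>

  struct ClockNetworkRoot { int rr_node; };  // (a) one entry per global/clock network

  class ClockRootAllocator {
    public:
      explicit ClockRootAllocator(std::vector<ClockNetworkRoot> roots)
          : roots_(std::move(roots)) {}

      // (b) give each global net its own root; (c) once the roots run out,
      // return nullopt so the caller falls back to the normal routing network.
      std::optional<int> allocate(int net_id) {
          if (next_ >= roots_.size()) return std::nullopt;  // degrade gracefully
          int rr = roots_[next_++].rr_node;
          assignment_[net_id] = rr;
          return rr;
      }

    private:
      std::vector<ClockNetworkRoot> roots_;
      std::size_t next_ = 0;
      std::unordered_map<int, int> assignment_;  // net -> assigned root RR node
  };
  ```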
- Regression Tests:
  We should have regression tests which cover, at a minimum, the new options introduced.
  Since there are also complex interactions between some of these options, we should also have regression tests covering these; for instance, cases like having more global nets than dedicated global networks. A sketch of such a task config follows.
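  As a rough, untested illustration, such a regression task config might look like the sketch below; the paths, benchmark/architecture names, and especially the option spellings are assumptions and may not match this PR:

  ```
  # Hypothetical vtr_flow task config (config/config.txt); names and option
  # spellings are assumptions.
  circuits_dir=benchmarks/verilog
  archs_dir=arch/timing
  circuit_list_add=stereovision3.v
  arch_list_add=k6_frac_N10_frac_chain_mem32K_40nm.xml
  # Route clocks on the dedicated network with two-stage clock routing on.
  script_params=--clock_modeling dedicated_network --two_stage_clock_routing on
  ```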
- Documentation:
  The VPR documentation should be updated to reflect the new command options; this also provides a good place to give a higher-level overview of what these do.
vpr/src/route/route_timing.cpp (outdated diff):

```cpp
    return true;
}

static t_heap* timing_driven_route_sink_common(
```
This looks like it serves the same purpose as the existing `timing_driven_route_connection_from_route_tree()`. Unless there is a very good reason to duplicate the code, it seems like the existing routine should be used.
Done, but I do think I now have some repetition in my code. I haven't been following the routing code cleanup closely, though, so for now I am leaving the cleanup in `timing_driven_pre_route_clock_root` with a "TODO" in the code, until I learn more about whether I should spend time there.
Thanks for the detailed review, Kevin. Mustafa, I think Kevin has identified the most likely culprit for the unresolvable congestion with more than one clock network above (a single global clock root being remembered). Kevin, Mustafa's results do indeed show unresolvable congestion with 2 clock nets right now.
Hi Kevin, thank you for the thorough review!
The idea for the "virtual clock sink" was that one sink node connects to the roots of all clock networks in the graph. This would basically emulate what is done for logical pin equivalence in logic blocks. The congestion I am seeing basically shows that, for every clock net routed to the virtual clock network sink, the router tries to go through the same clock root and leaves the other one free (I have two clock network instances, i.e. two roots). I can tell that the other root node goes on the heap, but it is never chosen. In some cases, when I increase the number of clock network instances (in this case to 5 instances for the stereovision3 benchmark), it does route more than one clock net using different roots; however, the channel width grows considerably due to congestion at one of the root nodes.
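For context, a sketch of the virtual-sink construction being described, using a hypothetical RR-graph builder (VPR's actual RR-graph construction API differs):

```cpp
#include <utility>
#include <vector>

// Hypothetical RR-graph builder; a stand-in for VPR's real construction code.
struct RRGraphBuilder {
    int create_sink_node() { return next_id_++; }
    void add_edge(int from, int to) { edges_.emplace_back(from, to); }
  private:
    int next_id_ = 0;
    std::vector<std::pair<int, int>> edges_;
};

// One SINK node with an incoming edge from every clock network root,
// mirroring logical pin equivalence in logic blocks: the router may
// legally finish stage 1 at whichever root is cheapest.
int build_virtual_clock_sink(RRGraphBuilder& rr, const std::vector<int>& clock_roots) {
    int vsink = rr.create_sink_node();
    for (int root : clock_roots) {
        rr.add_edge(root, vsink);
    }
    return vsink;
}
```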
I am okay with going this route, especially if I cannot find a solution for the congestion otherwise. If I understand correctly, this would basically mean adding a sink node for every clock root instead of one sink that connects to all of them.
What you describe should work, Mustafa, and is a good algorithm: let negotiated congestion choose the clock spine. Another possible cause of the congestion is that we only rip up illegal or timing-degraded connections. This code may not flag the congestion on this first, special connection to the clock sink and rip it up; instead, perhaps only downstream routing is (uselessly?) ripped up. In that case, the fix would be to make sure we rip up and reroute the special connection when necessary.
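A sketch, with invented names, of where that fix would sit; the stubs below stand in for the router's real legality and timing checks:

```cpp
struct Connection {
    int sink_rr;
    bool is_clock_root_connection;  // the special pre-routed stage-1 connection
};

// Stubs standing in for the real legality/timing checks.
bool is_congested(int /*rr_node*/) { return false; }               // occupancy > capacity
bool is_timing_degraded(const Connection& /*c*/) { return false; }

bool connection_needs_ripup(const Connection& c) {
    // Check the stage-1 clock-root connection with the same rules as any
    // ordinary sink connection, so congestion on it triggers a rip-up and
    // reroute rather than only downstream routing being ripped up.
    return is_congested(c.sink_rr) || is_timing_degraded(c);
}
```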
(Force-pushed from 5cd8d5a to 6aded0d: "…oring for two stage routing".)
All the mini comments have been addressed with the recent commit. I still need to add:
Thanks for the continuing work on this, Mustafa. I've seen a bunch of commits; what's left to do?
It looks like the Travis failures are related to code formatting. See here for details on how to fix that.
The VTR nightly tests (presubmit) etc. are a strong CI feature that we don't have working yet, so those failures are expected (they didn't run). This looks good to merge once you fix the conflicts, Mustafa; I suggest you do that and merge now, and then fix the remaining documentation, test, and rr_graph read-in features in a separate pull request.
Merging now. Still missing what is noted in #928 (comment).
@mustafabbas How can I try the two stage router? This fails (I chose the smallest benchmark from
with
Nevermind. This works: |
That's great! I'll update the documentation soon.
Dusty is interested in taking over the final work on the clock routing. Dusty, please see Chapter 4 of Mustafa's MASc thesis (https://tspace.library.utoronto.ca/bitstream/1807/97807/3/Abbas_Mustafa_S_201911_MSc_thesis.pdf) for information on how the two-stage clock router works. The background in Section 2.1.3 may also be helpful in understanding the context.
Added two stage Router

Description
- Virtual sink is connected to all clock network sources.
- Note: after stage 1 we remove the virtual sink from the route-tree and traceback.

Related Issue
#521, #520

Motivation and Context
Ensures that clock nets use the clock network instead of local routing.

How Has This Been Tested?
vpr_qor test run successfully with the option turned on.

Checklist: