Clock Modeling: Added two Stage router #928

mustafabbas · 2019-08-16T14:43:49Z

Added two stage Router

Description

Stage 1: Route clock nets to a virtual sink. The virtual sink is connected to all clock network sources.
Stage 2: Route all physical sinks of the clock net.

Note: after stage 1 we remove the virtual sink from the route-tree and traceback.
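
For readers following along, here is a minimal sketch of the two stages on a toy graph. Everything in it (`Graph`, `route_from_tree`, `two_stage_route`, the node numbering) is hypothetical and is not the code in this PR; in particular it uses a plain breadth-first search, whereas VPR's router is a cost-based, timing-driven search.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <set>
#include <vector>

// Toy routing-resource graph: g[u] lists the nodes reachable from u.
using Graph = std::vector<std::vector<int>>;

// Breadth-first search that starts from every node already in the route tree.
// Returns the nodes to add to the tree (ending at target); the result is empty
// if target is unreachable or already part of the tree.
std::vector<int> route_from_tree(const Graph& g, const std::set<int>& tree, int target) {
    std::vector<int> prev(g.size(), -1);
    std::vector<bool> seen(g.size(), false);
    std::queue<int> frontier;
    for (int n : tree) {
        seen[n] = true;
        frontier.push(n);
    }
    while (!frontier.empty() && !seen[target]) {
        int u = frontier.front();
        frontier.pop();
        for (int v : g[u]) {
            if (!seen[v]) {
                seen[v] = true;
                prev[v] = u;
                frontier.push(v);
            }
        }
    }
    std::vector<int> path;
    if (!seen[target]) return path;
    // Walk back from the target until we hit a node that is already in the tree.
    for (int n = target; tree.count(n) == 0; n = prev[n]) path.push_back(n);
    std::reverse(path.begin(), path.end());
    return path;
}

// Stage 1 seeds the route tree with a path to the virtual sink and then drops
// the virtual sink itself; stage 2 routes each physical sink from that tree.
std::set<int> two_stage_route(const Graph& g, int source, int virtual_sink,
                              const std::vector<int>& sinks) {
    std::set<int> tree{source};
    for (int n : route_from_tree(g, tree, virtual_sink)) tree.insert(n);
    tree.erase(virtual_sink);  // the virtual sink is not a real resource
    for (int sink : sinks) {
        for (int n : route_from_tree(g, tree, sink)) tree.insert(n);
    }
    return tree;
}
```

In a graph where local wires reach a sink in fewer hops than the clock network, routing the sink directly from the source takes the local wires, while `two_stage_route` leaves the clock root in the tree so that the sinks are reached through the clock spine.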

Related Issue

#521
#520

Motivation and Context

Ensures that clock nets use the clock network instead of local routing

How Has This Been Tested?

The vpr_qor test ran successfully with the option turned on.

Checklist:

My change requires a change to the documentation
I have updated the documentation accordingly
I have added tests to cover my changes
All existing tests passed

kmurray

Thanks @mustafabbas, it's great to see this code almost ready to land!

I've provided detailed comments below but I've summarized the high level comments here:

Naming & Comments:
There are a variety of places where I think the names of functions/variables could be updated to clarify their purpose/meaning which I've included in the detailed comments below. Unless something is for a very specific purpose it should have a correspondingly precise name, so that no one thinks it can be used in general.
Router Refactoring
It looks like you've refactored the router to extract a new function, timing_driven_route_sink_common(), so that it can be used in both the new timing_driven_pre_route_sink() and the existing timing_driven_route_sink().

Since pre-routing is not routing an actual logical sink of the net, but just initializing the route tree to include the global network root, I think it would be better to have timing_driven_pre_route() not be 'net' aware, but just act to appropriately initialize the route tree. That is, timing_driven_pre_route() should call one of the lower-level timing_driven_route_connection_from*() functions, which are 'net un-aware' (they only consider route trees and target RR nodes). After timing_driven_pre_route() is called, the resulting route tree would then be all set up to perform the actual sink routing for the net.

Single Global Network/Global Network Allocation
The code seems to support only a single global network (despite VPR supporting the construction of multiple global networks), with only the root location of the last global network being remembered.

It also seems that if there is more than one global net in the netlist the code will cause unresolvable congestion since all global nets will try to use the single recorded global network root.

I think we need to:
a. Have a data structure which stores information about the set of global/clock network root nodes (so we can track the multiple root locations).
b. Have some kind of allocation stage in the router which tracks what networks are used by which global nets in the netlist (so we don't try to have multiple nets try to use the same root)
c. Gracefully degrade in the case we have more global nets in the netlist than global networks (the nets in excess should just be routed over the normal routing network).
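
As an illustration of points (a) to (c), a minimal sketch of such an allocation step could look like the following. The names `ClockNetworkRoot`, `allocate_clock_networks`, and `NO_CLOCK_NETWORK` are assumptions made for this example and do not exist in VPR; the first-come assignment policy is also just one possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sentinel meaning "route this net over the normal routing network".
constexpr int NO_CLOCK_NETWORK = -1;

// Hypothetical record of one dedicated clock network and the RR node at its root.
struct ClockNetworkRoot {
    int root_rr_node;
};

// Assigns each global net its own clock network on a first-come basis.
// result[i] is an index into `roots`, or NO_CLOCK_NETWORK for nets in excess
// of the available networks (graceful degradation, point c).
std::vector<int> allocate_clock_networks(std::size_t num_global_nets,
                                         const std::vector<ClockNetworkRoot>& roots) {
    std::vector<int> assignment(num_global_nets, NO_CLOCK_NETWORK);
    for (std::size_t net = 0; net < num_global_nets && net < roots.size(); ++net) {
        assignment[net] = static_cast<int>(net);
    }
    return assignment;
}
```

With three global nets and two clock networks, the first two nets each get a dedicated network and the third falls back to the normal routing network.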

Regression Tests
We should have regression tests which cover at a minimum the new options introduced.

Since there are also complex interactions between some of these options, we should also have regression tests covering these, for instance the case of having more global nets than dedicated global networks.

Documentation:
The VPR documentation should be updated to reflect the new command options; this also provides a good place to give a higher-level overview of what they do.

vpr/src/base/read_options.cpp

vpr/src/base/vpr_context.h

vpr/src/route/clock_connection_builders.cpp

vpr/src/route/route_common.cpp

kmurray · 2019-08-16T15:32:39Z

vpr/src/route/route_timing.cpp

+    return true;
+}
+
+static t_heap* timing_driven_route_sink_common(


This looks like it serves the same purpose as the existing timing_driven_route_connection_from_route_tree(). Unless there is a very good reason to duplicate the code it seems like the existing routine should be used.

mustafabbas · Nov 19, 2019

Done, but I do think that I now have some repetition in my code. I haven't kept up with the routing code cleanup, though, so for now I am leaving the cleanup in timing_driven_pre_route_clock_root as a "TODO" in the code until I learn whether I should spend time there.

vpr/src/route/route_timing.cpp

vaughnbetz · 2019-08-16T16:18:36Z

Thanks for the detailed review Kevin. Mustafa, I think Kevin has identified the most likely culprit for the unresolvable congestion with more than one clock network above (a single global clock root remembered). Kevin, Mustafa's results do indeed show unresolvable congestion with 2 clock nets right now.

mustafabbas · 2019-08-16T20:42:41Z

Hi Kevin,

Thank you for the thorough review!

1. Single Global Network/Global Network Allocation
   The code seems to support only a single global network (despite VPR supporting the construction of multiple global networks), with only the root location of the last global network being remembered.

The idea for the "virtual clock sink" was that one sink node connects to the roots of all clock networks in the graph. This would basically emulate what is being done for logical pin equivalence in logic blocks.
As I loop through all clock network roots I connect the sink node to them here:
https://github.com/verilog-to-routing/vtr-verilog-to-routing/pull/928/files#diff-912073d96371744367fa02731c97df85R80-R83

The congestion I am seeing shows that every clock net routed to the virtual clock network sink tries to go through the same clock root and leaves the other one free (I have two clock network instances, i.e. two roots). I can tell that the other root node goes on the heap, but it's never chosen. In some cases, when I increase the number of clock network instances (in this case to 5 instances for the stereovision3 benchmark), it does route more than one clock net using different roots; however, the channel width grows considerably due to congestion at one of the root nodes.

I think we need to:
a. Have a data structure which stores information about the set of global/clock network root nodes (so we can track the multiple root locations).
b. Have some kind of allocation stage in the router which tracks what networks are used by which global nets in the netlist (so we don't try to have multiple nets try to use the same root)
c. Gracefully degrade in the case we have more global nets in the netlist than global networks (the nets in excess should just be routed over the normal routing network).

I am okay with going this route especially if I cannot find a solution for congestion otherwise. If I understand correctly, this would basically mean adding a sink node for every clock root vs one sink that connects to all of them.

vaughnbetz · 2019-08-16T21:57:06Z

What you describe should work Mustafa and is a good algorithm — let negotiated congestion choose the clock spine.
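
To make "let negotiated congestion choose the clock spine" concrete, here is a minimal sketch in the style of the PathFinder cost, (base + historical) * present. The struct `ClockRoot`, the functions, and the exact cost expression are assumptions for illustration; they are not VPR's actual cost function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-root bookkeeping for the nodes feeding the virtual clock sink.
struct ClockRoot {
    double base_cost;
    double hist_cost;  // accumulated over past iterations in which the node was overused
    int capacity;
    int occupancy;
};

// Cost of adding one more net to this root. The present-congestion term only
// grows when taking the node would push it over capacity.
double congestion_cost(const ClockRoot& n, double pres_fac) {
    int overuse_if_taken = n.occupancy + 1 - n.capacity;
    double present = overuse_if_taken > 0 ? 1.0 + overuse_if_taken * pres_fac : 1.0;
    return (n.base_cost + n.hist_cost) * present;
}

// Returns the index of the cheapest root; the first one wins on ties.
std::size_t pick_cheapest_root(const std::vector<ClockRoot>& roots, double pres_fac) {
    assert(!roots.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < roots.size(); ++i) {
        if (congestion_cost(roots[i], pres_fac) < congestion_cost(roots[best], pres_fac)) {
            best = i;
        }
    }
    return best;
}
```

With two identical roots, the first net picks root 0; once root 0 is occupied and `pres_fac` is positive, the second net sees root 1 as cheaper.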

Another possible cause of the congestion: we only rip up illegal or timing-degraded connections. This code may not flag the congestion on this first, special connection to the clock sink and rip it up. Instead perhaps only downstream routing is (uselessly?) ripped up. In that case the fix would be to make sure we rip up and reroute the special connection when necessary.
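
A minimal sketch of that fix, assuming the stage 1 connection is stored alongside the ordinary sink connections of the net so that it goes through the same legality check. `NodeUsage`, `Connection`, and `connections_to_rip_up` are hypothetical names, not VPR functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct NodeUsage {
    int capacity;
    int occupancy;
};

// One routed connection, listed as indices into the usage table.
struct Connection {
    std::vector<std::size_t> rr_nodes;
};

// True if the connection passes through any node that is over capacity.
bool uses_overused_node(const Connection& c, const std::vector<NodeUsage>& usage) {
    return std::any_of(c.rr_nodes.begin(), c.rr_nodes.end(), [&usage](std::size_t n) {
        return usage[n].occupancy > usage[n].capacity;
    });
}

// Returns the indices of all connections to rip up. The stage 1 connection to
// the clock root is just another entry in `conns`, so it is checked as well.
std::vector<std::size_t> connections_to_rip_up(const std::vector<Connection>& conns,
                                               const std::vector<NodeUsage>& usage) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < conns.size(); ++i) {
        if (uses_overused_node(conns[i], usage)) result.push_back(i);
    }
    return result;
}
```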

mustafabbas · 2019-11-19T10:10:35Z

All the mini comments have been addressed with the recent commit. I still need to add:

Tests: using stereovision3, a two-clock circuit

Add a test where the architecture has only one clock network -> ensure that two stage routing only tries connecting one clock net to the dedicated network. (I need to add code to check whether the occupancy limit has been reached for the nodes connecting to the virtual clock sink; if so, don't try two stage routing for this net.)
Add a test where the architecture has two clock networks -> ensure that both clock nets choose the clock network. (Good to go.)

Documentation on option: --two_stage_clock_routing
Reading a virtual clock sink from the xml rr_graph
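
The occupancy check mentioned in the first test item could be sketched as below. `NodeUsage` and `clock_network_available` are hypothetical names for illustration; the assumption is that each node feeding the virtual clock sink exposes a capacity and a current occupancy.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct NodeUsage {
    int capacity;
    int occupancy;
};

// True if at least one node feeding the virtual clock sink still has room.
// When this returns false, the net would skip two stage routing and be routed
// over the normal routing network instead.
bool clock_network_available(const std::vector<NodeUsage>& virtual_sink_inputs) {
    return std::any_of(virtual_sink_inputs.begin(), virtual_sink_inputs.end(),
                       [](const NodeUsage& n) { return n.occupancy < n.capacity; });
}
```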

vaughnbetz · 2019-11-28T20:11:36Z

Thanks for the continuing work on this Mustafa. I've seen a bunch of commits; what's left to do?

kmurray · 2019-12-05T22:35:46Z

It looks like the Travis failures are related to code formatting. See here for details on how to do that.

vaughnbetz · 2019-12-05T22:40:12Z

The VTR nightly tests (presubmit) etc. are a strong CI feature that we don't have working yet, so those failures are expected (didn't run). So this looks good to merge if you fix the conflicts Mustafa, and I suggest you do that and merge now, and then fix the remaining documentation, test and rr_graph read-in features in a separate pull request.

mustafabbas · 2019-12-06T03:55:00Z

Merging now. Still missing that which is noted in #928 (comment)

HackerFoo · 2020-01-08T00:35:05Z

@mustafabbas How can I try the two stage router?

This fails (I chose the smallest benchmark from vtr_flow/tasks/timing_chain/config/config.txt because it uses the same architecture as in your thesis) :

$ (cd vtr_flow && ./scripts/run_vtr_flow.pl benchmarks/verilog/diffeq1.v arch/timing/k6_frac_N10_frac_chain_mem32K_40nm.xml --two_stage_clock_routing --clock_modeling dedicated_network)
k6_frac_N10_frac_chain_mem32K_40nm/diffeq1                                                                               failed: vpr (exited with return code 134) (took 1.34 seconds)

with

vtr-verilog-to-routing/vpr/src/route/rr_graph_clock.cpp:76 add_rr_switches_and_map_to_nodes: Assertion 'rr_nodes.size() > node_start_idx' failed.

HackerFoo · 2020-01-08T01:01:18Z

Nevermind. This works: (cd vtr_flow && ./scripts/run_vtr_flow.pl benchmarks/verilog/diffeq1.v arch/timing/k6_frac_N10_frac_chain_mem32K_htree0_40nm.xml --two_stage_clock_routing --clock_modeling dedicated_network)

mustafabbas · 2020-01-08T01:06:02Z

That's great! I'll update the documentation soon

vaughnbetz · 2020-01-13T16:22:15Z

Dusty is interested in taking over the final work on the clock routing.
So I propose:
Mustafa: focus on documentation
Dusty: add support for reading a virtual clock sink from the xml rr_graph
Dusty (time permitting): add the additional tests listed above.

Dusty, please see Chapter 4 of Mustafa's MASc thesis (https://tspace.library.utoronto.ca/bitstream/1807/97807/3/Abbas_Mustafa_S_201911_MSc_thesis.pdf) for information on how the two-stage clock router works. The background in Section 2.1.3 may also be helpful in understanding the context.

Clock Modeling: Added two Stage router

62797c4

probot-autolabeler bot added the lang-cpp (C/C++ code), VPR (VPR FPGA Placement & Routing Tool), and VTR Flow (VTR Design Flow (scripts/benchmarks/architectures)) labels Aug 16, 2019

mustafabbas requested review from vaughnbetz and kmurray August 16, 2019 14:44

kmurray reviewed Aug 16, 2019

This was referenced Nov 14, 2019

Clock signals routed through INT tiles f4pga/f4pga-arch-defs#1153

Closed

Add OSERDES support f4pga/f4pga-arch-defs#1102

Merged

Assigning pin criticality to virtual clock sink + Renaming and refactoring for two stage routing

6aded0d

mustafabbas force-pushed the two_stage_routing branch from 5cd8d5a to 6aded0d Compare November 19, 2019 09:40

Merge remote-tracking branch 'origin' into two_stage_routing

e3539e2

Conforming to formating standard for clock router

ba55efb

mustafabbas merged commit 7ffd6d4 into master Dec 6, 2019

mustafabbas deleted the two_stage_routing branch December 6, 2019 03:55

This was referenced Feb 3, 2020

Adding documentation for two stage clock routing cmd line option #1103

Merged

Docs: Adding clock architecture documentation #1104

Merged
