Skip to content

Clock Modeling: Added two Stage router #928

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 6, 2019
Merged

Conversation

mustafabbas
Copy link
Member

@mustafabbas mustafabbas commented Aug 16, 2019

Added two stage Router

Description

  • Stage 1: route clock nets to a virtual sink.
    Virtual sink is connected to all clock network sources
  • Stage 2: Route all physical sinks of the clock net

Note: after stage 1 we remove the virtual sink
from the route-tree and traceback

Related Issue

#521
#520

Motivation and Context

Ensures that clock nets use the clock network instead of local routing

How Has This Been Tested?

vpr_qor test run successfully with option turned on

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All existing tests passed

- Stage 1: route clock nets to a virtual sink.
  Virtual sink is connected to all clock network sources
- Stage 2: Route all physical sinks of the clock net

Note: after stage 1 we remove the virtual sink
      from the route-tree and traceback
@probot-autolabeler probot-autolabeler bot added lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool VTR Flow VTR Design Flow (scripts/benchmarks/architectures) labels Aug 16, 2019
Copy link
Contributor

@kmurray kmurray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mustafabbas its great to see this code almost ready to land!

I've provided detailed comments below but I've summarized the high level comments here:

  1. Naming & Comments:
    There are a variety of places where I think the names of functions/variables could be updated to clarify their purpose/meaning which I've included in the detailed comments below. Unless something is for a very specific purpose it should have a correspondingly precise name, so that no one thinks it can be used in general.

  2. Router Refactoring
    It looks like you've refactored the router to extract out a new function timing_driven_route_sink_common(). So it can be used in both the new timing_driven_pre_route_sink() and the existing timing_driven_route_sink().

Since pre-routing is not routing an actual logical sink of the net, but just initializing the route tree to include the global network root, I think it would be better have timing_driven_pre_route() not be 'net' aware, but just act to appropriately initialize the route tree. That is, timing_driven_pre_route() should call one of the lower-level timing_driven_route_connection_from*() functions, which are 'net un-aware' (only consider route trees and target RR nodes). After timing_driven_pre_route() is called the resulting route tree would then be all setup to perform the actual sink routing for the net.

  1. Single Global Network/Global Network Allocation
    The code seems to support only a single global network (despite VPR supporting the construction of multiple global networks), with only the root location of the last global network being remembered.

It also seems that if there is more than one global net in the netlist the code will cause unresolvable congestion since all global nets will try to use the single recorded global network root.

I think we need to:
a. Have a data structure which stores information about the set of global/clock network root nodes (so we can track the multiple root locations).
b. Have some kind of allocation stage in the router which tracks what networks are used by which global nets in the netlist (so we don't try to have multiple nets try to use the same root)
c. Gracefully degrade in the case we have more global nets in the netlist than global networks (the nets in excess should just be routed over the normal routing network).

  1. Regression Tests
    We should have regression tests which cover at a minimum the new options introduced.

Since there are also complex interactions between some of these options, we should also have regression tests covering these. For instance covering cases like having more global nets than dedicated global networks.

  1. Documentation:
    The VPR documentation should be updated to reflect the new command options, this also provides a good place to give a higher level overview of what these do.

return true;
}

static t_heap* timing_driven_route_sink_common(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it serves the same purpose as the existing timing_driven_route_connection_from_route_tree(). Unless there is a very good reason to duplicate the code it seems like the existing routine should be used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done but I do think that now I have some repetition in my code. Though, I haven't been up to date on the routing code clean up so I am leaving the clean up in timing_driven_pre_route_clock_root with a "TODO" in the code for now until I learn more if I should spend time there.

@vaughnbetz
Copy link
Contributor

Thanks for the detailed review Kevin. Mustafa, I think Kevin has identified the most likely culprit for the unresolvable congestion with more than one clock network above (a single global clock root remembered). Kevin, Mustafa's results do indeed show unresolvable congestion with 2 clock nets right now.

@mustafabbas
Copy link
Member Author

mustafabbas commented Aug 16, 2019

Hi Kevin,

Thank you for the thorough review!

1. Single Global Network/Global Network Allocation
   The code seems to support only a single global network (despite VPR supporting the construction of multiple global networks), with only the root location of the last global network being remembered.

The idea for the "virtual clock sink" was that one sink node connects to the roots of all clock networks in the graph. This would basically emulate what is being done for logical pin equivalence in logic blocks.
As I loop through all clock network roots I connect the sink node to them here:
https://github.com/verilog-to-routing/vtr-verilog-to-routing/pull/928/files#diff-912073d96371744367fa02731c97df85R80-R83

The congestion I am seeing basically shows me that for every clock net route to the virtual clock network sink it tries to go through the same root of the clock and leaves the other one free (I have two clock network instances i.e. two roots). I can tell that the other root node goes on the heap but it's never chosen. In some cases when I increase the number of clock network instances (in this case to 5 instances for the stervision3 benchmark) it does route more than one clock net using different roots however the channel width grows considerably due to congestion at one of the root nodes.

I think we need to:
a. Have a data structure which stores information about the set of global/clock network root nodes (so we can track the multiple root locations).
b. Have some kind of allocation stage in the router which tracks what networks are used by which global nets in the netlist (so we don't try to have multiple nets try to use the same root)
c. Gracefully degrade in the case we have more global nets in the netlist than global networks (the nets in excess should just be routed over the normal routing network).

I am okay with going this route especially if I cannot find a solution for congestion otherwise. If I understand correctly, this would basically mean adding a sink node for every clock root vs one sink that connects to all of them.

@vaughnbetz
Copy link
Contributor

What you describe should work Mustafa and is a good algorithm — let negotiated congestion choose the clock spine.

Another possible cause of the congestion: we only rip up illegal or timing-degraded connections. This code may not flag the congestion on this first, special connection to the clock sink and rip it up. Instead perhaps only downstream routing is (uselessly?) ripped up. In that case the fix would be to make sure we rip up and reroute the special connection when necessary.

@mustafabbas
Copy link
Member Author

All the mini comments have been addressed with the recent commit. I still need to add:

  • Tests: Using sterovision3, a two clock circuit
  1. add a test where the architecture only has one clock network -> ensure that the two stage routing only tries connecting one clock net to the dedicated network. (need to add code to check if the occupancy limit has been reached for the nodes connecting to the the virtual clock sink, if so don't try two stage routing for this net)
  2. add a test where the architecture has two clock networks -> ensure that both clock nets choose the clock network. (good to go)
  • Documentation on option: --two_stage_clock_routing
  • Reading a virtual clock sink from the xml rr_graph

@vaughnbetz
Copy link
Contributor

Thanks for the continuing work on this Mustafa. I've seen a bunch of commits; what's left to do?

@kmurray
Copy link
Contributor

kmurray commented Dec 5, 2019

It looks like the Travis failures are are related to code formatting. See here for details on how to do that.

@vaughnbetz
Copy link
Contributor

The VTR nightly tests (presubmit) etc. are a strong CI feature that we don't have working yet, so those failures are expected (didn't run). So this looks good to merge if you fix the conflicts Mustafa, and I suggest you do that and merge now, and then fix the remaining documentation, test and rr_graph read-in features in a separate pull request.

@mustafabbas
Copy link
Member Author

Merging now. Still missing that which is noted in #928 (comment)

@mustafabbas mustafabbas merged commit 7ffd6d4 into master Dec 6, 2019
@mustafabbas mustafabbas deleted the two_stage_routing branch December 6, 2019 03:55
@HackerFoo
Copy link
Contributor

@mustafabbas How can I try the two stage router?

This fails (I chose the smallest benchmark from vtr_flow/tasks/timing_chain/config/config.txt because it uses the same architecture as in your thesis) :

$ (cd vtr_flow && ./scripts/run_vtr_flow.pl benchmarks/verilog/diffeq1.v arch/timing/k6_frac_N10_frac_chain_mem32K_40nm.xml --two_stage_clock_routing --clock_modeling dedicated_network)
k6_frac_N10_frac_chain_mem32K_40nm/diffeq1                                                                               failed: vpr (exited with return code 134) (took 1.34 seconds)

with

vtr-verilog-to-routing/vpr/src/route/rr_graph_clock.cpp:76 add_rr_switches_and_map_to_nodes: Assertion 'rr_nodes.size() > node_start_idx' failed.

@HackerFoo
Copy link
Contributor

Nevermind. This works: (cd vtr_flow && ./scripts/run_vtr_flow.pl benchmarks/verilog/diffeq1.v arch/timing/k6_frac_N10_frac_chain_mem32K_htree0_40nm.xml --two_stage_clock_routing --clock_modeling dedicated_network)

@mustafabbas
Copy link
Member Author

That's great! I'll update the documentation soon

@vaughnbetz
Copy link
Contributor

Dusty is interested in taking over the final work on the clock routing.
So I propose:
Mustafa: focus on documentation
Dusty: add support for reading a virtual clock sink from the xml rr_graph
Dusty (time permitting): add the additional tests listed above.

Dusty, please see Chapter 4 of Mustafa's MASc thesis (https://tspace.library.utoronto.ca/bitstream/1807/97807/3/Abbas_Mustafa_S_201911_MSc_thesis.pdf) for information on how the two-stage clock router works. The background in Section 2.1.3. may also be helpful in understanding the context.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool VTR Flow VTR Design Flow (scripts/benchmarks/architectures)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants