Wire length metric incorrectly terminates routing iterations #524

litghost · 2019-04-02T00:47:04Z

Expected Behaviour

Routing should complete

Current Behaviour

Confirming router algorithm: TIMING_DRIVEN.
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
      (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
   1    0.0     0.0    0    2907      14      27      28 ( 0.887%)   10178 (110.8%) 61817500000000.000 -6.182e+13 -61817500000000.000      0.000      0.000      N/A
Wire length usage ratio 1.10823 exceeds limit of 0.85, fail routing.
Routing failed.

Possible Solution

Disabling the wire length usage ratio results in a valid routing solution very quickly:

Confirming router algorithm: TIMING_DRIVEN.
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
      (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
   1    0.0     0.0    0    2907      14      27      28 ( 0.887%)   10178 (110.8%) 61817500000000.000 -6.182e+13 -61817500000000.000      0.000      0.000      N/A
Wire length usage ratio 1.10823 exceeds limit of 0.85, fail routing.
   2    0.1     0.5   10  286487      14      27      56 ( 1.774%)    4172 (45.4%) 49139500000000.000 -4.914e+13 -49139500000000.000      0.000      0.000      N/A
   3    0.0     0.6    3  124644       3       7       0 ( 0.000%)    2170 (23.6%) 32566500000000.000 -3.257e+13 -32566500000000.000      0.000      0.000      N/A

Steps to Reproduce

Unzip vpr_bug2.zip
Run:

vpr \
  arch.unique_pack.xml \
  top.eblif \
  --device 10x10 --read_rr_graph rr_graph_10x10_dummy.rr_graph.real.xml \
  --min_route_chan_width_hint 100 --max_criticality 0.0 --max_router_iterations 500 \
  --routing_failure_predictor off --router_high_fanout_threshold -1 --constant_net_method route \ 
  --route_chan_width 100 --pack --place --route

The text was updated successfully, but these errors were encountered:

kmurray · 2019-04-02T14:33:10Z

I was able to reproduce this with:

vpr arch.unique_pack.xml top.eblif --device 10x10 --read_rr_graph rr_graph_10x10_dummy.rr_graph.real.xml --constant_net_method route --route_chan_width 100 --timing_analysis off

on master.

The wirelength early abort threshold is there to avoid spending large amounts of CPU time when routing is far to congested to finish -- this is generally a good thing, and not something we want to remove in general.

What's happening in this case is that on the first iteration all signals end-up on the dedicated routing network (since it's only a single 'hop' between source and destination). But since the dedicated routing network consists of shorted-together RR nodes which span the entire chip, each net ends up using a huge amount of wiring which causes the wirelength abort threshold to be hit.

A short-term work-around is to make the wirelength abort threshold a command-line option so it can be adjusted if it causes issues on architectures like this.

A longer-term fix is to modify how the router internally costs non-configurably connected nodes. Currently a set of non-configurably connected nodes are costed as equivalent to the minimum-cost across all the nodes in the set. This is what you may want from a timing perspective (i.e. it's good to use the non-configurably connected set of nodes if one leads to a faster path). However for large sets of nodes (like in this case) it drastically under-estimates the resource cost (e.g. wirelength) of using them.

litghost · 2019-04-02T14:55:14Z

What's happening in this case is that on the first iteration all signals end-up on the dedicated routing network (since it's only a single 'hop' between source and destination). But since the dedicated routing network consists of shorted-together RR nodes which span the entire chip, each net ends up using a huge amount of wiring which causes the wirelength abort threshold to be hit.

I kinda figured it would be something like this.

A short-term work-around is to make the wirelength abort threshold a command-line option so it can be adjusted if it causes issues on architectures like this.

Agreed.

A longer-term fix is to modify how the router internally costs non-configurably connected nodes. Currently a set of non-configurably connected nodes are costed as equivalent to the minimum-cost across all the nodes in the set. This is what you may want from a timing perspective (i.e. it's good to use the non-configurably connected set of nodes if one leads to a faster path). However for large sets of nodes (like in this case) it drastically under-estimates the resource cost (e.g. wirelength) of using them.

Wouldn't this penalize the global network? That might be a bad thing in the case of a global network. I think it would more accurately characterize the solution is to have the router avoid the global network if the source of net cannot reach the global network?

kmurray · 2019-04-02T15:12:09Z

Wouldn't this penalize the global network? That might be a bad thing in the case of a global network. I think it would more accurately characterize the solution is to have the router avoid the global network if the source of net cannot reach the global network?

It's somewhat a matter of perspective. I agree that more carefully controlling access to the global network (e.g. #521) would probably help this specific case.

However in general the router should truly understand the cost of using a dedicated network. For example in this architecture, choosing to use it means it uses a lot of wire, and it may be better (e.g. for a low fan-out signal) to use the regular wiring.

I've filed #525 to log that issue.

kmurray · 2019-04-02T15:13:18Z

93fc414 gives a command-line option to control the wirelength abort threshold as a work-around.

kmurray · 2019-05-21T18:03:15Z

Should be fixed as of 616833e which fixes the costing of non-configurably connected routing.

Previously I would get:

---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
      (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
   1    0.0     0.0    0    4505      14      27      57 ( 1.806%)    9436 (102.7%)      N/A        N/A        N/A        N/A        N/A      N/A
   2    6.7     0.5   13  169364      13      26      56 ( 1.774%)    3360 (36.6%)      N/A        N/A        N/A        N/A        N/A      N/A
   3    0.0     0.6    4   38144       2       4       0 ( 0.000%)    2002 (21.8%)      N/A        N/A        N/A        N/A        N/A      N/A
Restoring best routing
Successfully routed after 3 routing iterations.

Where the first iteration use > 100% of available wire since most signals try to use the global network.

With the latest code I now get:

---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
      (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
   1    0.0     0.0    0  190827      14      27       8 ( 0.253%)     658 ( 7.2%)      N/A        N/A        N/A        N/A        N/A      N/A
   2    6.1     0.5    0  154740      12      25       0 ( 0.000%)     658 ( 7.2%)      N/A        N/A        N/A        N/A        N/A      N/A

Restoring best routing
Successfully routed after 2 routing iterations.

Where most signals avoid the global network since it is now costed in proportion to its wirelength.

mithro · 2019-05-21T19:26:47Z

@acomodi + @litghost - Can you confirm this fixes things in our use case?

litghost mentioned this issue Apr 2, 2019

Implement global constant routing network in testarch. f4pga/f4pga-arch-defs#518

Merged

kmurray mentioned this issue Apr 2, 2019

Improve Router Costing of Non-configurably Connected Nodes #525

Closed

kmurray closed this as completed May 21, 2019

litghost mentioned this issue May 28, 2019

Recent router changes makes SymbiFlow archs fail routing SymbiFlow/vtr-verilog-to-routing#61

Closed

acomodi mentioned this issue May 28, 2019

VPR cannot find a possible route #614

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Wire length metric incorrectly terminates routing iterations #524

Wire length metric incorrectly terminates routing iterations #524

litghost commented Apr 2, 2019 •

edited

Loading

kmurray commented Apr 2, 2019

Uh oh!

litghost commented Apr 2, 2019

Uh oh!

kmurray commented Apr 2, 2019

Uh oh!

kmurray commented Apr 2, 2019

Uh oh!

kmurray commented May 21, 2019

Uh oh!

mithro commented May 21, 2019

Uh oh!

Wire length metric incorrectly terminates routing iterations #524

Wire length metric incorrectly terminates routing iterations #524

Comments

litghost commented Apr 2, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Expected Behaviour

Current Behaviour

Possible Solution

Steps to Reproduce

kmurray commented Apr 2, 2019

Uh oh!

litghost commented Apr 2, 2019

Uh oh!

kmurray commented Apr 2, 2019

Uh oh!

kmurray commented Apr 2, 2019

Uh oh!

kmurray commented May 21, 2019

Uh oh!

mithro commented May 21, 2019

Uh oh!

litghost commented Apr 2, 2019 •

edited

Loading