Skip to content

Wire length metric incorrectly terminates routing iterations #524

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
litghost opened this issue Apr 2, 2019 · 6 comments
Closed

Wire length metric incorrectly terminates routing iterations #524

litghost opened this issue Apr 2, 2019 · 6 comments

Comments

@litghost
Copy link
Collaborator

litghost commented Apr 2, 2019

Expected Behaviour

Routing should complete

Current Behaviour

Confirming router algorithm: TIMING_DRIVEN.
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
      (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
   1    0.0     0.0    0    2907      14      27      28 ( 0.887%)   10178 (110.8%) 61817500000000.000 -6.182e+13 -61817500000000.000      0.000      0.000      N/A
Wire length usage ratio 1.10823 exceeds limit of 0.85, fail routing.
Routing failed.

Possible Solution

Disabling the wire length usage ratio results in a valid routing solution very quickly:

Confirming router algorithm: TIMING_DRIVEN.
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
      (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
   1    0.0     0.0    0    2907      14      27      28 ( 0.887%)   10178 (110.8%) 61817500000000.000 -6.182e+13 -61817500000000.000      0.000      0.000      N/A
Wire length usage ratio 1.10823 exceeds limit of 0.85, fail routing.
   2    0.1     0.5   10  286487      14      27      56 ( 1.774%)    4172 (45.4%) 49139500000000.000 -4.914e+13 -49139500000000.000      0.000      0.000      N/A
   3    0.0     0.6    3  124644       3       7       0 ( 0.000%)    2170 (23.6%) 32566500000000.000 -3.257e+13 -32566500000000.000      0.000      0.000      N/A

Steps to Reproduce

  1. Unzip vpr_bug2.zip
  2. Run:
vpr \
  arch.unique_pack.xml \
  top.eblif \
  --device 10x10 --read_rr_graph rr_graph_10x10_dummy.rr_graph.real.xml \
  --min_route_chan_width_hint 100 --max_criticality 0.0 --max_router_iterations 500 \
  --routing_failure_predictor off --router_high_fanout_threshold -1 --constant_net_method route \ 
  --route_chan_width 100 --pack --place --route
@kmurray
Copy link
Contributor

kmurray commented Apr 2, 2019

I was able to reproduce this with:

vpr arch.unique_pack.xml top.eblif --device 10x10 --read_rr_graph rr_graph_10x10_dummy.rr_graph.real.xml --constant_net_method route --route_chan_width 100 --timing_analysis off

on master.

The wirelength early abort threshold is there to avoid spending large amounts of CPU time when routing is far to congested to finish -- this is generally a good thing, and not something we want to remove in general.

What's happening in this case is that on the first iteration all signals end-up on the dedicated routing network (since it's only a single 'hop' between source and destination). But since the dedicated routing network consists of shorted-together RR nodes which span the entire chip, each net ends up using a huge amount of wiring which causes the wirelength abort threshold to be hit.

A short-term work-around is to make the wirelength abort threshold a command-line option so it can be adjusted if it causes issues on architectures like this.

A longer-term fix is to modify how the router internally costs non-configurably connected nodes. Currently a set of non-configurably connected nodes are costed as equivalent to the minimum-cost across all the nodes in the set. This is what you may want from a timing perspective (i.e. it's good to use the non-configurably connected set of nodes if one leads to a faster path). However for large sets of nodes (like in this case) it drastically under-estimates the resource cost (e.g. wirelength) of using them.

@litghost
Copy link
Collaborator Author

litghost commented Apr 2, 2019

What's happening in this case is that on the first iteration all signals end-up on the dedicated routing network (since it's only a single 'hop' between source and destination). But since the dedicated routing network consists of shorted-together RR nodes which span the entire chip, each net ends up using a huge amount of wiring which causes the wirelength abort threshold to be hit.

I kinda figured it would be something like this.

A short-term work-around is to make the wirelength abort threshold a command-line option so it can be adjusted if it causes issues on architectures like this.

Agreed.

A longer-term fix is to modify how the router internally costs non-configurably connected nodes. Currently a set of non-configurably connected nodes are costed as equivalent to the minimum-cost across all the nodes in the set. This is what you may want from a timing perspective (i.e. it's good to use the non-configurably connected set of nodes if one leads to a faster path). However for large sets of nodes (like in this case) it drastically under-estimates the resource cost (e.g. wirelength) of using them.

Wouldn't this penalize the global network? That might be a bad thing in the case of a global network. I think it would more accurately characterize the solution is to have the router avoid the global network if the source of net cannot reach the global network?

@kmurray
Copy link
Contributor

kmurray commented Apr 2, 2019

Wouldn't this penalize the global network? That might be a bad thing in the case of a global network. I think it would more accurately characterize the solution is to have the router avoid the global network if the source of net cannot reach the global network?

It's somewhat a matter of perspective. I agree that more carefully controlling access to the global network (e.g. #521) would probably help this specific case.

However in general the router should truly understand the cost of using a dedicated network. For example in this architecture, choosing to use it means it uses a lot of wire, and it may be better (e.g. for a low fan-out signal) to use the regular wiring.

I've filed #525 to log that issue.

@kmurray
Copy link
Contributor

kmurray commented Apr 2, 2019

93fc414 gives a command-line option to control the wirelength abort threshold as a work-around.

@kmurray
Copy link
Contributor

kmurray commented May 21, 2019

Should be fixed as of 616833e which fixes the costing of non-configurably connected routing.

Previously I would get:

---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
      (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
   1    0.0     0.0    0    4505      14      27      57 ( 1.806%)    9436 (102.7%)      N/A        N/A        N/A        N/A        N/A      N/A
   2    6.7     0.5   13  169364      13      26      56 ( 1.774%)    3360 (36.6%)      N/A        N/A        N/A        N/A        N/A      N/A
   3    0.0     0.6    4   38144       2       4       0 ( 0.000%)    2002 (21.8%)      N/A        N/A        N/A        N/A        N/A      N/A
Restoring best routing
Successfully routed after 3 routing iterations.

Where the first iteration use > 100% of available wire since most signals try to use the global network.

With the latest code I now get:

---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
      (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
   1    0.0     0.0    0  190827      14      27       8 ( 0.253%)     658 ( 7.2%)      N/A        N/A        N/A        N/A        N/A      N/A
   2    6.1     0.5    0  154740      12      25       0 ( 0.000%)     658 ( 7.2%)      N/A        N/A        N/A        N/A        N/A      N/A

Restoring best routing
Successfully routed after 2 routing iterations.

Where most signals avoid the global network since it is now costed in proportion to its wirelength.

@kmurray kmurray closed this as completed May 21, 2019
@mithro
Copy link
Contributor

mithro commented May 21, 2019

@acomodi + @litghost - Can you confirm this fixes things in our use case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants