Commit 6da8915

Merge pull request #2920 from ueqri/upstream-fpt24-fine-grained-parallel-router
[Router] Upstream Fine-Grained Parallel Router (FPT'24)
2 parents 33c4c01 + f7e3ada commit 6da8915

45 files changed, +3446 -1628 lines changed
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+==========
+Connection Router
+==========
+
+ConnectionRouter
+---------
+.. doxygenfile:: connection_router.h
+:project: vpr
+
+SerialConnectionRouter
+----------
+.. doxygenclass:: SerialConnectionRouter
+:project: vpr
+
+ParallelConnectionRouter
+----------
+.. doxygenclass:: ParallelConnectionRouter
+:project: vpr

doc/src/api/vprinternals/vpr_router.rst

Lines changed: 1 addition & 0 deletions
@@ -9,3 +9,4 @@ VPR Router

 router_heap
 router_lookahead
+router_connection_router

doc/src/vpr/command_line_usage.rst

Lines changed: 107 additions & 30 deletions
@@ -47,12 +47,12 @@ By default VPR will perform a binary search routing to find the minimum channel

 Detailed Command-line Options
 -----------------------------
-VPR has a lot of options. Running :option:`vpr --help` will display all the available options and their usage information.
+VPR has a lot of options. Running :option:`vpr --help` will display all the available options and their usage information.

 .. option:: -h, --help

 Display help message then exit.
-
+
 The options most people will be interested in are:

 * :option:`--route_chan_width` (route at a fixed channel width), and
@@ -208,7 +208,7 @@ General Options
 * Any string matching ``name`` attribute of a device layout defined with a ``<fixed_layout>`` tag in the :ref:`arch_grid_layout` section of the architecture file.

 If the value specified is neither ``auto`` nor matches the ``name`` attribute value of a ``<fixed_layout>`` tag, VPR issues an error.
-
+
 .. note:: If the only layout in the architecture file is a single device specified using ``<fixed_layout>``, it is recommended to always specify the ``--device`` option; this prevents the value ``--device auto`` from interfering with operations supported only for ``<fixed_layout>`` grids.

 **Default:** ``auto``
@@ -900,7 +900,7 @@ If any of init_t, exit_t or alpha_t is specified, the user schedule, with a fixe

 .. option:: --place_agent_algorithm {e_greedy | softmax}

-Controls which placement RL agent is used.
+Controls which placement RL agent is used.

 **Default:** ``softmax``

@@ -922,10 +922,10 @@ If any of init_t, exit_t or alpha_t is specified, the user schedule, with a fixe

 .. option:: --place_reward_fun {basic | nonPenalizing_basic | runtime_aware | WLbiased_runtime_aware}

-The reward function used by the placement RL agent to learn the best action at each anneal stage.
+The reward function used by the placement RL agent to learn the best action at each anneal stage.
+
+.. note:: The latter two are only available for timing-driven placement.

-.. note:: The latter two are only available for timing-driven placement.
-
 **Default:** ``WLbiased_runtime_aware``

 .. option:: --place_agent_space {move_type | move_block_type}
@@ -935,20 +935,20 @@ If any of init_t, exit_t or alpha_t is specified, the user schedule, with a fixe
 **Default:** ``move_block_type``

 .. option:: --place_quench_only {on | off}
-
+
 If this option is set to ``on``, the placement will skip the annealing phase and only perform the placement quench.
-This option is useful when the the quality of initial placement is good enough and there is no need to perform the
+This option is useful when the the quality of initial placement is good enough and there is no need to perform the
 annealing phase.

 **Default:** ``off``


 .. option:: --placer_debug_block <int>
-
+
 .. note:: This option is likely only of interest to developers debugging the placement algorithm

-Controls which block the placer produces detailed debug information for.
-
+Controls which block the placer produces detailed debug information for.
+
 If the block being moved has the same ID as the number assigned to this parameter, the placer will print debugging information about it.

 * For values >= 0, the value is the block ID for which detailed placer debug information should be produced.
@@ -960,7 +960,7 @@ If any of init_t, exit_t or alpha_t is specified, the user schedule, with a fixe
 **Default:** ``-2``

 .. option:: --placer_debug_net <int>
-
+
 .. note:: This option is likely only of interest to developers debugging the placement algorithm

 Controls which net the placer produces detailed debug information for.
@@ -1004,7 +1004,7 @@ The following options are only valid when the placement engine is in timing-driv

 .. option:: --quench_recompute_divider <int>

-Controls how many times the placer performs a timing analysis to update its criticality estimates during a quench.
+Controls how many times the placer performs a timing analysis to update its criticality estimates during a quench.
 If unspecified, uses the value from --inner_loop_recompute_divider.

 **Default:** ``0``
@@ -1088,7 +1088,7 @@ The following options are only valid when the placement engine is in timing-driv

 NoC Options
 ^^^^^^^^^^^^^^
-The following options are only used when FPGA device and netlist contain a NoC router.
+The following options are only used when FPGA device and netlist contain a NoC router.

 .. option:: --noc {on | off}

@@ -1098,15 +1098,15 @@ The following options are only used when FPGA device and netlist contain a NoC r
 **Default:** ``off``

 .. option:: --noc_flows_file <file>
-
+
 XML file containing the list of traffic flows within the NoC (communication between routers).

 .. note:: noc_flows_file are required to specify if NoC optimization is turned on (--noc on).

 .. option:: --noc_routing_algorithm {xy_routing | bfs_routing | west_first_routing | north_last_routing | negative_first_routing | odd_even_routing}

 Controls the algorithm used by the NoC to route packets.
-
+
 * ``xy_routing`` Uses the direction oriented routing algorithm. This is recommended to be used with mesh NoC topologies.
 * ``bfs_routing`` Uses the breadth first search algorithm. The objective is to find a route that uses a minimum number of links. This algorithm is not guaranteed to generate deadlock-free traffic flow routes, but can be used with any NoC topology.
 * ``west_first_routing`` Uses the west-first routing algorithm. This is recommended to be used with mesh NoC topologies.
@@ -1119,11 +1119,11 @@ The following options are only used when FPGA device and netlist contain a NoC r
 .. option:: --noc_placement_weighting <float>

 Controls the importance of the NoC placement parameters relative to timing and wirelength of the design.
-
+
 * ``noc_placement_weighting = 0`` means the placement is based solely on timing and wirelength.
 * ``noc_placement_weighting = 1`` means noc placement is considered equal to timing and wirelength.
 * ``noc_placement_weighting > 1`` means the placement is increasingly dominated by NoC parameters.
-
+
 **Default:** ``5.0``

 .. option:: --noc_aggregate_bandwidth_weighting <float>
@@ -1141,7 +1141,7 @@ The following options are only used when FPGA device and netlist contain a NoC r
 Other positive numbers specify the importance of meeting latency constraints compared to other NoC-related cost terms.
 Weighting factors for NoC-related cost terms are normalized internally. Therefore, their absolute values are not important, and
 only their relative ratios determine the importance of each cost term.
-
+
 **Default:** ``0.6``

 .. option:: --noc_latency_weighting <float>
@@ -1151,7 +1151,7 @@ The following options are only used when FPGA device and netlist contain a NoC r
 Other positive numbers specify the importance of minimizing aggregate latency compared to other NoC-related cost terms.
 Weighting factors for NoC-related cost terms are normalized internally. Therefore, their absolute values are not important, and
 only their relative ratios determine the importance of each cost term.
-
+
 **Default:** ``0.02``

 .. option:: --noc_congestion_weighting <float>
@@ -1167,11 +1167,11 @@ The following options are only used when FPGA device and netlist contain a NoC r
 .. option:: --noc_swap_percentage <float>

 Sets the minimum fraction of swaps attempted by the placer that are NoC blocks.
-This value is an integer ranging from [0-100].
-
-* ``0`` means NoC blocks will be moved at the same rate as other blocks.
+This value is an integer ranging from [0-100].
+
+* ``0`` means NoC blocks will be moved at the same rate as other blocks.
 * ``100`` means all swaps attempted by the placer are NoC router blocks.
-
+
 **Default:** ``0``

 .. option:: --noc_placement_file_name <file>
@@ -1257,7 +1257,7 @@ Analytical Placement is generally split into three stages:

 * ``none`` Do not use any Detailed Placer.

-* ``annealer`` Use the Annealer from the Placement stage as a Detailed Placer. This will use the same Placer Options from the Place stage to configure the annealer.
+* ``annealer`` Use the Annealer from the Placement stage as a Detailed Placer. This will use the same Placer Options from the Place stage to configure the annealer.

 **Default:** ``annealer``

@@ -1386,8 +1386,8 @@ VPR uses a negotiated congestion algorithm (based on Pathfinder) to perform rout

 .. option:: --max_pres_fac <float>

-Sets the maximum present overuse penalty factor that can ever result during routing. Should always be less than 1e25 or so to prevent overflow.
-Smaller values may help prevent circuitous routing in difficult routing problems, but may increase
+Sets the maximum present overuse penalty factor that can ever result during routing. Should always be less than 1e25 or so to prevent overflow.
+Smaller values may help prevent circuitous routing in difficult routing problems, but may increase
 the number of routing iterations needed and hence runtime.

 **Default:** ``1000.0``
@@ -1466,7 +1466,7 @@ VPR uses a negotiated congestion algorithm (based on Pathfinder) to perform rout

 .. option:: --router_algorithm {timing_driven | parallel | parallel_decomp}

-Selects which router algorithm to use.
+Selects which router algorithm to use.

 * ``timing_driven`` is the default single-threaded PathFinder algorithm.

@@ -1548,13 +1548,90 @@ The following options are only valid when the router is in timing-driven mode (t
 **Default:** ``0.0``

 .. option:: --router_profiler_astar_fac <float>
-
+
 Controls the directedness of the timing-driven router's exploration when doing router delay profiling of an architecture.
 The router delay profiling step is currently used to calculate the place delay matrix lookup.
 Values between 1 and 2 are resonable; higher values trade some quality for reduced run-time.

 **Default:** ``1.2``

+.. option:: --enable_parallel_connection_router {on | off}
+
+Controls whether the MultiQueue-based parallel connection router is used during a single connection routing.
+
+When enabled, the parallel connection router accelerates the path search for individual source-sink connections using
+multi-threading without altering the net routing order.
+
+**Default:** ``off``
+
+.. option:: --post_target_prune_fac <float>
+
+Controls the post-target pruning heuristic calculation in the parallel connection router.
+
+This parameter is used as a multiplicative factor applied to the VPR heuristic (not guaranteed to be admissible, i.e.,
+might over-predict the cost to the sink) to calculate the 'stopping heuristic' when pruning nodes after the target has
+been reached. The 'stopping heuristic' must be admissible for the path search algorithm to guarantee optimal paths and
+be deterministic.
+
+Values of this parameter are architecture-specific and have to be empirically found.
+
+This parameter has no effect if :option:`--enable_parallel_connection_router` is not set.
+
+**Default:** ``1.2``
+
+.. option:: --post_target_prune_offset <float>
+
+Controls the post-target pruning heuristic calculation in the parallel connection router.
+
+This parameter is used as a subtractive offset together with :option:`--post_target_prune_fac` to apply an affine
+transformation on the VPR heuristic to calculate the 'stopping heuristic'. The 'stopping heuristic' must be admissible
+for the path search algorithm to guarantee optimal paths and be deterministic.
+
+Values of this parameter are architecture-specific and have to be empirically found.
+
+This parameter has no effect if :option:`--enable_parallel_connection_router` is not set.
+
+**Default:** ``0.0``
+
+.. option:: --multi_queue_num_threads <int>
+
+Controls the number of threads used by MultiQueue-based parallel connection router.
+
+If not explicitly specified, defaults to 1, implying the parallel connection router works in 'serial' mode using only
+one main thread to route.
+
+This parameter has no effect if :option:`--enable_parallel_connection_router` is not set.
+
+**Default:** ``1``
+
+.. option:: --multi_queue_num_queues <int>
+
+Controls the number of queues used by MultiQueue in the parallel connection router.
+
+Must be set >= 2. A common configuration for this parameter is the number of threads used by MultiQueue * 4 (the number
+of queues per thread).
+
+This parameter has no effect if :option:`--enable_parallel_connection_router` is not set.
+
+**Default:** ``2``
+
+.. option:: --multi_queue_direct_draining {on | off}
+
+Controls whether to enable queue draining optimization for MultiQueue-based parallel connection router.
+
+When enabled, queues can be emptied quickly by draining all elements if no further solutions need to be explored after
+the target is reached in the path search.
+
+Note: For this optimization to maintain optimality and deterministic results, the 'ordering heuristic' (calculated by
+:option:`--astar_fac` and :option:`--astar_offset`) must be admissible to ensure emptying queues of entries with higher
+costs does not prune possibly superior solutions. However, you can still enable this optimization regardless of whether
+optimality and determinism are required for your specific use case (in such cases, the 'ordering heuristic' can be
+inadmissible).
+
+This parameter has no effect if :option:`--enable_parallel_connection_router` is not set.
+
+**Default:** ``off``
+
 .. option:: --max_criticality <float>

 Sets the maximum fraction of routing cost that can come from delay (vs. coming from routability) for any net.
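
Reviewer note: the relationship between `--post_target_prune_fac`, `--post_target_prune_offset` and the 'stopping heuristic' described in the new option text may be easier to follow with a small sketch. The following is only an illustration under stated assumptions: it assumes the affine transformation has the form `fac * h - offset`, clamped at zero, and the order of operations in VPR's actual implementation may differ. The function names `stopping_heuristic` and `should_prune_after_target` are hypothetical and are not part of VPR.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not a VPR function): affine transform of the
// lookahead estimate `h` into a 'stopping heuristic'.
// ASSUMED form: prune_fac * h - prune_offset, clamped to be non-negative.
float stopping_heuristic(float h, float prune_fac, float prune_offset) {
    return std::max(0.0f, prune_fac * h - prune_offset);
}

// Hypothetical helper (not a VPR function): once some path of cost
// `best_cost` to the sink is known, a node is skipped when its cost so far
// plus the stopping heuristic cannot beat that path. If the stopping
// heuristic never over-predicts, this check cannot discard a better path.
bool should_prune_after_target(float back_cost, float h, float best_cost,
                               float prune_fac, float prune_offset) {
    return back_cost + stopping_heuristic(h, prune_fac, prune_offset) >= best_cost;
}
```

With `prune_fac = 0.5` and `prune_offset = 2.0`, a lookahead estimate of 10 becomes a stopping heuristic of 3, so a node with back cost 8 is pruned against a best path of cost 10, while a node with back cost 5 is kept.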

utils/route_diag/src/main.cpp

Lines changed: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ static void do_one_route(const Netlist<>& net_list,
 segment_inf,
 is_flat);

-ConnectionRouter<FourAryHeap> router(
+SerialConnectionRouter<FourAryHeap> router(
 device_ctx.grid,
 *router_lookahead,
 device_ctx.rr_graph.rr_nodes(),

vpr/src/base/SetupVPR.cpp

Lines changed: 6 additions & 0 deletions
@@ -431,6 +431,12 @@ static void SetupRouterOpts(const t_options& Options, t_router_opts* RouterOpts)
 RouterOpts->astar_fac = Options.astar_fac;
 RouterOpts->astar_offset = Options.astar_offset;
 RouterOpts->router_profiler_astar_fac = Options.router_profiler_astar_fac;
+RouterOpts->enable_parallel_connection_router = Options.enable_parallel_connection_router;
+RouterOpts->post_target_prune_fac = Options.post_target_prune_fac;
+RouterOpts->post_target_prune_offset = Options.post_target_prune_offset;
+RouterOpts->multi_queue_num_threads = Options.multi_queue_num_threads;
+RouterOpts->multi_queue_num_queues = Options.multi_queue_num_queues;
+RouterOpts->multi_queue_direct_draining = Options.multi_queue_direct_draining;
 RouterOpts->bb_factor = Options.bb_factor;
 RouterOpts->criticality_exp = Options.criticality_exp;
 RouterOpts->max_criticality = Options.max_criticality;
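
Reviewer note: the six fields copied here correspond to the options documented in `doc/src/vpr/command_line_usage.rst`. As a rough illustration of the documented constraints (at least 2 queues, and a common choice of 4 queues per thread), here is a standalone sketch. `ParallelRouterOpts`, `suggested_num_queues` and `check_opts` are hypothetical stand-ins, not VPR types or functions. The field types are inferred from the `%f`, `%d` and boolean formatting in `ShowSetup.cpp`, the defaults are taken from the option documentation, and the "at least one thread" check is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical stand-in for the six new router options (not t_router_opts).
// Defaults mirror the documented **Default:** values.
struct ParallelRouterOpts {
    bool enable_parallel_connection_router = false;
    float post_target_prune_fac = 1.2f;
    float post_target_prune_offset = 0.0f;
    int multi_queue_num_threads = 1;
    int multi_queue_num_queues = 2;
    bool multi_queue_direct_draining = false;
};

// Documented rule of thumb: number of threads * 4, and never fewer than 2.
int suggested_num_queues(int num_threads) {
    return std::max(2, num_threads * 4);
}

// Returns an empty string when the options are consistent, otherwise a
// short description of the first problem found.
std::string check_opts(const ParallelRouterOpts& opts) {
    if (!opts.enable_parallel_connection_router) {
        return "";  // the remaining options are documented as having no effect
    }
    if (opts.multi_queue_num_threads < 1) {
        return "multi_queue_num_threads must be >= 1";  // assumption of this sketch
    }
    if (opts.multi_queue_num_queues < 2) {
        return "multi_queue_num_queues must be >= 2";  // documented constraint
    }
    return "";
}
```

For example, `suggested_num_queues(4)` yields 16, matching the "threads * 4" guidance, and `check_opts` reports a problem only when the parallel connection router is enabled with fewer than 2 queues or fewer than 1 thread.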

vpr/src/base/ShowSetup.cpp

Lines changed: 6 additions & 0 deletions
@@ -379,6 +379,12 @@ static void ShowRouterOpts(const t_router_opts& RouterOpts) {
 VTR_LOG("RouterOpts.astar_fac: %f\n", RouterOpts.astar_fac);
 VTR_LOG("RouterOpts.astar_offset: %f\n", RouterOpts.astar_offset);
 VTR_LOG("RouterOpts.router_profiler_astar_fac: %f\n", RouterOpts.router_profiler_astar_fac);
+VTR_LOG("RouterOpts.enable_parallel_connection_router: %s\n", RouterOpts.enable_parallel_connection_router ? "true" : "false");
+VTR_LOG("RouterOpts.post_target_prune_fac: %f\n", RouterOpts.post_target_prune_fac);
+VTR_LOG("RouterOpts.post_target_prune_offset: %f\n", RouterOpts.post_target_prune_offset);
+VTR_LOG("RouterOpts.multi_queue_num_threads: %d\n", RouterOpts.multi_queue_num_threads);
+VTR_LOG("RouterOpts.multi_queue_num_queues: %d\n", RouterOpts.multi_queue_num_queues);
+VTR_LOG("RouterOpts.multi_queue_direct_draining: %s\n", RouterOpts.multi_queue_direct_draining ? "true" : "false");
 VTR_LOG("RouterOpts.criticality_exp: %f\n", RouterOpts.criticality_exp);
 VTR_LOG("RouterOpts.max_criticality: %f\n", RouterOpts.max_criticality);
 VTR_LOG("RouterOpts.init_wirelength_abort_threshold: %f\n", RouterOpts.init_wirelength_abort_threshold);
