
Update README.md to point to VTR 9 paper #3034

Status: Closed (12 commits)
58 changes: 58 additions & 0 deletions CHANGELOG.md
@@ -47,6 +47,64 @@ _The following are changes which have been implemented in the VTR master branch

### Removed


## v9.0.0 - 2024-12-23

### Added
* Support for Advanced Architectures:
  * 3D FPGA and RAD architectures.
  * Architectures with hard Networks-on-Chip (NoCs).
  * Distinct horizontal and vertical channel widths and types.
  * Diagonal routing wires and other complex wire shapes (L-shaped, T-shaped, etc.).

* New Benchmark Suites:
  * Koios: A deep-learning-focused benchmark suite with various design sizes.
  * Hermes: Benchmarks utilizing hard NoCs.
  * TitanNew: Large benchmarks targeting the Stratix 10 architecture.

* Commercial FPGA Architecture Captures:
  * Intel’s Stratix 10 FPGA architecture.
  * AMD’s 7-series FPGA architecture.

* Parmys Logic Synthesis Flow:
  * Better Verilog language coverage.
  * More efficient hard block mapping.

* VPR Graphics Visualizations:
  * New interface for improved usability; the underlying graphics were rewritten using EZGL/GTK to allow more UI widgets.
  * Algorithm breakpoint visualizations for placement and routing algorithm debugging.
  * User-guided (manual) placement optimization features.
  * Live socket connection between client graphical applications and the VTR engines (server mode).
  * Interactive timing path analysis (IPA) client using server mode.

* Performance Enhancements:
  * Parallel router for faster inter-cluster routing or flat routing.

* Re-clustering API to modify packing decisions during the flow.
* Support for floorplanning and placement constraints.
* Unified intra- and inter-cluster (flat) routing.
* Comprehensive web-based VTR utilities and API documentation.

### Changed
* The default values of many command line options (e.g. inner_num is 0.5 instead of 1.0)
* Changes to placement engine
  * Smart centroid initial placement algorithm.
  * Multiple smart placement directed moves.
  * Reinforcement learning-based placement algorithm.
* Changes to routing engine
  * Faster lookahead creation.
  * More accurate lookahead for large blocks.
  * More efficient heap and pruning strategies.
  * max `pres_fac` capped to avoid possible numeric issues.


### Fixed
* Many algorithmic and coding bugs are fixed in this release

### Removed
* Breadth-first (non-timing-driven) router.
* Non-linear congestion placement cost.

## v8.0.0 - 2020-03-24

### Added
4 changes: 2 additions & 2 deletions CMakeLists.txt
@@ -62,8 +62,8 @@ option(ODIN_SANITIZE "Enable building odin with sanitize flags" OFF)
option(WITH_PARMYS "Enable Yosys as elaborator and parmys-plugin as partial mapper" ON)
option(YOSYS_F4PGA_PLUGINS "Enable building and installing Yosys SystemVerilog and UHDM plugins" OFF)

set(VTR_VERSION_MAJOR 8)
set(VTR_VERSION_MINOR 1)
set(VTR_VERSION_MAJOR 9)
set(VTR_VERSION_MINOR 0)
set(VTR_VERSION_PATCH 0)
set(VTR_VERSION_PRERELEASE "dev")

27 changes: 15 additions & 12 deletions README.developers.md
@@ -637,6 +637,10 @@ They can be used for FPGA architecture exploration for DL and also for tuning CA

A typical approach to evaluating an algorithm change would be to run `koios_medium` (or `koios_medium_no_hb`) tasks from the nightly regression test (vtr_reg_nightly_test4), the `koios_large` (or `koios_large_no_hb`) and the `koios_proxy` (or `koios_proxy_no_hb`) tasks from the weekly regression test (vtr_reg_weekly). The nightly test contains smaller benchmarks, whereas the large designs are in the weekly regression test. To measure QoR for the entire benchmark suite, both nightly and weekly tests should be run and the results should be concatenated.

Because three of the `koios_large` circuits have long DSP chains and require special settings, they are split into separate tasks:
* `bwave_like.float.large.v` and `bwave_like.fixed.large.v` are in the `vtr_reg_weekly/koios_bwave_large` task
* `dla_like.large.v` is in the `vtr_reg_weekly/koios_dla_large` task

For evaluating an algorithm change in the Odin frontend, run `koios_medium` (or `koios_medium_no_hb`) tasks from the nightly regression test (vtr_reg_nightly_test4_odin) and the `koios_large_odin` (or `koios_large_no_hb_odin`) tasks from the weekly regression test (vtr_reg_weekly).

The `koios_medium`, `koios_large`, and `koios_proxy` regression tasks run these benchmarks with complex_dsp functionality enabled, whereas `koios_medium_no_hb`, `koios_large_no_hb` and `koios_proxy_no_hb` regression tasks run these benchmarks without complex_dsp functionality. Normally, only the `koios_medium`, `koios_large`, and `koios_proxy` tasks should be enough for QoR.
@@ -651,6 +655,8 @@ The following table provides details on available Koios settings in VTR flow:
| Nightly | Medium designs | k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml | ✓ | vtr_reg_nightly_test4_odin/koios_medium | Odin | |
| Nightly | Medium designs | k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml | | vtr_reg_nightly_test4_odin/koios_medium_no_hb | Odin | |
| Weekly | Large designs | k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml | ✓ | vtr_reg_weekly/koios_large | Parmys | |
| Weekly | Large designs | k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml | ✓ | vtr_reg_weekly/koios_dla_large | Parmys | |
| Weekly | Large designs | k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml | ✓ | vtr_reg_weekly/koios_bwave_large | Parmys | |
| Weekly | Large designs | k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml | | vtr_reg_weekly/koios_large_no_hb | Parmys | |
| Weekly | Large designs | k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml | ✓ | vtr_reg_weekly/koios_large_odin | Odin | |
| Weekly | Large designs | k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml | | vtr_reg_weekly/koios_large_no_hb_odin | Odin | |
@@ -661,7 +667,15 @@ The following table provides details on available Koios settings in VTR flow:

For more information refer to the [Koios benchmark home page](vtr_flow/benchmarks/verilog/koios/README.md).

The following steps show a sequence of commands to run the `koios` tasks on the Koios benchmarks:
To make running all the Koios benchmarks easier, especially with those circuits scattered across different tasks, there is an overall task list that runs all 40 Koios circuits with complex DSP functionality enabled (to disable complex DSP, edit the file to point to the `koios_*_no_hb` tasks):

```shell
$ ../scripts/run_vtr_task.py -l koios_task_list.txt

#Several hours later... they complete
```

If you want to run a subset of the koios benchmarks or run them without hard DSP blocks, you can run lower-level 'koios' tasks as follows:

```shell
#From the VTR root
@@ -681,17 +695,6 @@ $ ../scripts/run_vtr_task.py regression_tests/vtr_reg_weekly/koios_sv_no_hb &

#Several hours later... they complete

#Parse the results
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_nightly_test4/koios_medium
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_weekly/koios_large
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_weekly/koios_proxy
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_weekly/koios_sv

$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_nightly_test4/koios_medium_no_hb
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_weekly/koios_large_no_hb
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_weekly/koios_proxy_no_hb
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_weekly/koios_sv_no_hb

#The run directory should now contain a summary parse_results.txt file
$ head -5 vtr_reg_nightly_test4/koios_medium/<latest_run_dir>/parse_results.txt
arch circuit script_params vtr_flow_elapsed_time vtr_max_mem_stage vtr_max_mem error odin_synth_time max_odin_mem parmys_synth_time max_parmys_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_mem pack_time placed_wirelength_est place_mem place_time place_quench_time placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_total_timing_analysis_time crit_path_total_sta_time
10 changes: 5 additions & 5 deletions README.md
@@ -36,15 +36,15 @@ See the [full license](LICENSE.md) for details.
## How to Cite
The following paper may be used as a general citation for VTR:

K. E. Murray, O. Petelin, S. Zhong, J. M. Wang, M. ElDafrawy, J.-P. Legault, E. Sha, A. G. Graham, J. Wu, M. J. P. Walker, H. Zeng, P. Patros, J. Luu, K. B. Kent and V. Betz "VTR 8: High Performance CAD and Customizable FPGA Architecture Modelling", ACM TRETS, 2020.
M. A. Elgammal, A. Mohaghegh, S. G. Shahrouz, F. Mahmoudi, F. Kosar, K. Talaei, J. Fife, D. Khadivi, K. Murray, A. Boutros, K. B. Kent, J. Goeders, and V. Betz "VTR 9: Open-Source CAD for Fabric and Beyond FPGA Architecture Exploration", ACM TRETS, 2025.

Bibtex:
```
@article{vtr8,
title={VTR 8: High Performance CAD and Customizable FPGA Architecture Modelling},
author={Murray, Kevin E. and Petelin, Oleg and Zhong, Sheng and Wang, Jai Min and ElDafrawy, Mohamed and Legault, Jean-Philippe and Sha, Eugene and Graham, Aaron G. and Wu, Jean and Walker, Matthew J. P. and Zeng, Hanqing and Patros, Panagiotis and Luu, Jason and Kent, Kenneth B. and Betz, Vaughn},
@article{vtr9,
title={VTR 9: Open-Source CAD for Fabric and Beyond FPGA Architecture Exploration},
author={Elgammal, Mohamed A. and Mohaghegh, Amin and Shahrouz, Soheil G. and Mahmoudi, Fatemehsadat and Kosar, Fahrican and Talaei, Kimia and Fife, Joshua and Khadivi, Daniel and Murray, Kevin and Boutros, Andrew and Kent, Kenneth B. and Goeders, Jeff and Betz, Vaughn},
journal={ACM Trans. Reconfigurable Technol. Syst.},
year={2020}
year={2025}
}
```

12 changes: 8 additions & 4 deletions doc/src/vpr/command_line_usage.rst
@@ -1284,13 +1284,17 @@ VPR uses a negotiated congestion algorithm (based on Pathfinder) to perform rout

This option attempts to verify the minimum by routing at successively lower channel widths until two consecutive routing failures are observed.

.. option:: --router_algorithm {parallel | timing_driven}
.. option:: --router_algorithm {timing_driven | parallel | parallel_decomp}

Selects which router algorithm to use.
Selects which router algorithm to use.

.. warning::
* ``timing_driven`` is the default single-threaded PathFinder algorithm.

* ``parallel`` partitions the device to route non-overlapping nets in parallel. Use with the ``-j`` option to specify the number of threads.

* ``parallel_decomp`` decomposes nets for aggressive parallelization :cite:`kosar2024parallel`. This imposes additional constraints and may result in worse QoR for difficult circuits.

The ``parallel`` router is experimental. (TODO: more explanation)
Note that both ``parallel`` and ``parallel_decomp`` are timing-driven routers.

**Default:** ``timing_driven``

6 changes: 6 additions & 0 deletions doc/src/z_references.bib
@@ -430,3 +430,9 @@ @inproceedings{koios_benchmarks
year={2021}
}

@inproceedings{kosar2024parallel,
title={Parallel FPGA Routing with On-the-Fly Net Decomposition},
author={Kosar, Fahrican and Stojilovic, Mirjana and Betz, Vaughn},
booktitle={The 23rd International Conference on Field-Programmable Technology},
year={2024}
}
20 changes: 14 additions & 6 deletions vpr/src/base/vpr_types.h
@@ -1641,7 +1641,7 @@ typedef t_routing_status<AtomNetId> t_atom_net_routing_status;

/** Edge between two RRNodes */
struct t_node_edge {
t_node_edge(RRNodeId fnode, RRNodeId tnode)
t_node_edge(RRNodeId fnode, RRNodeId tnode) noexcept
: from_node(fnode)
, to_node(tnode) {}

@@ -1654,10 +1654,18 @@ struct t_node_edge {
}
};

///@brief Non-configurably connected nodes and edges in the RR graph
/**
* @brief Groups of non-configurably connected nodes and edges in the RR graph.
* @note Each group is represented by a node set and an edge set, stored at the same index.
*
* For example, in an architecture with L-shaped wires formed by an x- and y-directed segment
* connected by an electrical short, each L-shaped wire corresponds to a new group. The group's
* index provides access to its node set (containing two RRNodeIds) and edge set (containing two
* directed edges in opposite directions).
*/
struct t_non_configurable_rr_sets {
std::set<std::set<RRNodeId>> node_sets;
std::set<std::set<t_node_edge>> edge_sets;
std::vector<std::set<RRNodeId>> node_sets;
std::vector<std::set<t_node_edge>> edge_sets;
};
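
A minimal sketch of how the index-aligned layout described in the comment above might be used. `RRNodeId` is reduced to a plain `int`, and `NodeEdge`, `NonConfigurableSets`, and `add_l_shaped_wire` are hypothetical stand-ins for illustration, not VTR's actual definitions. Because a `std::vector` gives each group a position, `node_sets[i]` and `edge_sets[i]` can be paired by index; a `std::set` orders its elements by value and offers no positional access.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <tuple>
#include <vector>

// Assumption: a plain int stands in for VTR's strongly typed RRNodeId.
using RRNodeId = int;

// Simplified stand-in for t_node_edge, with the ordering std::set requires.
struct NodeEdge {
    NodeEdge(RRNodeId fnode, RRNodeId tnode) noexcept
        : from_node(fnode)
        , to_node(tnode) {}

    RRNodeId from_node;
    RRNodeId to_node;

    friend bool operator<(const NodeEdge& lhs, const NodeEdge& rhs) {
        return std::tie(lhs.from_node, lhs.to_node) < std::tie(rhs.from_node, rhs.to_node);
    }
};

// Same shape as t_non_configurable_rr_sets: group i is node_sets[i] plus edge_sets[i].
struct NonConfigurableSets {
    std::vector<std::set<RRNodeId>> node_sets;
    std::vector<std::set<NodeEdge>> edge_sets;
};

// Hypothetical helper: record one L-shaped wire (an x-directed segment shorted to a
// y-directed segment) as a new group and return the group's index.
std::size_t add_l_shaped_wire(NonConfigurableSets& sets, RRNodeId x_seg, RRNodeId y_seg) {
    sets.node_sets.push_back(std::set<RRNodeId>{x_seg, y_seg});
    // The electrical short is modeled as two directed edges in opposite directions.
    sets.edge_sets.push_back(std::set<NodeEdge>{NodeEdge(x_seg, y_seg), NodeEdge(y_seg, x_seg)});
    return sets.node_sets.size() - 1;
}
```

For example, calling `add_l_shaped_wire(sets, 10, 11)` on an empty `sets` returns group index 0, whose node set is `{10, 11}` and whose edge set holds the edges `10 -> 11` and `11 -> 10`.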

///@brief Power estimation options
@@ -1669,11 +1677,11 @@ struct t_power_opts {
* @param max= Maximum channel width between x_max and y_max.
* @param x_min= Minimum channel width of horizontal channels. Initialized when init_chan() is invoked in rr_graph2.cpp
* @param y_min= Same as above but for vertical channels.
* @param x_max= Maximum channel width of horiozntal channels. Initialized when init_chan() is invoked in rr_graph2.cpp
* @param x_max= Maximum channel width of horizontal channels. Initialized when init_chan() is invoked in rr_graph2.cpp
* @param y_max= Same as above but for vertical channels.
* @param x_list= Stores the channel width of all horizontal channels and thus goes from [0..grid.height()]
* (imagine a 2D Cartesian grid with horizontal lines starting at every grid point on a line parallel to the y-axis)
* @param y_list= Stores the channel width of all verical channels and thus goes from [0..grid.width()]
* @param y_list= Stores the channel width of all vertical channels and thus goes from [0..grid.width()]
* (imagine a 2D Cartesian grid with vertical lines starting at every grid point on a line parallel to the x-axis)
*/
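
The relationship between the summary fields and the per-channel lists documented above can be sketched as follows. This is an illustration under assumptions: `ChanWidthSketch` and `summarize_chan_widths` are hypothetical names, both lists are assumed non-empty, and VTR's own initialization (done when `init_chan()` is invoked, per the comment above) may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in mirroring the fields described in the comment above.
struct ChanWidthSketch {
    int max = 0;             // larger of x_max and y_max
    int x_min = 0;           // narrowest horizontal channel
    int y_min = 0;           // narrowest vertical channel
    int x_max = 0;           // widest horizontal channel
    int y_max = 0;           // widest vertical channel
    std::vector<int> x_list; // width of each horizontal channel
    std::vector<int> y_list; // width of each vertical channel
};

// Derive the summary fields from the per-channel width lists.
// Assumption: both lists contain at least one entry.
ChanWidthSketch summarize_chan_widths(const std::vector<int>& x_list,
                                      const std::vector<int>& y_list) {
    ChanWidthSketch w;
    w.x_list = x_list;
    w.y_list = y_list;
    w.x_min = *std::min_element(x_list.begin(), x_list.end());
    w.x_max = *std::max_element(x_list.begin(), x_list.end());
    w.y_min = *std::min_element(y_list.begin(), y_list.end());
    w.y_max = *std::max_element(y_list.begin(), y_list.end());
    w.max = std::max(w.x_max, w.y_max);
    return w;
}
```

With horizontal widths `{100, 80, 100}` and vertical widths `{60, 120}`, the sketch yields `x_min = 80`, `x_max = 100`, `y_min = 60`, `y_max = 120`, and `max = 120`.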
