Skip to content

Commit f473d09

Browse files
authored
Merge pull request #1787 from aman26kbm/koios_benchmarks2
Koios benchmarks (2nd PR)
2 parents e3bb6bf + a75b784 commit f473d09

File tree

27 files changed

+299
-20
lines changed

27 files changed

+299
-20
lines changed

ODIN_II/regression_test/benchmark/task/koios/task.conf

Lines changed: 27 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,20 +1,29 @@
11
########################
2-
# large benchmarks config
2+
# Koios benchmarks config
33
########################
44

55
regression_params=--disable_simulation --disable_parallel_jobs --verbose
66
script_synthesis_params=--limit_ressource --time_limit 14400s
77
script_simulation_params=--limit_ressource --time_limit 14400s
88

9-
# setup the architecture
9+
#-------------------------------------------------------
10+
# specify the directory to look for architecture file in
11+
#-------------------------------------------------------
1012
archs_dir=../vtr_flow/arch/COFFE_22nm
1113

12-
# one arch allows it to run faster given it is single threaded
14+
#-------------------------------------------------------
15+
# specify the architecture file
16+
#-------------------------------------------------------
1317
arch_list_add=k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml
1418

19+
#-------------------------------------------------------
20+
# specify the directory to look for benchmarks in
21+
#-------------------------------------------------------
1522
circuits_dir=../../../../vtr_flow/benchmarks/verilog/koios
1623

17-
# glob the large benchmark and the vtr ones to prevent duplicate run
24+
#-------------------------------------------------------
25+
# specify the benchmarks
26+
#-------------------------------------------------------
1827
circuit_list_add=tpu_like.small.v
1928
circuit_list_add=dla_like.small.v
2029
circuit_list_add=bnn.v
@@ -28,4 +37,18 @@ circuit_list_add=reduction_layer.v
2837
circuit_list_add=spmv.v
2938
circuit_list_add=softmax.v
3039

40+
#-------------------------------------------------------
41+
# specify the directory to look for include file in
42+
#-------------------------------------------------------
43+
includes_dir=../../../../vtr_flow/benchmarks/verilog/koios
44+
45+
#-------------------------------------------------------
46+
# specify the include files
47+
#-------------------------------------------------------
48+
# Some benchmarks instantiate complex dsp blocks to implement features
49+
# like native floating point math, cascade chains, etc. This functionality
50+
# is guarded under the `complex_dsp` macro. The complex_dsp_include.v file
51+
# defines this macro, thereby enabling instantiations of the complex dsp.
52+
include_list_add=complex_dsp_include.v
53+
3154
synthesis_parse_file=regression_test/parse_result/conf/synth.toml

README.developers.md

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -626,6 +626,48 @@ stratixiv_arch.timing.xml stereo_vision_stratixiv_arch_timing.blif 0208312
626626
stratixiv_arch.timing.xml cholesky_mc_stratixiv_arch_timing.blif 0208312 success 140214 108592 67410 5444 121 90 -1 111 151 -1 -1 5221059 8.16972 -454610 -8.16972 1518597 15 0 0 2.38657e+08 21915.3 9.34704 -531231 -9.34704 0 0 211.12 364.32 490.24 6356252 -1 -1
627627
```
628628

629+
### Example: Koios Benchmarks QoR Measurement
630+
631+
The [Koios benchmarks](https://github.com/verilog-to-routing/vtr-verilog-to-routing/tree/master/vtr_flow/benchmarks/verilog/koios) are a group of Deep Learning benchmark circuits distributed with the VTR project.
632+
The are provided as synthesizable verilog and can be re-mapped to VTR supported architectures. They consist mostly of medium to large sized circuits from Deep Learning (DL).
633+
They can be used for FPGA architecture exploration for DL and also for tuning CAD tools.
634+
635+
A typical approach to evaluating an algorithm change would be to run `koios` (or `koios_no_complex_dsp`) task from the nightly regression test (vtr_reg_nightly_test4) and the `koios` (or `koios_no_complex_dsp`) task from the weekly regression test (vtr_reg_weekly). The nightly test contains smaller benchmarks, whereas the large designs are in the weekly regression test. To measure QoR for the entire benchmark suite, both nightly and weekly tests should be run and the results should be concatenated.
636+
637+
The `koios` regression task runs these benchmarks with complex_dsp functionality enabled, whereas `koios_no_complex_dsp` regression task runs these benchmarks without complex_dsp functionality. Normally, only the `koios` tasks should be enough for QoR.
638+
639+
The following steps show a sequence of commands to run the `koios` tasks on the Koios benchmarks from both nightly and weekly regressions:
640+
641+
```shell
642+
#From the VTR root
643+
$ cd vtr_flow/tasks
644+
645+
#Run the VTR benchmarks
646+
$ ../scripts/run_vtr_task.py regression_tests/vtr_reg_nightly_test4/koios &
647+
$ ../scripts/run_vtr_task.py regression_tests/vtr_reg_weekly/koios &
648+
649+
#Several hours later... they complete
650+
651+
#Parse the results
652+
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_nightly_test4/koios
653+
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_weekly/koios
654+
655+
#The run directory should now contain a summary parse_results.txt file
656+
$ head -5 vtr_reg_nightly_test4/koios/<latest_run_dir>/parse_results.txt
657+
arch circuit script_params vtr_flow_elapsed_time error odin_synth_time max_odin_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_time placed_wirelength_est place_time place_quench_time placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_total_timing_analysis_time crit_path_total_sta_time
658+
k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml tpu_like.small.v common 2871.10 9.36 235096 5 619.21 -1 -1 159760 -1 -1 1119 355 14 -1 success v8.0.0-4161-g8f4b3e9ca release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-05-28T23:09:34 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 2568860 355 289 50215 41827 2 23224 2053 136 136 18496 dsp_top auto 1233.72 457725 91.70 0.38 7.24742 -105267 -7.24742 2.59789 14.13 0.101267 0.0738583 24.91 18.6865 -1 561916 17 5.92627e+08 1.03195e+08 4.09037e+08 22114.9 16.37 32.3744 25.1979 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
659+
k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml dla_like.small.v common 7527.41 42.24 729876 5 3941.31 -1 -1 630244 -1 -1 5545 194 828 -1 success v8.0.0-4161-g8f4b3e9ca release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-05-28T23:09:34 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 4409476 194 13 217044 174718 1 91037 6708 164 164 26896 memory auto 1604.22 969627 663.41 2.84 5.61569 -424718 -5.61569 5.61569 21.49 0.584073 0.385993 104.796 73.1698 -1 1450542 14 8.6211e+08 3.01197e+08 5.93540e+08 22068.0 53.97 132.203 96.049 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
660+
k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml bnn.v common 2028.52 40.37 577472 3 240.94 -1 -1 513656 -1 -1 5695 260 0 -1 success v8.0.0-4161-g8f4b3e9ca release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-05-28T23:09:34 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 2195980 260 122 231647 179602 1 86181 6140 83 83 6889 clb auto 613.32 940951 503.35 2.87 6.4402 -131403 -6.4402 6.4402 5.41 0.753268 0.564332 85.331 60.8639 -1 1224690 16 2.13666e+08 1.74902e+08 1.51359e+08 21971.1 50.49 114.382 84.8538 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
661+
k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml attention_layer.v common 1330.99 11.83 1095592 7 59.16 -1 -1 560612 -1 -1 1248 1058 161 -1 success v8.0.0-4161-g8f4b3e9ca release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-05-28T23:09:34 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 1180420 1058 16 47407 39134 1 26605 2588 86 86 7396 dsp_top auto 728.70 234151 118.11 0.71 5.89837 -78343.6 -5.89837 5.89837 6.64 0.181478 0.146942 31.9659 24.5807 -1 366899 17 2.32446e+08 8.36361e+07 1.62201e+08 21930.9 16.25 40.6352 32.1556 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
662+
663+
$ head -5 vtr_reg_weekly/koios/<latest_run_dir>/parse_results.txt
664+
arch circuit script_params vtr_flow_elapsed_time error odin_synth_time max_odin_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_time placed_wirelength_est place_time place_quench_time placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_total_timing_analysis_time crit_path_total_sta_time
665+
k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml clstm_like.small.v common 19316.21 162.86 2395612 3 15651.23 -1 -1 1337084 -1 -1 9309 293 407 -1 success v8.0.0-4470-ge625fdfe9 release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-06-18T14:02:07 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 4793656 293 290 374045 333664 1 99618 10661 152 152 23104 dsp_top auto 780.11 1482046 931.17 6.18 6.93257-474705 -6.93257 6.93257 27.96 1.30491 1.10886 193.621 147.02 -1 1762769 16 7.41832e+08 4.07655e+08 5.09972e+08 22072.9 106.48 263.016 205.56 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
666+
k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml clstm_like.medium.v common 66156.51 483.84 4490632 3 60586.75 -1 -1 2480684 -1 -1 17641 293 784 -1 success v8.0.0-4470-ge625fdfe9 release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-06-18T14:02:07 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 8819072 293 578 696105 624609 1 181497 19958 206 206 42436 dsp_top auto 1057.61 3181171 1581.42 7.89 8.18417-1.24303e+06 -8.18417 8.18417 38.05 1.40496 0.993779 233.672 173.509 -1 3532357 18 1.36407e+09 7.68183e+08 9.34572e+08 22023.1 165.78 317.679 246.098 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
667+
k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml clstm_like.large.v common 101695.32 717.44 6547568 3 94597.42 -1 -1 3590044 -1 -1 25995 293 1161 -1 success v8.0.0-4470-ge625fdfe9 release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-06-18T14:02:07 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 13040176 293 866 1018158 915547 1 263328 29277 248 248 61504 dsp_top auto 1503.62 5185716 1951.96 10.18 8.86387-1.97516e+06 -8.86387 8.86387 46.08 1.71937 1.21506 288.67 213.718 -1 5523775 18 1.98856e+09 1.12933e+09 1.35238e+09 21988.4 224.14 391.303 303.49 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
668+
k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml lstm.v common 38901.96 35.76 651868 7 30532.88 -1 -1 606240 -1 -1 6626 17 305 -1 success v8.0.0-4470-ge625fdfe9 release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-06-18T14:02:07 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 6036204 17 19 252939 204226 1 121211 7577 200 200 40000 dsp_top auto 4576.24 1453809 1136.46 3.95 8.38544 -386636 -8.385448.38544 54.87 0.944433 0.763732 237.758 176.282 -1 1876011 15 1.28987e+09 3.81683e+08 8.80433e+08 22010.877.53 284.979 214.902 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
669+
```
670+
629671
## Comparing QoR Measurements
630672
Once you have two (or more) sets of QoR measurements they now need to be compared.
631673

doc/src/vtr/benchmarks.rst

Lines changed: 1 addition & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -97,17 +97,12 @@ These designs use many precisions including binary, different fixed point types
9797
eltwise_layer Matrix elementwise add/sub/mult
9898
================= ======================================
9999

100-
Koios benchmarks are fully compatible with the full VTR flow. Some Koios benchmarks use advanced DSP features that are available in only a few FPGA architectures provided with VTR. This is because they instantiate DSP macros to implement native FP16 multiplications or use the hard dedicated chains, and these are architecture-specific. If users want to use a different FPGA architecture file, they can replace the macro instantiations in the benchmarks with their equivalents from the FPGA architectures they wish to use.
101-
102-
Alternatively, users can disable these advanced features. The macro ``complex_dsp`` can be used for this purpose. If complex_dsp is defined in a benchmark file (using ```define complex_dsp`` in the beginning of the benchmark file), then advanced DSP features mentioned above will be used. If a user wants to run a Koios benchmark with FPGA architectures that don't have these advanced DSP features (for example, the flagship architectures: ``$VTR_ROOT/vtr_flow/arch/timing/k6_frac_N10_*_mem32K_40nm*``), then they can remove the line defining the complex_dsp macro. This enables the same functionality with behavioral Verilog that is mapped to the FPGA soft logic when an architecture without the required macro definitions is used.
103-
104100
The VTR benchmarks are provided as Verilog (enabling full flexibility to modify and change how the designs are implemented) under: ::
105101

106102
$VTR_ROOT/vtr_flow/benchmarks/verilog/koios
107103

108-
The FPGA architectures with advanced DSP that work out-of-the-box with Koios benchmarks are available here: ::
104+
To use these benchmarks, please see the documentation in the README file at: https://github.com/verilog-to-routing/vtr-verilog-to-routing/tree/master/vtr_flow/benchmarks/verilog/koios
109105

110-
$VTR_ROOT/vtr_flow/arch/COFFE_22nm/k6FracN10LB_mem20K_complexDSP_customSB_22nm.*
111106

112107
MCNC20 Benchmarks
113108
-----------------

0 commit comments

Comments
 (0)