Skip to content

Koios benchmarks (2nd PR) #1787

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 25, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 27 additions & 4 deletions ODIN_II/regression_test/benchmark/task/koios/task.conf
Original file line number Diff line number Diff line change
@@ -1,20 +1,29 @@
########################
# large benchmarks config
# Koios benchmarks config
########################

regression_params=--disable_simulation --disable_parallel_jobs --verbose
script_synthesis_params=--limit_ressource --time_limit 14400s
script_simulation_params=--limit_ressource --time_limit 14400s

# setup the architecture
#-------------------------------------------------------
# specify the directory to look for architecture file in
#-------------------------------------------------------
archs_dir=../vtr_flow/arch/COFFE_22nm

# one arch allows it to run faster given it is single threaded
#-------------------------------------------------------
# specify the architecture file
#-------------------------------------------------------
arch_list_add=k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml

#-------------------------------------------------------
# specify the directory to look for benchmarks in
#-------------------------------------------------------
circuits_dir=../../../../vtr_flow/benchmarks/verilog/koios

# glob the large benchmark and the vtr ones to prevent duplicate run
#-------------------------------------------------------
# specify the benchmarks
#-------------------------------------------------------
circuit_list_add=tpu_like.small.v
circuit_list_add=dla_like.small.v
circuit_list_add=bnn.v
Expand All @@ -28,4 +37,18 @@ circuit_list_add=reduction_layer.v
circuit_list_add=spmv.v
circuit_list_add=softmax.v

#-------------------------------------------------------
# specify the directory to look for include file in
#-------------------------------------------------------
includes_dir=../../../../vtr_flow/benchmarks/verilog/koios

#-------------------------------------------------------
# specify the include files
#-------------------------------------------------------
# Some benchmarks instantiate complex dsp blocks to implement features
# like native floating point math, cascade chains, etc. This functionality
# is guarded under the `complex_dsp` macro. The complex_dsp_include.v file
# defines this macro, thereby enabling instantiations of the complex dsp.
include_list_add=complex_dsp_include.v

synthesis_parse_file=regression_test/parse_result/conf/synth.toml
42 changes: 42 additions & 0 deletions README.developers.md
Original file line number Diff line number Diff line change
Expand Up @@ -626,6 +626,48 @@ stratixiv_arch.timing.xml stereo_vision_stratixiv_arch_timing.blif 0208312
stratixiv_arch.timing.xml cholesky_mc_stratixiv_arch_timing.blif 0208312 success 140214 108592 67410 5444 121 90 -1 111 151 -1 -1 5221059 8.16972 -454610 -8.16972 1518597 15 0 0 2.38657e+08 21915.3 9.34704 -531231 -9.34704 0 0 211.12 364.32 490.24 6356252 -1 -1
```

### Example: Koios Benchmarks QoR Measurement

The [Koios benchmarks](https://github.com/verilog-to-routing/vtr-verilog-to-routing/tree/master/vtr_flow/benchmarks/verilog/koios) are a group of Deep Learning benchmark circuits distributed with the VTR project.
The are provided as synthesizable verilog and can be re-mapped to VTR supported architectures. They consist mostly of medium to large sized circuits from Deep Learning (DL).
They can be used for FPGA architecture exploration for DL and also for tuning CAD tools.

A typical approach to evaluating an algorithm change would be to run `koios` (or `koios_no_complex_dsp`) task from the nightly regression test (vtr_reg_nightly_test4) and the `koios` (or `koios_no_complex_dsp`) task from the weekly regression test (vtr_reg_weekly). The nightly test contains smaller benchmarks, whereas the large designs are in the weekly regression test. To measure QoR for the entire benchmark suite, both nightly and weekly tests should be run and the results should be concatenated.

The `koios` regression task runs these benchmarks with complex_dsp functionality enabled, whereas `koios_no_complex_dsp` regression task runs these benchmarks without complex_dsp functionality. Normally, only the `koios` tasks should be enough for QoR.

The following steps show a sequence of commands to run the `koios` tasks on the Koios benchmarks from both nightly and weekly regressions:

```shell
#From the VTR root
$ cd vtr_flow/tasks

#Run the VTR benchmarks
$ ../scripts/run_vtr_task.py regression_tests/vtr_reg_nightly_test4/koios &
$ ../scripts/run_vtr_task.py regression_tests/vtr_reg_weekly/koios &

#Several hours later... they complete

#Parse the results
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_nightly_test4/koios
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_weekly/koios

#The run directory should now contain a summary parse_results.txt file
$ head -5 vtr_reg_nightly_test4/koios/<latest_run_dir>/parse_results.txt
arch circuit script_params vtr_flow_elapsed_time error odin_synth_time max_odin_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_time placed_wirelength_est place_time place_quench_time placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_total_timing_analysis_time crit_path_total_sta_time
k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml tpu_like.small.v common 2871.10 9.36 235096 5 619.21 -1 -1 159760 -1 -1 1119 355 14 -1 success v8.0.0-4161-g8f4b3e9ca release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-05-28T23:09:34 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 2568860 355 289 50215 41827 2 23224 2053 136 136 18496 dsp_top auto 1233.72 457725 91.70 0.38 7.24742 -105267 -7.24742 2.59789 14.13 0.101267 0.0738583 24.91 18.6865 -1 561916 17 5.92627e+08 1.03195e+08 4.09037e+08 22114.9 16.37 32.3744 25.1979 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml dla_like.small.v common 7527.41 42.24 729876 5 3941.31 -1 -1 630244 -1 -1 5545 194 828 -1 success v8.0.0-4161-g8f4b3e9ca release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-05-28T23:09:34 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 4409476 194 13 217044 174718 1 91037 6708 164 164 26896 memory auto 1604.22 969627 663.41 2.84 5.61569 -424718 -5.61569 5.61569 21.49 0.584073 0.385993 104.796 73.1698 -1 1450542 14 8.6211e+08 3.01197e+08 5.93540e+08 22068.0 53.97 132.203 96.049 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml bnn.v common 2028.52 40.37 577472 3 240.94 -1 -1 513656 -1 -1 5695 260 0 -1 success v8.0.0-4161-g8f4b3e9ca release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-05-28T23:09:34 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 2195980 260 122 231647 179602 1 86181 6140 83 83 6889 clb auto 613.32 940951 503.35 2.87 6.4402 -131403 -6.4402 6.4402 5.41 0.753268 0.564332 85.331 60.8639 -1 1224690 16 2.13666e+08 1.74902e+08 1.51359e+08 21971.1 50.49 114.382 84.8538 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml attention_layer.v common 1330.99 11.83 1095592 7 59.16 -1 -1 560612 -1 -1 1248 1058 161 -1 success v8.0.0-4161-g8f4b3e9ca release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-05-28T23:09:34 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 1180420 1058 16 47407 39134 1 26605 2588 86 86 7396 dsp_top auto 728.70 234151 118.11 0.71 5.89837 -78343.6 -5.89837 5.89837 6.64 0.181478 0.146942 31.9659 24.5807 -1 366899 17 2.32446e+08 8.36361e+07 1.62201e+08 21930.9 16.25 40.6352 32.1556 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1

$ head -5 vtr_reg_weekly/koios/<latest_run_dir>/parse_results.txt
arch circuit script_params vtr_flow_elapsed_time error odin_synth_time max_odin_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_time placed_wirelength_est place_time place_quench_time placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_total_timing_analysis_time crit_path_total_sta_time
k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml clstm_like.small.v common 19316.21 162.86 2395612 3 15651.23 -1 -1 1337084 -1 -1 9309 293 407 -1 success v8.0.0-4470-ge625fdfe9 release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-06-18T14:02:07 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 4793656 293 290 374045 333664 1 99618 10661 152 152 23104 dsp_top auto 780.11 1482046 931.17 6.18 6.93257-474705 -6.93257 6.93257 27.96 1.30491 1.10886 193.621 147.02 -1 1762769 16 7.41832e+08 4.07655e+08 5.09972e+08 22072.9 106.48 263.016 205.56 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml clstm_like.medium.v common 66156.51 483.84 4490632 3 60586.75 -1 -1 2480684 -1 -1 17641 293 784 -1 success v8.0.0-4470-ge625fdfe9 release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-06-18T14:02:07 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 8819072 293 578 696105 624609 1 181497 19958 206 206 42436 dsp_top auto 1057.61 3181171 1581.42 7.89 8.18417-1.24303e+06 -8.18417 8.18417 38.05 1.40496 0.993779 233.672 173.509 -1 3532357 18 1.36407e+09 7.68183e+08 9.34572e+08 22023.1 165.78 317.679 246.098 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml clstm_like.large.v common 101695.32 717.44 6547568 3 94597.42 -1 -1 3590044 -1 -1 25995 293 1161 -1 success v8.0.0-4470-ge625fdfe9 release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-06-18T14:02:07 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 13040176 293 866 1018158 915547 1 263328 29277 248 248 61504 dsp_top auto 1503.62 5185716 1951.96 10.18 8.86387-1.97516e+06 -8.86387 8.86387 46.08 1.71937 1.21506 288.67 213.718 -1 5523775 18 1.98856e+09 1.12933e+09 1.35238e+09 21988.4 224.14 391.303 303.49 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml lstm.v common 38901.96 35.76 651868 7 30532.88 -1 -1 606240 -1 -1 6626 17 305 -1 success v8.0.0-4470-ge625fdfe9 release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-06-18T14:02:07 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 6036204 17 19 252939 204226 1 121211 7577 200 200 40000 dsp_top auto 4576.24 1453809 1136.46 3.95 8.38544 -386636 -8.385448.38544 54.87 0.944433 0.763732 237.758 176.282 -1 1876011 15 1.28987e+09 3.81683e+08 8.80433e+08 22010.877.53 284.979 214.902 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
```

## Comparing QoR Measurements
Once you have two (or more) sets of QoR measurements they now need to be compared.

Expand Down
7 changes: 1 addition & 6 deletions doc/src/vtr/benchmarks.rst
Original file line number Diff line number Diff line change
Expand Up @@ -97,17 +97,12 @@ These designs use many precisions including binary, different fixed point types
eltwise_layer Matrix elementwise add/sub/mult
================= ======================================

Koios benchmarks are fully compatible with the full VTR flow. Some Koios benchmarks use advanced DSP features that are available in only a few FPGA architectures provided with VTR. This is because they instantiate DSP macros to implement native FP16 multiplications or use the hard dedicated chains, and these are architecture-specific. If users want to use a different FPGA architecture file, they can replace the macro instantiations in the benchmarks with their equivalents from the FPGA architectures they wish to use.

Alternatively, users can disable these advanced features. The macro ``complex_dsp`` can be used for this purpose. If complex_dsp is defined in a benchmark file (using ```define complex_dsp`` in the beginning of the benchmark file), then advanced DSP features mentioned above will be used. If a user wants to run a Koios benchmark with FPGA architectures that don't have these advanced DSP features (for example, the flagship architectures: ``$VTR_ROOT/vtr_flow/arch/timing/k6_frac_N10_*_mem32K_40nm*``), then they can remove the line defining the complex_dsp macro. This enables the same functionality with behavioral Verilog that is mapped to the FPGA soft logic when an architecture without the required macro definitions is used.

The VTR benchmarks are provided as Verilog (enabling full flexibility to modify and change how the designs are implemented) under: ::

$VTR_ROOT/vtr_flow/benchmarks/verilog/koios

The FPGA architectures with advanced DSP that work out-of-the-box with Koios benchmarks are available here: ::
To use these benchmarks, please see the documentation in the README file at: https://github.com/verilog-to-routing/vtr-verilog-to-routing/tree/master/vtr_flow/benchmarks/verilog/koios

$VTR_ROOT/vtr_flow/arch/COFFE_22nm/k6FracN10LB_mem20K_complexDSP_customSB_22nm.*

MCNC20 Benchmarks
-----------------
Expand Down
Loading