@@ -629,31 +629,43 @@ stratixiv_arch.timing.xml cholesky_mc_stratixiv_arch_timing.blif 0208312
629
629
### Example: Koios Benchmarks QoR Measurement
630
630
631
631
The [ Koios benchmarks] ( https://github.com/verilog-to-routing/vtr-verilog-to-routing/tree/master/vtr_flow/benchmarks/verilog/koios ) are a group of Deep Learning benchmark circuits distributed with the VTR project.
632
- The are provided as synthesizable verilog and can be re-mapped to VTR supported architectures.
633
- They consist mostly of medium to large sized circuits from Deep Learning (DL).
632
+ The are provided as synthesizable verilog and can be re-mapped to VTR supported architectures. They consist mostly of medium to large sized circuits from Deep Learning (DL).
634
633
They can be used for FPGA architecture exploration for DL and also for tuning CAD tools.
635
634
636
- A typical approach to evaluating an algorithm change would be to run ` koios ` or ` koios_no_complex_dsp ` task from the nightly regression test (vtr_reg_nightly_test4) and the ` koios ` or ` koios_no_complex_dsp ` task from the weekly regression test (vtr_reg_weekly). The nightly test contains smaller benchmarks, whereas the large designs are in the weekly regression test. The following steps show an example sequence of commands.
635
+ A typical approach to evaluating an algorithm change would be to run ` koios ` (or ` koios_no_complex_dsp ` ) task from the nightly regression test (vtr_reg_nightly_test4) and the ` koios ` (or ` koios_no_complex_dsp ` ) task from the weekly regression test (vtr_reg_weekly). The nightly test contains smaller benchmarks, whereas the large designs are in the weekly regression test. To measure QoR for the entire benchmark suite, both nightly and weekly tests should be run and the results should be concatenated.
636
+
637
+ The ` koios ` regression task runs these benchmarks with complex_dsp functionality enabled, whereas ` koios_no_complex_dsp ` regression task runs these benchmarks without complex_dsp functionality. Normally, only the ` koios ` tasks should be enough for QoR.
638
+
639
+ The following steps show a sequence of commands to run the ` koios ` tasks on the Koios benchmarks from both nightly and weekly regressions:
637
640
638
641
``` shell
639
642
# From the VTR root
640
643
$ cd vtr_flow/tasks
641
644
642
645
# Run the VTR benchmarks
643
- $ ../scripts/run_vtr_task.py regression_tests/vtr_reg_nightly_test4/koios
646
+ $ ../scripts/run_vtr_task.py regression_tests/vtr_reg_nightly_test4/koios &
647
+ $ ../scripts/run_vtr_task.py regression_tests/vtr_reg_weekly/koios &
644
648
645
649
# Several hours later... they complete
646
650
647
651
# Parse the results
648
652
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_nightly_test4/koios
653
+ $ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_weekly/koios
649
654
650
655
# The run directory should now contain a summary parse_results.txt file
651
- $ head -5 vtr_reg_nightly_test4/koios/latest/parse_results.txt
652
- arch circuit script_params vtr_flow_elapsed_time error odin_synth_time max_odin_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_time placed_wirelength_est place_time place_quench_time placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_total_timing_analysis_time crit_path_total_sta_time
653
- k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml tpu_like.small.v common 2871.10 9.36 235096 5 619.21 -1 -1 159760 -1 -1 1119 355 14 -1 success v8.0.0-4161-g8f4b3e9ca release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-05-28T23:09:34 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 2568860 355 289 50215 41827 2 23224 2053 136 136 18496 dsp_top auto 1233.72 457725 91.70 0.38 7.24742 -105267 -7.24742 2.59789 14.13 0.101267 0.0738583 24.91 18.6865 -1 561916 17 5.92627e+08 1.03195e+08 4.09037e+08 22114.9 16.37 32.3744 25.1979 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
654
- k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml dla_like.small.v common 7527.41 42.24 729876 5 3941.31 -1 -1 630244 -1 -1 5545 194 828 -1 success v8.0.0-4161-g8f4b3e9ca release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-05-28T23:09:34 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 4409476 194 13 217044 174718 1 91037 6708 164 164 26896 memory auto 1604.22 969627 663.41 2.84 5.61569 -424718 -5.61569 5.61569 21.49 0.584073 0.385993 104.796 73.1698 -1 1450542 14 8.6211e+08 3.01197e+08 5.93540e+08 22068.0 53.97 132.203 96.049 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
655
- k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml bnn.v common 2028.52 40.37 577472 3 240.94 -1 -1 513656 -1 -1 5695 260 0 -1 success v8.0.0-4161-g8f4b3e9ca release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-05-28T23:09:34 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 2195980 260 122 231647 179602 1 86181 6140 83 83 6889 clb auto 613.32 940951 503.35 2.87 6.4402 -131403 -6.4402 6.4402 5.41 0.753268 0.564332 85.331 60.8639 -1 1224690 16 2.13666e+08 1.74902e+08 1.51359e+08 21971.1 50.49 114.382 84.8538 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
656
- k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml attention_layer.v common 1330.99 11.83 1095592 7 59.16 -1 -1 560612 -1 -1 1248 1058 161 -1 success v8.0.0-4161-g8f4b3e9ca release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-05-28T23:09:34 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 1180420 1058 16 47407 39134 1 26605 2588 86 86 7396 dsp_top auto 728.70 234151 118.11 0.71 5.89837 -78343.6 -5.89837 5.89837 6.64 0.181478 0.146942 31.9659 24.5807 -1 366899 17 2.32446e+08 8.36361e+07 1.62201e+08 21930.9 16.25 40.6352 32.1556 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
656
+ $ head -5 vtr_reg_nightly_test4/koios/< latest_run_dir> /parse_results.txt
657
+ arch circuit script_params vtr_flow_elapsed_time error odin_synth_time max_odin_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_time placed_wirelength_est place_time place_quench_time placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_total_timing_analysis_time crit_path_total_sta_time
658
+ k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml tpu_like.small.v common 2871.10 9.36 235096 5 619.21 -1 -1 159760 -1 -1 1119 355 14 -1 success v8.0.0-4161-g8f4b3e9ca release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-05-28T23:09:34 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 2568860 355 289 50215 41827 2 23224 2053 136 136 18496 dsp_top auto 1233.72 457725 91.70 0.38 7.24742 -105267 -7.24742 2.59789 14.13 0.101267 0.0738583 24.91 18.6865 -1 561916 17 5.92627e+08 1.03195e+08 4.09037e+08 22114.9 16.37 32.3744 25.1979 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
659
+ k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml dla_like.small.v common 7527.41 42.24 729876 5 3941.31 -1 -1 630244 -1 -1 5545 194 828 -1 success v8.0.0-4161-g8f4b3e9ca release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-05-28T23:09:34 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 4409476 194 13 217044 174718 1 91037 6708 164 164 26896 memory auto 1604.22 969627 663.41 2.84 5.61569 -424718 -5.61569 5.61569 21.49 0.584073 0.385993 104.796 73.1698 -1 1450542 14 8.6211e+08 3.01197e+08 5.93540e+08 22068.0 53.97 132.203 96.049 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
660
+ k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml bnn.v common 2028.52 40.37 577472 3 240.94 -1 -1 513656 -1 -1 5695 260 0 -1 success v8.0.0-4161-g8f4b3e9ca release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-05-28T23:09:34 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 2195980 260 122 231647 179602 1 86181 6140 83 83 6889 clb auto 613.32 940951 503.35 2.87 6.4402 -131403 -6.4402 6.4402 5.41 0.753268 0.564332 85.331 60.8639 -1 1224690 16 2.13666e+08 1.74902e+08 1.51359e+08 21971.1 50.49 114.382 84.8538 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
661
+ k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml attention_layer.v common 1330.99 11.83 1095592 7 59.16 -1 -1 560612 -1 -1 1248 1058 161 -1 success v8.0.0-4161-g8f4b3e9ca release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-05-28T23:09:34 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 1180420 1058 16 47407 39134 1 26605 2588 86 86 7396 dsp_top auto 728.70 234151 118.11 0.71 5.89837 -78343.6 -5.89837 5.89837 6.64 0.181478 0.146942 31.9659 24.5807 -1 366899 17 2.32446e+08 8.36361e+07 1.62201e+08 21930.9 16.25 40.6352 32.1556 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
662
+
663
+ $ head -5 vtr_reg_weekly/koios/< latest_run_dir> /parse_results.txt
664
+ arch circuit script_params vtr_flow_elapsed_time error odin_synth_time max_odin_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_time placed_wirelength_est place_time place_quench_time placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_total_timing_analysis_time crit_path_total_sta_time
665
+ k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml clstm_like.small.v common 19316.21 162.86 2395612 3 15651.23 -1 -1 1337084 -1 -1 9309 293 407 -1 success v8.0.0-4470-ge625fdfe9 release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-06-18T14:02:07 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 4793656 293 290 374045 333664 1 99618 10661 152 152 23104 dsp_top auto 780.11 1482046 931.17 6.18 6.93257-474705 -6.93257 6.93257 27.96 1.30491 1.10886 193.621 147.02 -1 1762769 16 7.41832e+08 4.07655e+08 5.09972e+08 22072.9 106.48 263.016 205.56 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
666
+ k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml clstm_like.medium.v common 66156.51 483.84 4490632 3 60586.75 -1 -1 2480684 -1 -1 17641 293 784 -1 success v8.0.0-4470-ge625fdfe9 release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-06-18T14:02:07 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 8819072 293 578 696105 624609 1 181497 19958 206 206 42436 dsp_top auto 1057.61 3181171 1581.42 7.89 8.18417-1.24303e+06 -8.18417 8.18417 38.05 1.40496 0.993779 233.672 173.509 -1 3532357 18 1.36407e+09 7.68183e+08 9.34572e+08 22023.1 165.78 317.679 246.098 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
667
+ k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml clstm_like.large.v common 101695.32 717.44 6547568 3 94597.42 -1 -1 3590044 -1 -1 25995 293 1161 -1 success v8.0.0-4470-ge625fdfe9 release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-06-18T14:02:07 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 13040176 293 866 1018158 915547 1 263328 29277 248 248 61504 dsp_top auto 1503.62 5185716 1951.96 10.18 8.86387-1.97516e+06 -8.86387 8.86387 46.08 1.71937 1.21506 288.67 213.718 -1 5523775 18 1.98856e+09 1.12933e+09 1.35238e+09 21988.4 224.14 391.303 303.49 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
668
+ k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml lstm.v common 38901.96 35.76 651868 7 30532.88 -1 -1 606240 -1 -1 6626 17 305 -1 success v8.0.0-4470-ge625fdfe9 release IPO VTR_ASSERT_LEVEL=2 GNU 7.5.0 on Linux-4.15.0-124-generic x86_64 2021-06-18T14:02:07 jupiter0 /export/aman/vtr_aman/vtr-verilog-to-routing/vtr_flow/tasks 6036204 17 19 252939 204226 1 121211 7577 200 200 40000 dsp_top auto 4576.24 1453809 1136.46 3.95 8.38544 -386636 -8.385448.38544 54.87 0.944433 0.763732 237.758 176.282 -1 1876011 15 1.28987e+09 3.81683e+08 8.80433e+08 22010.877.53 284.979 214.902 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
657
669
```
658
670
659
671
## Comparing QoR Measurements
0 commit comments