Skip to content

Submitting Koios benchmarks #1753

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jun 14, 2021

Conversation

aman26kbm
Copy link
Contributor

Description

Adding files related to Koios benchmarks. Here's a summary of the files:

  1. The benchmarks themselves (.v files in the benchmarks/verilog/koios directory)
  2. A README file for the benchmarks that provides a link to the documentation, how to cite and who to contact.
  3. The arch files that were used with Koios benchmarks in the arch/COFFE_22nm directory
  4. Adding a brief description of the benchmarks in the benchmarks.rst file. This goes on the docs pages.
  5. Adding a citation of the Koios paper in the bibtex file.

Motivation and Context

Adding a set of DL benchmarks for FPGA arch and CAD research

How Has This Been Tested?

I need to add tests. Will work with Vaughn on identifying the tests to add.

Types of changes

  • Bug fix (change which fixes an issue)
  • [x ] New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • [x ] My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@vaughnbetz
Copy link
Contributor

Thanks Aman. Yes, a CI test should be added to vtr_reg_nightly. @ArashAhmadian is currently beefing up vtr_reg_nightly and likely refactoring it into two sub-tests (so we can run on two kokoro machines in parallel) so adding him to the conversation.

@aman26kbm
Copy link
Contributor Author

Thanks Aman. Yes, a CI test should be added to vtr_reg_nightly. @ArashAhmadian is currently beefing up vtr_reg_nightly and likely refactoring it into two sub-tests (so we can run on two kokoro machines in parallel) so adding him to the conversation.

Regarding adding tests. There is one main architecture that we've added in this PR (all others are just variations of that architecture) in the PR. And then there are 20 designs.

From a wall clock time perspective, the designs take anywhere from 2 mins to 24 hours to run. I am thinking that we add 3 tests - one in vtr_reg_strong, one in vtr_reg_nightly and one in vtr_reg_weekly. We just need to ensure that each design gets tested with one arch variation. So, we can distribute the design/arch combinations across these tests keeping in mind that the time taken by each test doesn't exceed the set limit.

@vaughnbetz
Copy link
Contributor

Yes, that makes sense. You could test all designs on one arch, and then pick one or two interesting designs and test them on all archs. Catches almost as many errors as the full cross product, but with much less cpu time.

@ArashAhmadian is in the middle of restructuring the nightly regtests to both increase their coverage and reduce their wall clock time to about four hours. So we should structure the nightly tests for this suite to also fit within four hours wall clock (using multiple cpus, and probably in a new nightly test so they get their own kokoro machine in CI). We'll have to be pragmatic about the CAD settings / circuit combinations to keep the runtime down to that.

Weekly can run any really big/slow designs, but it won't get nearly as much attention as it isn't run on each pull request.

@aman26kbm
Copy link
Contributor Author

I've added 4 tests:

  1. Test named "koios" to vtr_reg_strong. This finished in 12 mins with -j4 (satisfies the requirement of 15 mins). This covers a few small designs with the main arch file.
  2. Test named "koios" to vtr_reg_nightly. This finished in 2.8 hours with -j3 (satisfies the requirement of 4 hours). This covers some slightly larger designs with the main arch file.
  3. Test named "koios_multi_arch" to vtr_reg_nightly. This finished in 46 mins with -j3 (satisfies the requirement of 4 hours). This covers one medium-sized design with all the other arch files.
  4. Test named "koios" to vtr_reg_weekly. This hasn't finished yet, but should finish in the required time of 42 hours with -j4. This covers large to very large designs on the main arch file. I will checkin the golden results file for this soon.

@aman26kbm
Copy link
Contributor Author

aman26kbm commented Jun 5, 2021

Some designs in the suite use advanced features of the DSP slice in the architecture and instantiate the DSP slice hard macro in the RTL. But to be able to run these designs on a simpler arch (like the Stratix IV based flagship arch), we have provided behavioral code that replaces the instantiation of the DSP slice. This is handled using a macro (`define complex_dsp) in the Verilog file. So effectively, for some designs, we have two variations - one with complex_dsp and one without complex_dsp.

I think we want to protect both variations by adding tests that enable and disable the `define. How can we do that? It won't be a good idea to statically copy the design file and keep the `define complex_dsp in one file and remove the `define in the other. @sdamghan , @KennethKent , is it possible to add a feature to provide a `define from the odin config file?

@vaughnbetz
Copy link
Contributor

Thanks Aman. Adding @kmurray and @ArashAhmadian to the conversation.
Agree with the Odin-II define -- good to protect both variations. If it isn't in CI, assume it doesn't work!

I think we need to cut down what goes in vtr_reg_strong. The entire test is intended to take 15 minutes or less, rather than individual subtests taking less than 15 minutes. See https://docs.verilogtorouting.org/en/latest/dev/developing/#adding-tests -- the recommendation is to keep the test to 1 minute or less. Could just use one (smallest) koios design, run with a fixed channel width, etc. I think the key in that test is to check one of the more complex architectures/DSP blocks; we shouldn't need a big circuit for that.

The nightly and weekly additions sound good. @ArashAhmadian is refactoring nightly into multiple sub-tests so we can run on multiple kokoro machines for CI and get higher throughput. I suggest we add the koios designs into a new nightly test (e.g. vtr_reg_nightly4) so we get another kokoro machine for them, and they don't slow down nightly.

Alternatively, @ArashAhmadian : you can figure out how to update the various kokoro scripts to split up the runs in the vtr_reg_nightly task list across multiple machines (run one job per task, or run the first N on different machines and the last M on the final machine if we only have N+1 kokoro machines. This is basically Tim's suggestion from last week, except it ensures we never run out of kokoro machines (we just slow down). In this case we'd update the documentation to say slow tests should be listed first (and we'd reverse the task list processing so first gets run first).

@aman26kbm
Copy link
Contributor Author

I think we need to cut down what goes in vtr_reg_strong. The entire test is intended to take 15 minutes or less, rather than individual subtests taking less than 15 minutes

Ah I thought each tests in vtr_reg_strong was supposed to be 15 mins. I realize that none of the Koios benchmarks runs in 1 minute or less. The shortest time I see is 5 mins (on a Intel(R) Xeon(R) CPU E5-2430 v2 @ 2.50GHz with 64GB RAM). It may run a bit faster (say 4 min) on one of the other arch variations, but it won't come down to 1 min. So, it looks like we just have to have only nightly and weekly regression tests for Koios. What we can do is to create a test.v benchmark under koios, that only instantiates say the DSP slice from the architecture and run it in vtr_reg_strong. That gives us some protection of the arch file, but not of any real koios benchmark. What do you think about this, @vaughnbetz ?

I suggest we add the koios designs into a new nightly test (e.g. vtr_reg_nightly4) so we get another kokoro machine for them, and they don't slow down nightly.

Regarding this point... @ArashAhmadian, @vaughnbetz, is there something I need to do? I don't see vtr_reg_nightly1/2/3 folders in the repo. This "vtr_reg_nightly4" abstraction is something in the CI system and not in the repo, I guess?

@aman26kbm
Copy link
Contributor Author

@vaughnbetz , while we sort the regression aspects out, can you please review and provide feedback on the other aspects of the benchmarks like the file names, folder paths, documentation, etc. I've provided a link to the koios benchmark folder (https://github.com/verilog-to-routing/vtr-verilog-to-routing/tree/master/vtr_flow/benchmarks/verilog/koios) in the paper; we need that to be finalized before I submit the camera-ready version of the paper.

@ArashAhmadian
Copy link
Contributor

@aman26kbm I haven't made the PR for the refactoring of vtr_reg_nightly (vtr_reg_nightly is separated into three smaller suites only on the vtr_task_reorganization branch in my fork) yet . I will make the request soon, most likely within the next two days. However, you could still add it in as "vtr_reg_nightly_test4" alongside vtr_reg_nightly but for it to run on a kokoro machine, a .cfg file has to be added under https://github.com/verilog-to-routing/vtr-verilog-to-routing/tree/master/.github/kokoro/presubmit. The new .cfg file, say nightly_test4.cfg will be almost identical to nightly.cfg with the only difference being the value for VTR_TEST.
(will update the doc to include these steps)

Alternatively, when finalized, you could add the discussed regression tests under the current suites (vtr_reg_nightly and/or vtr_reg_weekly), and I will refactor them into vtr_reg_nightl4 before making my PR.

Please let me know your thoughts. @aman26kbm @vaughnbetz

@sdamghan
Copy link
Member

sdamghan commented Jun 7, 2021

Some designs in the suite use advanced features of the DSP slice in the architecture and instantiate the DSP slice hard macro in the RTL. But to be able to run these designs on a simpler arch (like the Stratix IV based flagship arch), we have provided behavioral code that replaces the instantiation of the DSP slice. This is handled using a macro (`define complex_dsp) in the Verilog file. So effectively, for some designs, we have two variations - one with complex_dsp and one without complex_dsp.

I think we want to protect both variations by adding tests that enable and disable the define. How can we do that? It won't be a good idea to statically copy the design file and keep the define complex_dsp in one file and remove the `define in the other. @sdamghan , @KennethKent , is it possible to add a feature to provide a define from the odin config file?

@aman26kbm In my opinion, there would be a couple of options for this matter. First, since we can read multiple Verilog files (although without having multiple top modules) I can add a terminal argument to read define from another input file. In this way, you would handle the usage of define by passing the second Verilog file. Another option is to provide a Verilog specifier like #ifdef in C, so if a statement (#def XXX) is specified in the config file we can ignore the define macro and move forward. Please let me know your ideas, so I'll make a WIP PR soon.

@aman26kbm
Copy link
Contributor Author

@aman26kbm I haven't made the PR for the refactoring of vtr_reg_nightly (vtr_reg_nightly is separated into three smaller suites only on the vtr_task_reorganization branch in my fork) yet . I will make the request soon, most likely within the next two days. However, you could still add it in as "vtr_reg_nightly_test4" alongside vtr_reg_nightly but for it to run on a kokoro machine, a .cfg file has to be added under https://github.com/verilog-to-routing/vtr-verilog-to-routing/tree/master/.github/kokoro/presubmit. The new .cfg file, say nightly_test4.cfg will be almost identical to nightly.cfg with the only difference being the value for VTR_TEST.
(will update the doc to include these steps)

Alternatively, when finalized, you could add the discussed regression tests under the current suites (vtr_reg_nightly and/or vtr_reg_weekly), and I will refactor them into vtr_reg_nightl4 before making my PR.

Please let me know your thoughts. @aman26kbm @vaughnbetz

Thanks, @ArashAhmadian , I'll prefer the second option coz I'm not familiar with the regression setup. But your PR will likely get submitted before this one. This PR's review will take some time. When this PR's other aspects have been reviewed and if your PR has been checked in by then, I can change the files (to move the tests to the new regression folders) to avoid extra work for you.

@aman26kbm
Copy link
Contributor Author

aman26kbm commented Jun 7, 2021

Some designs in the suite use advanced features of the DSP slice in the architecture and instantiate the DSP slice hard macro in the RTL. But to be able to run these designs on a simpler arch (like the Stratix IV based flagship arch), we have provided behavioral code that replaces the instantiation of the DSP slice. This is handled using a macro (define complex_dsp) in the Verilog file. So effectively, for some designs, we have two variations - one with complex_dsp and one without complex_dsp. I think we want to protect both variations by adding tests that enable and disable the _define_. How can we do that? It won't be a good idea to statically copy the design file and keep the define complex_dsp in one file and remove the define in the other. @sdamghan , @KennethKent , is it possible to add a feature to provide a define from the odin config file?

@aman26kbm In my opinion, there would be a couple of options for this matter. First, since we can read multiple Verilog files (although without having multiple top modules) I can add a terminal argument to read define from another input file. In this way, you would handle the usage of define by passing the second Verilog file. Another option is to provide a Verilog specifier like #ifdef in C, so if a statement (#def XXX) is specified in the config file we can ignore the define macro and move forward. Please let me know your ideas, so I'll make a WIP PR soon.

I didn't know we could read in multiple Verilog files for one benchmark/design. That's great. I guess then I just refactor the "`define complex_dsp" line into another Verilog file and then no change should be required from you, right? I am confused about this: "I can add a terminal argument to read define from another input file".

Having said that, option #2 is better I think. I can imagine this being a common usecase. Typically in designs, there are multiple `defines that can change the behavior of the design. So the ability of adding a `define from the odin config file will be good.

@vaughnbetz may have some more inputs here.

@vaughnbetz
Copy link
Contributor

Thanks all.

  1. Aman, for vtr_reg_strong I think just having a simple instantiation of one of the complex DSP blocks to test the basic architecture is appropriate. vtr_reg_strong goes for fast tests to find gross errors; we need nightly for more subtle problems or QoR issues. So I think the best thing is to make a small architecture-test circuit in vtr_reg_strong. Second best is to simply leave testing koios entirely to the nightly run.
  2. I agree specifying 'define in the config.txt file for a regtest is a good idea. I'm not sure of what the easiest way to do that is, but at a high level I think we want to add a flow option to set a source code define in the config.txt. Right now script_params is interpreted by run_vtr_task/run_vtr_flow to set all sorts of CAD options and even flow options (which CAD tools to run, and how many times). E.g., one current example is:
    script_params = --route_chan_width 200 --gen_post_synthesis_netlist on -track_memory_usage -check_equivalent --sweep_dangling_primary_ios off --sweep_constant_primary_outputs off

We could add another option like:
script_params= -verilog_def "'define complex_dsp"
or
script_params = -add_verilog_file complex_dsp_define.v

Seyed, you'd then modify run_vtr_task/run_vtr_flow to properly paste that define into a verilog file or to include that additional verilog file. I don't have a strong opinion about implementation, although the adding a new file using the script_params setting might be more powerful (can do anything in that file). Does this look doable? Is one easier or more maintainable than the other?

@vaughnbetz
Copy link
Contributor

The vtr_reg_strong tests failed in CI. The error message are:
021-06-05T17:44:22.1765706Z [Fail]
2021-06-05T17:44:22.1767166Z k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml/spmv.v/common vpr_status Task value 'exited with return code 134' does not match golden 'success'
2021-06-05T17:44:22.1768104Z [Fail]
2021-06-05T17:44:22.1769385Z k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml/spmv.v/common num_primary_inputs relative value -0.5 outside of range [0.95,1.05] and not equal to golden value: 2.0
2021-06-05T17:44:22.1770334Z [Fail]
2021-06-05T17:44:22.1771642Z k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml/spmv.v/common num_primary_outputs relative value -0.058823529411764705 outside of range [0.95,1.05] and not equal to golden value: 17.0
2021-06-05T17:44:22.1772613Z [Fail]
2021-06-05T17:44:22.1773926Z k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml/spmv.v/common num_pre_packed_nets relative value -6.093845216331505e-05 outside of range [0.95,1.05] and not equal to golden value: 16410.0
[more snipped]

Error code 134 looks to be SIGABRT. I suspect you hit a timeout limit on vtr_reg_strong, so further motivation to cut the cpu time (a lot) added to vtr_reg_strong.

@ArashAhmadian
Copy link
Contributor

Thanks, @ArashAhmadian , I'll prefer the second option coz I'm not familiar with the regression setup. But your PR will likely get submitted before this one. This PR's review will take some time. When this PR's other aspects have been reviewed and if your PR has been checked in by then, I can change the files (to move the tests to the new regression folders) to avoid extra work for you.

Sounds good, thanks.

Copy link
Contributor

@vaughnbetz vaughnbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, and the directory structure looks fine. A few suggested commenting changes in the arch file and students to consider adding as benchmark contributors.

@sdamghan
Copy link
Member

sdamghan commented Jun 8, 2021

Thanks all.

1. Aman, for vtr_reg_strong I think just having a simple instantiation of one of the complex DSP blocks to test the basic architecture is appropriate.  vtr_reg_strong goes for fast tests to find gross errors; we need nightly for more subtle problems or QoR issues.  So I think the best thing is to make a small architecture-test circuit in vtr_reg_strong.  Second best is to simply leave testing koios entirely to the nightly run.

2. I agree specifying 'define in the config.txt file for a regtest is a good idea.  I'm not sure of what the easiest way to do that is, but at a high level I think we want to add a flow option to set a source code define in the config.txt.  Right now script_params is interpreted by run_vtr_task/run_vtr_flow to set all sorts of CAD options and even flow options (which CAD tools to run, and how many times).  E.g., one current example is:
   script_params = --route_chan_width 200 --gen_post_synthesis_netlist on -track_memory_usage -check_equivalent --sweep_dangling_primary_ios off --sweep_constant_primary_outputs off

We could add another option like:
script_params= -verilog_def "'define complex_dsp"
or
script_params = -add_verilog_file complex_dsp_define.v

Seyed, you'd then modify run_vtr_task/run_vtr_flow to properly paste that define into a verilog file or to include that additional verilog file. I don't have a strong opinion about implementation, although the adding a new file using the script_params setting might be more powerful (can do anything in that file). Does this look doable? Is one easier or more maintainable than the other?

Since we can pass multiple Verilog files to Odin I think this process is solvable without any changes to the scripts. For instance, we can have the Verilog file (.v) fixed and once extra define is needed we can pass it to Odin using a header file (.vh). In the following, I wrote an example of this scenario. (you could find more in the keywords test of Odin's regression tests.)

<HEADER FILE>
<32bit_xor.vh>
/*
 * Integer wide range test
*/

`define WIDTH 32
`define operator xor
`include "./range_any_width_binary_test.v"

<VERILOG FILE>
<range_any_width_binary_test.v>
/*
 * Generates a gate for each output (ie. out[0],out[1])
 * Tests the width of binary gates
*/

`define RANGE [`WIDTH-1:0]

module simple_op(a,b,out);
    input 	`RANGE a;
    input   `RANGE b;
    output 	`RANGE out;

    `operator(out`RANGE,a`RANGE,b`RANGE);

endmodule 

In this way, you would specify the complex_dsp macro in a header file, then using Odin's config file you would add it to the circuit list once it is needed. A script_param scenario is a reasonable option as well, please let me know if the above-mentioned approach does not work.

@vaughnbetz
Copy link
Contributor

Adding @andrewboutros.
CI status: pretty sure vtr_reg_strong timed out.
Not sure why some tests haven't launched. Should all relaunch when changes are pushed anyway.
@aman26kbm : you will need to pare down the vtr_reg_strong runs to fit in a short time.

@sdamghan : Thanks for the description of .vh files above. The part I am not sure about is how this goes through the regression test infrastructure, without Aman having to duplicate the benchmark circuit for fancy-DSP and non-DSP versions. The regtest infrastructure takes a circuit list, not a list of files within each circuit. Currently all the vtr tests I can see have one .blif or .v file per benchmark (we could generalize this, but someone would have to write/update the scripts). Can you go further in your example to show how the extra file gets included in the benchmark run from a config.txt -- i.e. exactly how is the benchmark organized and the config.txt file written to make this happen?

I am not sure of how to make this happen without adding a new option to run_vtr_task/run_vtr_flow as detailed above. I might be missing something, but I (and I think Aman) need more of a tutorial of exactly how to do it through the regtest infrastructure in that case.

@sdamghan
Copy link
Member

sdamghan commented Jun 8, 2021

Adding @andrewboutros.
CI status: pretty sure vtr_reg_strong timed out.
Not sure why some tests haven't launched. Should all relaunch when changes are pushed anyway.
@aman26kbm : you will need to pare down the vtr_reg_strong runs to fit in a short time.

@sdamghan : Thanks for the description of .vh files above. The part I am not sure about is how this goes through the regression test infrastructure, without Aman having to duplicate the benchmark circuit for fancy-DSP and non-DSP versions. The regtest infrastructure takes a circuit list, not a list of files within each circuit. Currently all the vtr tests I can see have one .blif or .v file per benchmark (we could generalize this, but someone would have to write/update the scripts). Can you go further in your example to show how the extra file gets included in the benchmark run from a config.txt -- i.e. exactly how is the benchmark organized and the config.txt file written to make this happen?

I am not sure of how to make this happen without adding a new option to run_vtr_task/run_vtr_flow as detailed above. I might be missing something, but I (and I think Aman) need more of a tutorial of exactly how to do it through the regtest infrastructure in that case.

Based on my recent view of Koios benchmark Verilog files, the definition of complex_dsp macro in each test almost differs from the others. Concerning the -add_define or -add_verilog_file to script_params, we have two different scenarios for run_vtr_task and run_vtr_flow.
For run_vtr_flow, it is pretty straightforward since we run the entire flow for one circuit at a time. However, for run_vtr_task, it is a bit different since we have a list of Verilog files that each of them may need a different definition of complex_dsp macro. If we want to add such an option to script params, we need to specify which define or Verilog file belongs to which circuit. Please let me know your ideas on this matter.
I have created a PR in Aman's repository. A brief description of what I have done is:

  • I moved all Koios Verilog files to a sub-directory called generic_circuits ==> /benchmarks/verilog/koios/generic_circuits

  • For those designs that had complex_dsp macro defined, I created a .vh file in /benchmarks/verilog/koios including the definition of the complex_dsp plus an include statement to the related Verilog file. To be clear there is an example in follows:

    /*
     * tiny_darknet_like.small design including complex dsp definition
    */
    
    `define complex_dsp
    module td_fused_top_Block_entry_proc_proc392 (
            ap_clk,
            ap_rst,
            ap_start,
            ap_done,
            ap_continue,
            ap_idle,
            ap_ready,
            tmp,
            ap_return
    );
    
    `include "./generic_circuits/tiny_darknet_like.small.v"
    
  • I modified the config files for the regression tests to run Koios tests based on whether they are running with flagship architectures or complex DSP slice ones.

To recap, I think by adding an option like mentioned to script params, or Odin config file, we will have to create more files to have the definition of complex_dsp macro in addition to running VTR task would be more complicated since we need to specify the relation of extra Verilog files (definitions of complex_dsp) with the original circuits.

Link the PR: https://github.com/aman26kbm/vtr-verilog-to-routing/pull/4/files


__________New Koios Directory_______________|________________vtr_reg_strong config file______________________
/benchmarks/verilog/koios/:                 | #   
- tiny_darknet_like.small_complex_dsp.vh    | ############################################
- softmax_complex_dsp.vh                    | # Configuration file for running experiments
- robot_rl_complex_dsp.vh                   | ##############################################
- gemm_layer_complex_dsp.vh                 | 
- eltwise_layer_complex_dsp.vh              | # Path to directory of circuits to use
- dla_like.small_complex_dsp.vh             | circuits_dir=benchmarks/verilog/koios
- dla_like.medium_complex_dsp.vh            | 
- conv_layer_complex_dsp.vh                 | # Path to directory of architectures to use
- attention_layer_complex_dsp.vh            | archs_dir=arch/COFFE_22nm
- README.md                                 | 
                                            | # Add circuits to list to sweep
        - attention_layer.v                 | circuit_list_add=eltwise_layer_complex_dsp.vh
        - bnn.v                             | circuit_list_add=generic_circuits/robot_rl.v
        - clstm_like.large.v                | circuit_list_add=softmax_complex_dsp.vh
        - clstm_like.medium.v               | circuit_list_add=generic_circuits/reduction_layer.v
        - clstm_like.small.v                | circuit_list_add=generic_circuits/spmv.v
        - conv_layer.v                      | 
        - conv_layer_hls.v                  | # Add architectures to list to sweep
        - dla_like.medium.v                 | arch_list_add=k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml
        - dla_like.small.v                  | 
        - eltwise_layer.v                   | # Parse info and how to parse
        - gemm_layer.v                      | parse_file=vpr_standard.txt
        - lstm.v                            | 
        - reduction_layer.v                 | # How to parse QoR info
        - robot_rl.v                        | qor_parse_file=qor_standard.txt
        - softmax.v                         | 
        - spmv.v                            | # Pass requirements
        - tiny_darknet_like.medium.v        | pass_requirements_file=pass_requirements.txt
        - tiny_darknet_like.small.v         | 
        - tpu_like.medium.v                 | #Script parameters
        - tpu_like.small.v                  | script_params=-track_memory_usage -crit_path_router_iterations 100 --route_chan_width 300
                      

@aman26kbm
Copy link
Contributor Author

aman26kbm commented Jun 9, 2021

Sorry for the delay. I was tied up in some other work.

@sdamghan , thanks for the details. Basically the idea (let's call it method #2) is to have a set of "generic" circuits and then have a set of "complex_dsp" circuits that just define the complex_dsp macro and `include the "generic" circuit file. Right?

A few things to mention...

  1. I see that in your modifications in some files you've left a little bit more code than just the `define and the `include. Example is the code in the previous comment related to tiny_darknet. This will create a problem. This means that the "generic" file is no longer a complete benchmark. So, in the "complex_dsp" circuit, we should have just two things: the `define complex_dsp and the `include of the "generic" circuit. [This is a minor change; not an argument against this method]

  2. How does the file search for `include work with ODIN? Does the path in `include have to be relative to the directory in which the file is in? Or does it have to be relative to the directory in which ODIN will actually run (which is generated at run time). From the changes you've made, it looks like a path relative to the directory in which the file is will work. But I wanted to confirm this. [This question applies to the other method mentioned by Dr. Betz as well]

  3. This approach ensures that we don't have to copy the entire file. We just have two files but one of them is really a very tiny file. That's good. [This advantage applies to the other method mentioned by Dr. Betz as well]

In the approach Dr. Betz suggested (let's call it method #1), I agree that if complex_dsp is to be defined, it will be defined for all designs in a task. But that's okay. `defining complex_dsp for a design that doesn't use the complex_dsp macro is fine. It's just a NOP. Doesn't do anything. So, method #1 also satisfies our requirements.

The main difference between the two methods is that method #1 is more dynamic. In method #1, we will have multiple "generic" circuit files and an additional file that will just have the `define (can have many other `defines or other stuff). This additional file will not have the `include of the design file. We could potentially have more than one additional files with different `define settings. So, when running we would just provide the additional file in the task config through the additional command line arg (to script_params). This file will apply to all files in the task.

Method #2 is more static. If we have different `defines to provide, we will need to create more files. One file per benchmark (even if the same `define is to be applied to multiple files). And also it will involve more changes to the task file (changing the file name in multiple places) So, it's a little more work.

I prefer Method #1 over Method #2. Please provide your thoughts.

@vaughnbetz , acking the review comments. I am planning to work on them today, including adding a dummy/test file in vtr_reg_strong. Most likely, I will submit a commit today to addresses these.

@sdamghan
Copy link
Member

sdamghan commented Jun 9, 2021

Sorry for the delay. I was tied up in some other work.

@sdamghan , thanks for the details. Basically the idea (let's call it method #2) is to have a set of "generic" circuits and then have a set of "complex_dsp" circuits that just define the complex_dsp macro and `include the "generic" circuit file. Right?

A few things to mention...

1. I see that in your modifications in some files you've left a little bit more code than just the `define and the `include. Example is the code in the previous comment related to tiny_darknet. This will create a problem. This means that the "generic" file is no longer a complete benchmark. So, in the "complex_dsp" circuit, we should have just two things: the `define complex_dsp and the `include of the "generic" circuit. [This is a minor change; not an argument against this method]

2. How does the file search for `include work with ODIN? Does the path in `include have to be relative to the directory in which the file is in? Or does it have to be relative to the directory in which ODIN will actually run (which is generated at run time). From the changes you've made, it looks like a path relative to the directory in which the file is will work. But I wanted to confirm this. [This question applies to the other method mentioned by Dr. Betz as well]

3. This approach ensures that we don't have to copy the entire file. We just have two files but one of them is really a very tiny file. That's good. [This advantage applies to the other method mentioned by Dr. Betz as well]

In the approach Dr. Betz suggested (let's call it method #1), I agree that if complex_dsp is to be defined, it will be defined for all designs in a task. But that's okay. `defining complex_dsp for a design that doesn't use the complex_dsp macro is fine. It's just a NOP. Doesn't do anything. So, method #1 also satisfies our requirements.

The main difference between the two methods is that method #1 is more dynamic. In method #1, we will have multiple "generic" circuit files and an additional file that will just have the define (can have many other defines or other stuff). This additional file will not have the include of the design file. We could potentially have more than one additional files with different define settings. So, when running we would just provide the additional file in the task config through the additional command line arg (to script_params). This file will apply to all files in the task.

Method #2 is more static. If we have different defines to provide, we will need to create more files. One file per benchmark (even if the same define is to be applied to multiple files). And also it will involve more changes to the task file (changing the file name in multiple places) So, it's a little more work.

I prefer Method #1 over Method #2. Please provide your thoughts.

@vaughnbetz , acking the review comments. I am planning to work on them today, including adding a dummy/test file in vtr_reg_strong. Most likely, I will submit a commit today to addresses these.

@aman26kbm thanks for the clarification.
Concerning question 2, the include path is relative to the file given to Odin. (the generic circuit path should be relative to the path of .vh file path)
I couldn't agree more on the efficiency of method #1, If benchmarks can use a single definition for complex_dsp macro method #1 definitely makes sense. So, I will make a PR on this matter today.

@vaughnbetz
Copy link
Contributor

Thanks for the detailed explanation @sdamghan . I agree that approach (#2) can also work. We'd wind up with longer circuit lists in the config.txt so we tested both the with complex_dsp and without complex_dsp cases, but that's OK. It would let us test both variants in one config.txt if we wanted, and if not, we'd just list the changed circuits in the second (complex_dsp) run.

If we use the 'define passed to run_vtr_task/run_vtr_flow, we should still cut down the circuit list in the run that sets the define, as it is a waste of cpu to run the designs that don't have complex dsps again (just get the same answer as in the generic run).

So I'm OK with either approach. Approach #2 doesn't require infrastructure changes (I think), while approach #1 does, but gives us a dynamic behaviour that could be useful if we later wind up with combinations of defines. I'll let those doing the actual work make the decision :).

2. Adding a new test design that we can use in the vtr_reg_strong regression test.
3. Moved all tests from vtr_reg_strong to vtr_reg_nighty. The time taken by the tests in vtr_reg_strong was more than we want.
4. Updated the list of contributors to include Helen and Zach who worked on DLA and CLSTM designs in the past.
@aman26kbm
Copy link
Contributor Author

Hi @ArashAhmadian , I guess the PR you're working on is this: #1728, right?
I've subscribed to it so I'll know when it gets submitted. After that, I'll add commits to this Koios PR to change the nightly tests to vtr_reg_nightly_test4.

@ArashAhmadian
Copy link
Contributor

Hi @ArashAhmadian , I guess the PR you're working on is this: #1728, right?
I've subscribed to it so I'll know when it gets submitted. After that, I'll add commits to this Koios PR to change the nightly tests to vtr_reg_nightly_test4.

Hi @aman26kbm. Yes, that is the PR. I have made a PR (#1776) to update the doc to mention the parallelism strategy and give detailed steps for adding new nightly suites (waiting for review). Feel free to refer to it once #1728 has landed.

Let me know your thoughts.

@github-actions github-actions bot removed lang-make CMake/Make code build Build system docs Documentation VPR VPR FPGA Placement & Routing Tool Odin Odin II Logic Synthesis Tool: Unsorted item labels Jun 11, 2021
Adding step to gzip results from vtr_reg_nightly_test4 to vtr-test.sh file.
@aman26kbm
Copy link
Contributor Author

I have added changes that move koios benchmark nightly tests to vtr_reg_nightly_test4.

@ArashAhmadian , can you please review my changes?

@aman26kbm aman26kbm marked this pull request as ready for review June 11, 2021 19:08
Copy link
Contributor

@ArashAhmadian ArashAhmadian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure, does koios_multi_arch take longer than koios? We should have the task with the longest flow runs at the bottom of the list so it starts at the beginning of the CI run.

@aman26kbm
Copy link
Contributor Author

Just to make sure, does koios_multi_arch take longer than koios? We should have the task with the longest flow runs at the bottom of the list so it starts at the beginning of the CI run.

Oh.. koios_multi_arch is shorter than koios. Lemme change the order and submit.

…e task with the longest flow runs at the bottom of the list so it starts at the beginning of the CI run. In this case, koios task takes much longer than koios_multi_arch.
@ArashAhmadian
Copy link
Contributor

Just to make sure, does koios_multi_arch take longer than koios? We should have the task with the longest flow runs at the bottom of the list so it starts at the beginning of the CI run.

Oh.. koios_multi_arch is shorter than koios. Lemme change the order and submit.

Sounds good. Files (configs and task lists) are read in backwards by run_reg_test.py which is a bit unintuitive.
Also, having the long flow runs at the bottom of your config.txt is preferred so that they actually start at the very beginning.

…k is now in the Koios benchmarks (named gemm_layer.v).

Also, removing the synthesis results for 3 benchmarks (tpu.16x16, tpu.32x32 and matmul_8x8_fp16) from ODIN-II synthesis regression golden results.
These benchmarks are no longer present in the vtr_flow/benchmarks/verilog directory. They are not in the Koios suite under different names.
@github-actions github-actions bot added the Odin Odin II Logic Synthesis Tool: Unsorted item label Jun 12, 2021
@aman26kbm
Copy link
Contributor Author

@sdamghan , in ODIN_II/regression_test/benchmark/task/large/synthesis_result.json, there were synthesis results for 3 benchmarks: tpu.16x16, tpu.32x32 and matmul_8x8_fp16. These benchmarks have now been removed from vtr_flow/benchmarks/verilog directory and exist under vtr_flow/benchmarks/verilog/koios under slightly different names. So, I've removed the synthesis results from the file mentioned above.

My question is: do we want to add separate ODIN regression tests for Koios benchmarks in ODIN_II/regression_test/benchmark/task?

@sdamghan
Copy link
Member

sdamghan commented Jun 12, 2021

@sdamghan , in ODIN_II/regression_test/benchmark/task/large/synthesis_result.json, there were synthesis results for 3 benchmarks: tpu.16x16, tpu.32x32 and matmul_8x8_fp16. These benchmarks have now been removed from vtr_flow/benchmarks/verilog directory and exist under vtr_flow/benchmarks/verilog/koios under slightly different names. So, I've removed the synthesis results from the file mentioned above.

My question is: do we want to add separate ODIN regression tests for Koios benchmarks in ODIN_II/regression_test/benchmark/task?

@aman26kbm That would be great if we have a task specifically for the Koios benchmarks in the ODIN_II regression test as well. The ODIN_II task config file is pretty much like the VTR flow.
Just to make sure everything is OK, we have a script called ./verify_odin.sh in the ODIN_II directory. So, if you run this script for the large benchmark using --regenerate_expectation, it should automatically remove the aforementioned benchmarks from synthesis and simulation results.

@aman26kbm
Copy link
Contributor Author

Ok. Thanks. I've added koios benchmarks (only the ones that are in VTR nightly) to ODIN's heavy suite (the same suite which had the tpu and gemm designs).

@aman26kbm
Copy link
Contributor Author

@vaughnbetz , can you please approve this PR? The only pending item is to add tests for the designs without the "complex_dsp". I am going to do that as a separate PR.

@vaughnbetz
Copy link
Contributor

@aman26kbm : can you summarize the wall clock time and cpu time that's been added to the strong, nightly, and weekly tests? It's good to summarize to make sure it won't cause us any problem, and in case we have any throughput issues we can look it up in this PR.
vtr_reg_nightly4 didn't run in CI. Does @mithro need to do something on his end to make it run?

@aman26kbm
Copy link
Contributor Author

@aman26kbm : can you summarize the wall clock time and cpu time that's been added to the strong, nightly, and weekly tests? It's good to summarize to make sure it won't cause us any problem, and in case we have any throughput issues we can look it up in this PR.
vtr_reg_nightly4 didn't run in CI. Does @mithro need to do something on his end to make it run?

Just looked at the logs.
Strong was 6 seconds.
Weekly was 87101 seconds (that's a little over 24 hours).
For nightly, the time is around 5 hours. But these results are stale. The configuration has been modified after that. I'll rerun it and report back.

@vaughnbetz
Copy link
Contributor

Thanks. Strong definitely sounds fine :).
For weekly, is that a single cpu number, or a wall clock number on multiple cpus?

@aman26kbm
Copy link
Contributor Author

Thanks. Strong definitely sounds fine :).
For weekly, is that a single cpu number, or a wall clock number on multiple cpus?

For weekly, I used "-j 4" as mentioned in the developer README. So, the time reported is the wall clock time taken when running 4 jobs in parallel (on 4 cpus on a multi-cpu machine).
For nightly, it's "-j 3".

@vaughnbetz
Copy link
Contributor

Thanks Aman. I think that'll be OK as the weekly throughput isn't as critical. This is worth posting as a table in the Thursday meeting, and @ArashAhmadian if you could give the numbers for the current nightly and weekly for comparison that would be ideal.

@vaughnbetz vaughnbetz merged commit 35f90c9 into verilog-to-routing:master Jun 14, 2021
@aman26kbm
Copy link
Contributor Author

@aman26kbm : can you summarize the wall clock time and cpu time that's been added to the strong, nightly, and weekly tests? It's good to summarize to make sure it won't cause us any problem, and in case we have any throughput issues we can look it up in this PR.
vtr_reg_nightly4 didn't run in CI. Does @mithro need to do something on his end to make it run?

Just looked at the logs.
Strong was 6 seconds.
Weekly was 87101 seconds (that's a little over 24 hours).
For nightly, the time is around 5 hours. But these results are stale. The configuration has been modified after that. I'll rerun it and report back.

I ran nightly again. It finished in 2 hours. The large difference compared to the older result of 5 hours is because I had moved one long test into the weekly a few commit ago. So, we are good.

The regression time will increase when I add the tests for designs without complex_dsp. But I think we will still be under the prescribed limits for each suite.

@aman26kbm
Copy link
Contributor Author

Thanks Aman. I think that'll be OK as the weekly throughput isn't as critical. This is worth posting as a table in the Thursday meeting, and @ArashAhmadian if you could give the numbers for the current nightly and weekly for comparison that would be ideal.

I'll post the data for the various suites used for Koios in the README of these benchmarks. I need to anyway add some information to the README about the new usage of complex_dsp (by adding the additional defines file). I'll do that with a new PR.

@aman26kbm aman26kbm mentioned this pull request Jun 23, 2021
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
infra Project Infrastructure Odin Odin II Logic Synthesis Tool: Unsorted item
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants