Adding a TPU-like design to the VTR benchmark suite #1573

aman26kbm · 2020-10-14T05:28:59Z

Description

This PR adds a new design to the VTR benchmark suite. The design is a Google TPU-like design and is a good representative of Deep Learning (DL) applications.

There are two designs actually - one that uses a 32x32 matrix multiplication unit and another that uses a 16x16 matrix multiplication unit.

There is a large comment section on the top of each file, which mentions the FPGA architectures that these designs have been tested with, the clock frequency/critical path obtained on each architecture and also the resource usage.

Related Issue

No issue exists for this. This was discussed in email threads with @vaughnbetz .

Motivation and Context

In the current VTR benchmark suite, we don't have any benchmarks that are DL/ML related. Adding a DL/ML design will be helpful for researchers aiming to explore FPGA architectures for DL/ML applications.

How Has This Been Tested?

Adding these doesn't modify an existing functionality. So, I haven't run any existing tests. I've run these VTR flow with these designs on the FPGA architectures specified in the files.

Types of changes

Bug fix (change which fixes an issue)
[ x] New feature (change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

[ x] My change requires a change to the documentation
[ x] I have updated the documentation accordingly
I have added tests to cover my changes
All new and existing tests passed

…nchmarking. The two designs use the same skeleton. The main difference is that one of them has a 16x16 matrix multiplication unit, while the other one has a 32x32 unit.

aman26kbm · 2020-10-14T05:29:45Z

@vaughnbetz , please review this PR and provide feedback.

jeanlego · 2020-10-17T21:35:06Z

This will break Odin full regression in its current state because of the wildcard we use on the VTR flow verilog directory

But full regression in only ran on buildbot, Kokoro doesnt run any Odin test as far as I can tell from the cfg, so it should not block you with the CI.

I'll generate the expected results for the Odin regression suite and do a PR onto yours to not break the flow.

Odin: regenerate expectations

aman26kbm · 2020-10-18T16:39:44Z

Thanks, @jeanlego

aman26kbm · 2020-10-25T16:36:47Z

Can someone approve this PR? Please let me know if there is something I need to do.

jeanlego · 2020-10-26T19:33:56Z

Once kokoro run It cn be merge on my end (@vaughnbetz ?)

jeanlego · 2020-10-28T16:03:02Z

@aman26kbm is it blocking anything?

aman26kbm · 2020-10-28T16:32:53Z

Thanks for checking, @jeanlego . This isn't blocking anything. So, we can check it in later.

vaughnbetz

Look like very nice designs to me Aman -- thanks!
@jeanlego : I think the Odin-II tests are OK now -- is that right?
Aman, I see you've added new tests for these designs but I think they're not run in any of our current CI. Is that right?
Also, I see you're adding them as new VTR benchmarks in the documentation -- I have some questions below on what's the best way to "productize" these benchmarks.
I think these can be merged, as all my comments are really enhancements and questions on what's next.

I'm not sure what the right approach in terms of benchmark suite is.
0. Do you have a testbench for these designs? If so, useful to check it in too. We don't have testbenches for most circuits, but good to keep them if we have them.

We could add these designs to the existing VTR suite. In that case I think we should add them to CI for VTR designs by adding them to the benchmark list, after confirming that they don't take forever to do a binary search place and route. If binary search place and route takes forever, we could add them to a new VTR at fixed channel width test.
Or we could start a new ML benchmark suite, with these being the first entries. I tend to think 1 is better for now, as these designs go through ODIN-II and hence are more like the existing VTR designs (rather than the Titan flow designs).

vaughnbetz · 2020-10-28T16:41:48Z

vtr_flow/benchmarks/verilog/tpu.32x32.int8.v

+// Matrix multiplication unit
+////////////////////////////////////
+
+//////////////////////////////////////////////////////////////////////////////////


Probably should either fill in as much of this as possible (engineer, company can be institution, etc.) or delete this.

Will edit and push a new change over the weekend

vaughnbetz · 2020-10-28T19:04:31Z

@andrewboutros : adding Andrew.

aman26kbm · 2020-10-29T06:10:06Z

Thanks for the review, @vaughnbetz . Please find comments/questions below.

#1

I see you've added new tests for these designs but I think they're not run in any of our current CI. Is that right?

Actually I didn't any tests. I assume you're referring to the changes in .json file in the PR? @jeanlego modified those because ODIN tests were picking up the new files.
I assume a "test" involves running the VTR flow with different settings/configurations/architectures. And adding a test for the design will "protect" the design from changes to ensure it always works with the flow. I am not familiar with the testing flow we have for VTR. I can read up on it and see which testlist we can add this to.

#2

Do you have a testbench for these designs? If so, useful to check it in too.

I do have a simple verilog testbench with a few tests that test the main features of the design. I can add that. Is there a recommended directory structure for this? I like to keep the design and the testbench together. So, then each benchmark would become a sub-directory containing design and testbench files.

#3

We could add these designs to the existing VTR suite. In that case I think we should add them to CI for VTR designs by adding them to the benchmark list, after confirming that they don't take forever to do a binary search place and route.

The time taken by the designs for the total VTR flow for 2 architectures is in the banner in the files. The 16x16 design should be okay to add to the CI. It takes <1 hour. Not sure what the current limits are, for something to be in the CI regression. The fixed-width part probably runs very quickly. I can find that time and report it here.

#4

Or we could start a new ML benchmark suite, with these being the first entries. I tend to think 1 is better for now, as these designs go through ODIN-II and hence are more like the existing VTR designs (rather than the Titan flow designs).

We can have a non-Titan flow ML benchmark suite as well. In fact, I think a non-Titan flow ML benchmark suite will have more general use, no?
For now, we only have one design (TPU-like design with 2 variants), but we'll have more soon. So, we can create a new directory called "ml" under "benchmarks", and put these designs there.

vaughnbetz · 2020-10-29T14:01:36Z

#1: For tests, I suggest

Add both designs to the list of benchmarks run in the /scripts/run_vtr_task.py regression_tests/vtr_reg_nightly/vtr_reg_qor_chain regression. It will do a binary search routing and it sounds like both should run in a reasonable time (good to test this though). See https://docs.verilogtorouting.org/en/latest/dev/developing/#example-vtr-benchmarks-qor-measurement for info on this test.
Make another regtest that runs at least one of these designs on the 22 nm COFFE architecture you created, to ensure it still works and to check it is within QoR bounds. The vtr_reg_qor_chain regression should be a good template, and the procedure is documented at https://docs.verilogtorouting.org/en/latest/dev/developing/#adding-tests

#2: if we make each benchmark a sub-directory I am pretty sure it will break the existing vtr_flow scripts that will run the regtests above. That could be fixed, but the faster way to check in the testbenches would be to make a /testbenches directory under the vtr_flow/benchmarks/verilog directory and put a README and testbenches (or subdirectories with testbenches) under it.

#3: Seems like these should be added to the benchmarks/verilog directory and run as part of the vtr suite during QoR runs then.

#4: I think we can make a benchmarks/ml directory and just put a README in it pointing to the designs in benchmarks/verilog and later titan/. That way we don't have to add a lot of new tests, and we can run these new designs alongside the existing Odin-II compatible designs (with our current scripts) and average QoR across the larger suite.
Basically this groups designs by the flow they go through, and then READMEs/lists define subsets and combinations of interest. I think that will be the more maintainable solution.

Adding @MohamedElgammal and @kmurray in case they have any thoughts.

jeanlego · 2020-10-29T15:39:05Z

@jeanlego : I think the Odin-II tests are OK now -- is that right?

correct

I do believe they are a nice addition to the benchmarks, but I don't see them as essential to run with the Odin regression, I leave that up to you.

FYI we glob the VTR benchmark directory to make sure all the tests available synthesize as expected and there are no changes on the circuits, hence the JSON I generated.

vaughnbetz · 2020-10-29T18:54:05Z

Discussed this with some other contributors (google etc.) in a VTR meeting today. They like the plan above (put in vtr_flow/benchmarks/verilog, add to list of benchmarks in the qor sweeps etc.).

aman26kbm · 2020-10-29T23:59:05Z

Sounds good. I have added the sub-tasks here on my task list. Will post updates/ask questions as I work through them.

vaughnbetz · 2020-11-08T15:06:45Z

I could merge this PR and then we could track the update tasks in #1 - 4 above in a separate PR if you like Aman. Let me know if you'd like that, or if you'd rather do everything on this PR.

aman26kbm · 2020-11-08T15:22:55Z

I think let's merge this one and we can handle #1-#4 in another PR. I have added those to my task list. May be I can create an issue and track those tasks there?

vaughnbetz · 2020-11-08T15:28:24Z

Sounds good. Yes, please go ahead and create a separate issue and track the updates there.

aman26kbm · 2020-11-08T16:39:35Z

Thanks, @vaughnbetz
I've filed #1590.
Please assign it to me if possible. It doesn't seem like I can do that myself.

aman26kbm added 2 commits October 14, 2020 00:20

Adding new benchmark (TPU) to documentation

cf16710

Adding two TPU-like designs as benchmarks. They can be used for ML be…

9216935

…nchmarking. The two designs use the same skeleton. The main difference is that one of them has a 16x16 matrix multiplication unit, while the other one has a 32x32 unit.

jeanlego and others added 2 commits October 18, 2020 00:54

Odin: regenerate expectations

0a66ee5

Merge pull request #3 from jeanlego/vtr_benchmark_tpu

c115a00

Odin: regenerate expectations

jeanlego added kokoro:force-run and removed kokoro:force-run labels Oct 26, 2020

vaughnbetz approved these changes Oct 28, 2020

View reviewed changes

vaughnbetz merged commit a9402fc into verilog-to-routing:master Nov 8, 2020

aman26kbm mentioned this pull request Nov 8, 2020

Updates related to adding TPU-like design to VTR benchmarks #1590

Closed

aman26kbm deleted the vtr_benchmark_tpu branch November 8, 2020 16:39

aman26kbm restored the vtr_benchmark_tpu branch November 8, 2020 16:39

Adding a TPU-like design to the VTR benchmark suite #1573

Adding a TPU-like design to the VTR benchmark suite #1573

Uh oh!

Conversation

aman26kbm commented Oct 14, 2020

Description

Related Issue

Motivation and Context

How Has This Been Tested?

Types of changes

Checklist:

Uh oh!

aman26kbm commented Oct 14, 2020

Uh oh!

jeanlego commented Oct 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aman26kbm commented Oct 18, 2020

Uh oh!

aman26kbm commented Oct 25, 2020

Uh oh!

jeanlego commented Oct 26, 2020

Uh oh!

jeanlego commented Oct 28, 2020

Uh oh!

aman26kbm commented Oct 28, 2020

Uh oh!

vaughnbetz left a comment

Choose a reason for hiding this comment

Uh oh!

vaughnbetz Oct 28, 2020

Choose a reason for hiding this comment

Uh oh!

aman26kbm Oct 29, 2020

Choose a reason for hiding this comment

Uh oh!

vaughnbetz commented Oct 28, 2020

Uh oh!

aman26kbm commented Oct 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vaughnbetz commented Oct 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jeanlego commented Oct 29, 2020

Uh oh!

vaughnbetz commented Oct 29, 2020

Uh oh!

aman26kbm commented Oct 29, 2020

Uh oh!

vaughnbetz commented Nov 8, 2020

Uh oh!

aman26kbm commented Nov 8, 2020

Uh oh!

vaughnbetz commented Nov 8, 2020

Uh oh!

aman26kbm commented Nov 8, 2020

Uh oh!

Uh oh!

jeanlego commented Oct 17, 2020 •

edited

Loading

aman26kbm commented Oct 29, 2020 •

edited

Loading

vaughnbetz commented Oct 29, 2020 •

edited

Loading