-
Notifications
You must be signed in to change notification settings - Fork 415
Adding a TPU-like design to the VTR benchmark suite #1573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding a TPU-like design to the VTR benchmark suite #1573
Conversation
…nchmarking. The two designs use the same skeleton. The main difference is that one of them has a 16x16 matrix multiplication unit, while the other one has a 32x32 unit.
@vaughnbetz , please review this PR and provide feedback. |
This will break Odin full regression in its current state because of the wildcard we use on the VTR flow verilog directory But full regression in only ran on buildbot, Kokoro doesnt run any Odin test as far as I can tell from the cfg, so it should not block you with the CI. I'll generate the expected results for the Odin regression suite and do a PR onto yours to not break the flow. |
Odin: regenerate expectations
Thanks, @jeanlego |
Can someone approve this PR? Please let me know if there is something I need to do. |
Once kokoro run It cn be merge on my end (@vaughnbetz ?) |
@aman26kbm is it blocking anything? |
Thanks for checking, @jeanlego . This isn't blocking anything. So, we can check it in later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look like very nice designs to me Aman -- thanks!
@jeanlego : I think the Odin-II tests are OK now -- is that right?
Aman, I see you've added new tests for these designs but I think they're not run in any of our current CI. Is that right?
Also, I see you're adding them as new VTR benchmarks in the documentation -- I have some questions below on what's the best way to "productize" these benchmarks.
I think these can be merged, as all my comments are really enhancements and questions on what's next.
I'm not sure what the right approach in terms of benchmark suite is.
0. Do you have a testbench for these designs? If so, useful to check it in too. We don't have testbenches for most circuits, but good to keep them if we have them.
-
We could add these designs to the existing VTR suite. In that case I think we should add them to CI for VTR designs by adding them to the benchmark list, after confirming that they don't take forever to do a binary search place and route. If binary search place and route takes forever, we could add them to a new VTR at fixed channel width test.
-
Or we could start a new ML benchmark suite, with these being the first entries. I tend to think 1 is better for now, as these designs go through ODIN-II and hence are more like the existing VTR designs (rather than the Titan flow designs).
// Matrix multiplication unit | ||
//////////////////////////////////// | ||
|
||
////////////////////////////////////////////////////////////////////////////////// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should either fill in as much of this as possible (engineer, company can be institution, etc.) or delete this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will edit and push a new change over the weekend
@andrewboutros : adding Andrew. |
Thanks for the review, @vaughnbetz . Please find comments/questions below.
Actually I didn't any tests. I assume you're referring to the changes in .json file in the PR? @jeanlego modified those because ODIN tests were picking up the new files.
I do have a simple verilog testbench with a few tests that test the main features of the design. I can add that. Is there a recommended directory structure for this? I like to keep the design and the testbench together. So, then each benchmark would become a sub-directory containing design and testbench files.
The time taken by the designs for the total VTR flow for 2 architectures is in the banner in the files. The 16x16 design should be okay to add to the CI. It takes <1 hour. Not sure what the current limits are, for something to be in the CI regression. The fixed-width part probably runs very quickly. I can find that time and report it here.
We can have a non-Titan flow ML benchmark suite as well. In fact, I think a non-Titan flow ML benchmark suite will have more general use, no? |
#1: For tests, I suggest
#2: if we make each benchmark a sub-directory I am pretty sure it will break the existing vtr_flow scripts that will run the regtests above. That could be fixed, but the faster way to check in the testbenches would be to make a /testbenches directory under the vtr_flow/benchmarks/verilog directory and put a README and testbenches (or subdirectories with testbenches) under it. #3: Seems like these should be added to the benchmarks/verilog directory and run as part of the vtr suite during QoR runs then. #4: I think we can make a benchmarks/ml directory and just put a README in it pointing to the designs in benchmarks/verilog and later titan/. That way we don't have to add a lot of new tests, and we can run these new designs alongside the existing Odin-II compatible designs (with our current scripts) and average QoR across the larger suite. Adding @MohamedElgammal and @kmurray in case they have any thoughts. |
correct I do believe they are a nice addition to the benchmarks, but I don't see them as essential to run with the Odin regression, I leave that up to you. FYI we glob the VTR benchmark directory to make sure all the tests available synthesize as expected and there are no changes on the circuits, hence the JSON I generated. |
Discussed this with some other contributors (google etc.) in a VTR meeting today. They like the plan above (put in vtr_flow/benchmarks/verilog, add to list of benchmarks in the qor sweeps etc.). |
Sounds good. I have added the sub-tasks here on my task list. Will post updates/ask questions as I work through them. |
I could merge this PR and then we could track the update tasks in #1 - 4 above in a separate PR if you like Aman. Let me know if you'd like that, or if you'd rather do everything on this PR. |
Sounds good. Yes, please go ahead and create a separate issue and track the updates there. |
Thanks, @vaughnbetz |
Description
This PR adds a new design to the VTR benchmark suite. The design is a Google TPU-like design and is a good representative of Deep Learning (DL) applications.
There are two designs actually - one that uses a 32x32 matrix multiplication unit and another that uses a 16x16 matrix multiplication unit.
There is a large comment section on the top of each file, which mentions the FPGA architectures that these designs have been tested with, the clock frequency/critical path obtained on each architecture and also the resource usage.
Related Issue
No issue exists for this. This was discussed in email threads with @vaughnbetz .
Motivation and Context
In the current VTR benchmark suite, we don't have any benchmarks that are DL/ML related. Adding a DL/ML design will be helpful for researchers aiming to explore FPGA architectures for DL/ML applications.
How Has This Been Tested?
Adding these doesn't modify an existing functionality. So, I haven't run any existing tests. I've run these VTR flow with these designs on the FPGA architectures specified in the files.
Types of changes
Checklist: