Add picoSoC and Murax benchmarks #1055

Closed

Conversation

acomodi
Collaborator

@acomodi acomodi commented Nov 25, 2019

Description

This PR adds the PicoSoC and Murax circuits to the vtr_flow strong regression test suite.
The .blif files were pre-compiled with Quartus Prime (v19.1) following these instructions.

Related Issue

#584
#582

Motivation and Context

SymbiFlow has murax and picosoc as part of its test suite. Including these designs in the VtR test suite as well makes it possible to compare against the titan flow (which uses the stratixIV architecture).

How Has This Been Tested?

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@probot-autolabeler probot-autolabeler bot added lang-netlist tests VTR Flow VTR Design Flow (scripts/benchmarks/architectures) labels Nov 25, 2019
@acomodi
Collaborator Author

acomodi commented Nov 25, 2019

@acomodi acomodi changed the title Add picosoc murax benchmarks Add picoSoC and Murax benchmarks Nov 25, 2019
@kmurray
Contributor

kmurray commented Nov 25, 2019

It may be better to just include these directly in the Titan benchmark release (rather than manually checking them in here).

@acomodi Can you send me the Quartus projects you created for these designs?

@acomodi
Collaborator Author

acomodi commented Nov 25, 2019

@kmurray Sure, no problem

@kmurray
Contributor

kmurray commented Nov 25, 2019

Thanks @acomodi for getting these benchmarks together!

As of f265da9 I've added them to the 'titan_other' benchmark set, and included them in the nightly 'titan_other' regression test set.

I'm going to close this PR since we're bringing them in via the titan benchmarks.

@kmurray kmurray closed this Nov 25, 2019
@mithro
Contributor

mithro commented Nov 25, 2019

@kmurray - Just FYI, these benchmarks should run in under 10m (really, they should be under 1m).

@mithro
Contributor

mithro commented Nov 25, 2019

@acomodi Could you dump a summary of the resource usage that Quartus printed for this design?

@mithro
Contributor

mithro commented Nov 25, 2019

FYI - olofk/edalize#78

@kmurray
Contributor

kmurray commented Nov 25, 2019

I've got some of those numbers when run with VPR (with Quartus synthesis) for reference on our Stratix IV architecture model:

Murax:

Circuit Statistics:
  Blocks: 2142
    .input                                                                                                   :      18
    .output                                                                                                  :      17
    0-LUT                                                                                                    :       2
    6-LUT                                                                                                    :       1
    dffeas                                                                                                   :     998
    stratixiv_lcell_comb                                                                                     :     994
    stratixiv_ram_block.opmode{dual_port}.output_type{comb}.port_a_address_width{10}.port_b_address_width{10}:      32
    stratixiv_ram_block.opmode{dual_port}.output_type{comb}.port_a_address_width{4}.port_b_address_width{4}  :      16
    stratixiv_ram_block.opmode{dual_port}.output_type{comb}.port_a_address_width{5}.port_b_address_width{5}  :      64
  Nets  : 2291
    Avg Fanout:     4.2
    Max Fanout:  1222.0
    Min Fanout:     1.0
  Netlist Clocks: 1

Picosoc:

Circuit Statistics:
  Blocks: 16357
    .input                                                                                                 :      18
    .output                                                                                                :      17
    0-LUT                                                                                                  :       2
    dffeas                                                                                                 :    9644
    stratixiv_lcell_comb                                                                                   :    6580
    stratixiv_ram_block.opmode{dual_port}.output_type{comb}.port_a_address_width{5}.port_b_address_width{5}:      64
    stratixiv_ram_block.opmode{rom}.output_type{comb}.port_a_address_width{10}                             :      32
  Nets  : 16969
    Avg Fanout:     3.7
    Max Fanout:  9804.0
    Min Fanout:     1.0
  Netlist Clocks: 1

My initial observation on picosoc is that Quartus doesn't appear to be inferring RAMs for some of the design logic, since the VPR log shows many logic blocks being used to implement things named 'mem[XX][YY]':

Complex block 506: 'picosoc_noflash:soc|picosoc_mem:memory|mem[60][22]' (LAB) ....................
Complex block 507: 'picosoc_noflash:soc|picosoc_mem:memory|mem[43][22]' (LAB) ....................
Complex block 508: 'picosoc_noflash:soc|picosoc_mem:memory|mem[172][22]' (LAB) ....................
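
For anyone wanting to quantify this from a full log, here is a minimal Python sketch that counts the packed blocks whose name ends in a 'mem[XX][YY]' pattern. It assumes the log lines have exactly the shape shown above; the function name `mem_named_blocks` and the regular expressions are illustrative and not part of VPR.

```python
import re

# Matches packer lines of the shape quoted above, e.g.
# Complex block 506: 'picosoc_noflash:soc|picosoc_mem:memory|mem[60][22]' (LAB) ....
BLOCK_RE = re.compile(r"Complex block (\d+): '([^']*)' \((\w+)\)")
# A block name whose last hierarchy element looks like mem[word][bit].
MEM_RE = re.compile(r"\|mem\[(\d+)\]\[(\d+)\]$")


def mem_named_blocks(log_text):
    """Return (block_index, block_type, word, bit) for each mem[..][..] block."""
    hits = []
    for match in BLOCK_RE.finditer(log_text):
        index, name, block_type = match.groups()
        mem = MEM_RE.search(name)
        if mem:
            hits.append((int(index), block_type, int(mem.group(1)), int(mem.group(2))))
    return hits


# The three lines quoted in this comment, used as sample input.
sample = """\
Complex block 506: 'picosoc_noflash:soc|picosoc_mem:memory|mem[60][22]' (LAB) ....................
Complex block 507: 'picosoc_noflash:soc|picosoc_mem:memory|mem[43][22]' (LAB) ....................
Complex block 508: 'picosoc_noflash:soc|picosoc_mem:memory|mem[172][22]' (LAB) ....................
"""

hits = mem_named_blocks(sample)
print(f"{len(hits)} blocks named after mem[..][..] entries")
print("highest word index seen:", max(word for _, _, word, _ in hits))
```

Run over the whole packing log, the count gives a rough idea of how many LABs are spent on memory that was not mapped to RAM blocks. A block is named after only one of the primitives it contains, so this is a lower bound on the affected logic rather than an exact figure.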

Here is a summary of the VPR run-times (at astar_fac=1.0, inner_num=2):
Murax:

# Loading Architecture Description
# Loading Architecture Description took 0.51 seconds (max_rss 81.8 MiB, delta_rss +69.8 MiB)
# Building complex block graph
# Building complex block graph took 7.07 seconds (max_rss 745.9 MiB, delta_rss +664.2 MiB)
# Load circuit
# Load circuit took 0.06 seconds (max_rss 745.9 MiB, delta_rss +0.0 MiB)
# Clean circuit
# Clean circuit took 0.00 seconds (max_rss 745.9 MiB, delta_rss +0.0 MiB)
# Compress circuit
# Compress circuit took 0.00 seconds (max_rss 745.9 MiB, delta_rss +0.0 MiB)
# Verify circuit
# Verify circuit took 0.00 seconds (max_rss 745.9 MiB, delta_rss +0.0 MiB)
# Build Timing Graph
# Build Timing Graph took 0.01 seconds (max_rss 745.9 MiB, delta_rss +0.0 MiB)
# Load Timing Constraints
# Load Timing Constraints took 0.00 seconds (max_rss 745.9 MiB, delta_rss +0.0 MiB)
# Packing
# Packing took 3.86 seconds (max_rss 752.0 MiB, delta_rss +6.0 MiB)
# Load Packing
Finished loading packed FPGA netlist file (took 0.144125 seconds).
# Load Packing took 0.16 seconds (max_rss 753.7 MiB, delta_rss +1.7 MiB)
# Create Device
## Build Device Grid
## Build Device Grid took 0.00 seconds (max_rss 754.0 MiB, delta_rss +0.0 MiB)
## Build routing resource graph
## Build routing resource graph took 0.63 seconds (max_rss 757.9 MiB, delta_rss +3.9 MiB)
# Create Device took 0.63 seconds (max_rss 757.9 MiB, delta_rss +3.9 MiB)
# Placement
## Computing placement delta delay look-up
### Computing router lookahead map
### Computing router lookahead map took 1.24 seconds (max_rss 757.9 MiB, delta_rss +0.0 MiB)
### Computing delta delays
### Computing delta delays took 0.04 seconds (max_rss 757.9 MiB, delta_rss +0.0 MiB)
## Computing placement delta delay look-up took 1.29 seconds (max_rss 757.9 MiB, delta_rss +0.0 MiB)
# Placement took 2.60 seconds (max_rss 760.4 MiB, delta_rss +2.5 MiB)
# Routing
# Routing took 0.80 seconds (max_rss 773.7 MiB, delta_rss +5.5 MiB)
Timing analysis took 0.966793 seconds (0.865093 STA, 0.1017 slack) (138 full updates: 124 setup, 0 hold, 14 combined).
The entire flow of VPR took 16.16 seconds (max_rss 774.3 MiB)

Picosoc:

# Loading Architecture Description
# Loading Architecture Description took 0.53 seconds (max_rss 82.0 MiB, delta_rss +69.9 MiB)
# Building complex block graph
# Building complex block graph took 7.02 seconds (max_rss 746.1 MiB, delta_rss +664.0 MiB)
# Load circuit
# Load circuit took 0.27 seconds (max_rss 764.9 MiB, delta_rss +18.8 MiB)
# Clean circuit
# Clean circuit took 0.01 seconds (max_rss 764.9 MiB, delta_rss +0.0 MiB)
# Compress circuit
# Compress circuit took 0.02 seconds (max_rss 764.9 MiB, delta_rss +0.0 MiB)
# Verify circuit
# Verify circuit took 0.01 seconds (max_rss 764.9 MiB, delta_rss +0.0 MiB)
# Build Timing Graph
# Build Timing Graph took 0.12 seconds (max_rss 773.0 MiB, delta_rss +8.1 MiB)
# Load Timing Constraints
# Load Timing Constraints took 0.00 seconds (max_rss 773.0 MiB, delta_rss +0.0 MiB)
# Packing
# Packing took 36.09 seconds (max_rss 894.4 MiB, delta_rss +121.4 MiB)
# Load Packing
Finished loading packed FPGA netlist file (took 0.97722 seconds).
# Load Packing took 1.02 seconds (max_rss 903.4 MiB, delta_rss +9.0 MiB)
# Create Device
## Build Device Grid
## Build Device Grid took 0.01 seconds (max_rss 903.7 MiB, delta_rss +0.0 MiB)
## Build routing resource graph
## Build routing resource graph took 2.91 seconds (max_rss 920.9 MiB, delta_rss +17.2 MiB)
# Create Device took 2.94 seconds (max_rss 920.9 MiB, delta_rss +17.2 MiB)
# Placement
## Computing placement delta delay look-up
### Computing router lookahead map
### Computing router lookahead map took 10.56 seconds (max_rss 920.9 MiB, delta_rss +0.0 MiB)
### Computing delta delays
### Computing delta delays took 3.22 seconds (max_rss 920.9 MiB, delta_rss +0.0 MiB)
## Computing placement delta delay look-up took 13.80 seconds (max_rss 920.9 MiB, delta_rss +0.0 MiB)
# Placement took 29.35 seconds (max_rss 920.9 MiB, delta_rss +0.0 MiB)
# Routing
# Routing took 6.38 seconds (max_rss 981.7 MiB, delta_rss +16.2 MiB)
Timing analysis took 7.38887 seconds (6.4274 STA, 0.961471 slack) (132 full updates: 117 setup, 0 hold, 15 combined).
The entire flow of VPR took 84.87 seconds (max_rss 998.2 MiB)
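
To compare the two runs stage by stage, the "took N seconds" lines can be pulled out with a short script. This is a rough sketch that assumes the log format shown above ('#' nesting markers, then the stage name, then "took ... seconds (max_rss ... MiB"); `stage_times` is a hypothetical helper, not a VTR script.

```python
import re

# '# Packing took 3.86 seconds (max_rss 752.0 MiB, delta_rss +6.0 MiB)'
# Group 1: nesting markers, group 2: stage name, group 3: seconds.
STAGE_RE = re.compile(
    r"^(#+) (.+?) took ([0-9.]+) seconds \(max_rss ([0-9.]+) MiB",
    re.MULTILINE,
)


def stage_times(log_text, depth=1):
    """Map stage name -> seconds for stages at the given '#' nesting depth."""
    return {
        name: float(seconds)
        for hashes, name, seconds, _rss in STAGE_RE.findall(log_text)
        if len(hashes) == depth
    }


# Excerpts copied from the two logs above.
murax_log = """\
# Packing took 3.86 seconds (max_rss 752.0 MiB, delta_rss +6.0 MiB)
## Computing placement delta delay look-up took 1.29 seconds (max_rss 757.9 MiB, delta_rss +0.0 MiB)
# Placement took 2.60 seconds (max_rss 760.4 MiB, delta_rss +2.5 MiB)
# Routing took 0.80 seconds (max_rss 773.7 MiB, delta_rss +5.5 MiB)
"""
picosoc_log = """\
# Packing took 36.09 seconds (max_rss 894.4 MiB, delta_rss +121.4 MiB)
## Computing placement delta delay look-up took 13.80 seconds (max_rss 920.9 MiB, delta_rss +0.0 MiB)
# Placement took 29.35 seconds (max_rss 920.9 MiB, delta_rss +0.0 MiB)
# Routing took 6.38 seconds (max_rss 981.7 MiB, delta_rss +16.2 MiB)
"""

murax = stage_times(murax_log)
picosoc = stage_times(picosoc_log)
for stage in murax:
    ratio = picosoc[stage] / murax[stage]
    print(f"{stage:10s} murax={murax[stage]:6.2f}s picosoc={picosoc[stage]:6.2f}s x{ratio:.1f}")
```

Passing `depth=2` instead selects the nested '##' stages such as the placement delay look-up.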

@mithro
Contributor

mithro commented Nov 25, 2019

I would expect PicoSoC to have <2k flip flops, so I believe you are correct;

    dffeas                                                                                                 :    9644
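
As a back-of-the-envelope check of that expectation, the sketch below subtracts the flip-flops a register-built memory would need from the reported dffeas count. The 256-word by 32-bit memory size is an assumption made only for illustration; the log excerpt confirms nothing beyond indices up to mem[172][22].

```python
# dffeas count reported in the Picosoc circuit statistics.
DFFEAS_TOTAL = 9644

# ASSUMPTION (not confirmed anywhere in this thread): the memory that
# failed RAM inference is 256 words x 32 bits.
ASSUMED_WORDS = 256
ASSUMED_WIDTH = 32


def remaining_flops(total, words, width):
    """Flip-flops left over if a words x width memory is built from registers."""
    return total - words * width


left = remaining_flops(DFFEAS_TOTAL, ASSUMED_WORDS, ASSUMED_WIDTH)
print(f"memory flops: {ASSUMED_WORDS * ASSUMED_WIDTH}, remaining: {left}")
```

Under that assumption the memory alone would account for 8192 flip-flops, leaving 1452, which would be in line with the <2k estimate.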
