Commit 896f75f

Author: Vaughn Betz
Updating k6_N10_40nm.xml and adding k6_N10_sparse_crossbar_40nm.xml. These architectures are now more heavily commented and suitable for work in a grad course (ECE 1756). I cut the logic block areas to something more reasonable for a simple architecture like this; the area numbers for the logic blocks and the local mux delays for the sparse architecture are based on coarse scaling / guessing so they aren't extremely accurate.
1 parent ed4a43a commit 896f75f

2 files changed: 451 additions & 75 deletions

vtr_flow/arch/timing/k6_N10_40nm.xml

Lines changed: 103 additions & 75 deletions
@@ -4,27 +4,38 @@
  - 40 nm technology
  - General purpose logic block:
  K = 6, N = 10
- - Routing architecture: L = 4, fc_in = 0.15, Fc_out = 0.1
+ - Routing architecture: L = 4, fc_in = 0.15, fc_out = 0.15
+ - Unidirectional (mux-based) routing
+
 
  Details on Modelling:
 
  Based on flagship k6_frac_N10_mem32K_40nm.xml architecture. This architecture has no fracturable LUTs nor any heterogeneous blocks.
-
+ The delays and areas are based on a mix of values from commercial 40 nm
+ FPGAs with a comparable architecture and 40 nm interconnect and
+ transistor models.
 
  Authors: Jason Luu, Jeff Goeders, Vaughn Betz
  -->
  <architecture>
  <!--
  ODIN II specific config begins
- Describes the types of user-specified netlist blocks (in blif, this corresponds to
- ".model [type_of_block]") that this architecture supports.
-
- Note: Basic LUTs, I/Os, and flip-flops are not included here as there are
- already special structures in blif (.names, .input, .output, and .latch)
- that describe them.
+ This part of the architecture file describes the "primitives"
+ that exist in a device to the synthesis tool used to "elaborate"
+ verilog into these primitives (which is called ODIN-II).
+ Basic LUTs, I/Os and FFs are built into the language used by this
+ flow (blif keywords .names, .input, .output and .latch), so they
+ don't have to be described here.
+
+ For this lab you are also given the benchmark netlists after
+ synthesis is complete (in the blif directory), so you don't need
+ to run ODIN II.
  -->
  <models>
  </models>
+ <!-- ODIN II specific config ends -->
+
+ <!-- Descritions of the physical tiles that exist on the die begins -->
  <tiles>
  <tile name="io" area="0">
  <sub_tile name="io" capacity="8">
@@ -34,7 +45,15 @@
  <input name="outpad" num_pins="1"/>
  <output name="inpad" num_pins="1"/>
  <clock name="clock" num_pins="1"/>
- <fc in_type="frac" in_val="0.15" out_type="frac" out_val="0.10"/>
+ <fc in_type="frac" in_val="0.15" out_type="frac" out_val="0.15"/>
+ <!-- IOs go on the periphery of the FPGA in this
+ architecture. Since I don't want to define four
+ different physical I/Os for the left, right, top,
+ and bottom sides just say each pin of the I/O
+ block is accessible from all four sides so we can
+ reach routing channels on some side of the block
+ no matter which side of the chip we're on.
+ -->
  <pinlocations pattern="custom">
  <loc side="left">io.outpad io.inpad io.clock</loc>
  <loc side="top">io.outpad io.inpad io.clock</loc>
@@ -43,21 +62,42 @@
  </pinlocations>
  </sub_tile>
  </tile>
- <tile name="clb" area="53894">
+
+ <!-- Define general purpose logic block (CLB) begin -->
+ <!-- Area below is for everything inside the
+ logic block (LUTs, FFs, intra-cluster
+ routing). It's a bit on the low side given the large crossbars in this
+ architecture - more appropriate for a lower-cost
+ FPGA with smaller transistors and narrower metal.
+ -->
+ <tile name="clb" area="18000">
+ <!-- We can place a clustered block of type clb on a tile location
+ of type clb.
+ -->
  <sub_tile name="clb">
  <equivalent_sites>
  <site pb_type="clb" pin_mapping="direct"/>
  </equivalent_sites>
+
+ <!-- We have a full crossbar between the cluster inputs and the
+ LUT inputs, so the router can route to *any* input or from
+ *any* output on the logic block. Hence mark the logic block
+ inputs as fully logically equivalent (swappable by the router) and also the
+ logic block outputs as logically equivalent, which means
+ they can also be swapped by the router.
+ -->
+
  <input name="I" num_pins="40" equivalent="full"/>
  <output name="O" num_pins="10" equivalent="instance"/>
  <clock name="clk" num_pins="1"/>
- <fc in_type="frac" in_val="0.15" out_type="frac" out_val="0.10"/>
+ <fc in_type="frac" in_val="0.15" out_type="frac" out_val="0.15"/>
  <pinlocations pattern="spread"/>
  </sub_tile>
  </tile>
  </tiles>
- <!-- ODIN II specific config ends -->
- <!-- Physical descriptions begin -->
+ <!-- Physical tile descriptions end -->
+
+ <!-- Chip layout (in terms of where tiles are) begins -->
  <layout>
  <auto_layout aspect_ratio="1.0">
  <!--Perimeter of 'io' blocks with 'EMPTY' blocks at corners-->
@@ -67,22 +107,11 @@
  <fill type="clb" priority="10"/>
  </auto_layout>
  </layout>
+ <!-- Chip layout ends -->
+
+ <!-- Electrical and inter-cluster (general) routing description begins -->
  <device>
- <!-- VB & JL: Using Ian Kuon's transistor sizing and drive strength data for routing, at 40 nm. Ian used BPTM
- models. We are modifying the delay values however, to include metal C and R, which allows more architecture
- experimentation. We are also modifying the relative resistance of PMOS to be 1.8x that of NMOS
- (vs. Ian's 3x) as 1.8x lines up with Jeff G's data from a 45 nm process (and is more typical of
- 45 nm in general). I'm upping the Rmin_nmos from Ian's just over 6k to nearly 9k, and dropping
- RminW_pmos from 18k to 16k to hit this 1.8x ratio, while keeping the delays of buffers approximately
- lined up with Stratix IV.
- We are using Jeff G.'s capacitance data for 45 nm (in tech/ptm_45nm).
- Jeff's tables list C in for transistors with widths in multiples of the minimum feature size (45 nm).
- The minimum contactable transistor is 2.5 * 45 nm, so I need to multiply drive strength sizes in this file
- by 2.5x when looking up in Jeff's tables.
- The delay values are lined up with Stratix IV, which has an architecture similar to this
- proposed FPGA, and which is also 40 nm
- C_ipin_cblock: input capacitance of a track buffer, which VPR assumes is a single-stage
- 4x minimum drive strength buffer. -->
+ <!-- Some area and timing parameters -->
  <sizing R_minW_nmos="8926" R_minW_pmos="16067"/>
  <!-- The grid_logic_tile_area below will be used for all blocks that do not explicitly set their own (non-routing)
  area; set to 0 since we explicitly set the area of all blocks currently in this architecture file.
@@ -92,49 +121,61 @@
  <x distr="uniform" peak="1.000000"/>
  <y distr="uniform" peak="1.000000"/>
  </chan_width_distr>
+
+ <!-- Define the switch block pattern (pattern of switches between inter-tile routing wires)
+ The Wilton switch block is a sample pattern; you can use custom switch blocks for more control -->
  <switch_block type="wilton" fs="3"/>
+
+ <!-- Set which switch to use for input connection blocks. Only affects timing and area, not connectivity -->
  <connection_block input_switch_name="ipin_cblock"/>
  </device>
  <switchlist>
  <!-- VB: the mux_trans_size and buf_size data below is in minimum width transistor *areas*, assuming the purple
  book area formula. This means the mux transistors are about 5x minimum drive strength.
  We assume the first stage of the buffer is 3x min drive strength to be reasonable given the large
- mux transistors, and this gives a reasonable stage ratio of a bit over 5x to the second stage. We assume
- the n and p transistors in the first stage are equal-sized to lower the buffer trip point, since it's fed
- by a pass transistor mux. We can then reverse engineer the buffer second stage to hit the specified
- buf_size (really buffer area) - 16.2x minimum drive nmos and 1.8*16.2 = 29.2x minimum drive.
- I then took the data from Jeff G.'s PTM modeling of 45 nm to get the Cin (gate of first stage) and Cout
- (diff of second stage) listed below. Jeff's models are in tech/ptm_45nm, and are in min feature multiples.
- The minimum contactable transistor is 2.5 * 45 nm, so I need to multiply the drive strength sizes above by
- 2.5x when looking up in Jeff's tables.
- Finally, we choose a switch delay (58 ps) that leads to length 4 wires having a delay equal to that of SIV of 126 ps.
- This also leads to the switch being 46% of the total wire delay, which is reasonable. -->
+ mux transistors, and this gives a reasonable stage ratio of a bit over 5x to the second stage.
+ -->
  <switch type="mux" name="0" R="551" Cin=".77e-15" Cout="4e-15" Tdel="58e-12" mux_trans_size="2.630740" buf_size="27.645901"/>
  <!--switch ipin_cblock resistance set to yeild for 4x minimum drive strength buffer-->
  <switch type="mux" name="ipin_cblock" R="2231.5" Cout="0." Cin="1.47e-15" Tdel="7.247000e-11" mux_trans_size="1.222260" buf_size="auto"/>
  </switchlist>
  <segmentlist>
  <!--- VB & JL: using ITRS metal stack data, 96 nm half pitch wires, which are intermediate metal width/space.
- With the 96 nm half pitch, such wires would take 60 um of height, vs. a 90 nm high (approximated as square) Stratix IV tile so this seems
- reasonable. Using a tile length of 90 nm, corresponding to the length of a Stratix IV tile if it were square. -->
+ Wires of this pitch will fit over a 90 nm
+ high logic tile (which is about the height of a Stratix IV logic tile).
+ I'm using a tile length of 90 nm, corresponding to the length of a Stratix IV tile if it were square.
+ length below is in units of logic blocks, and Rmetal and Cmetal are
+ per logic block passed, so wire delay adapts automatically if you change the
+ length=? value. -->
+
+ <!-- Currently only one type of routing wire, which
+ is of length 4 and has switches to every connection
+ box (4 of them) and switch box (5 of them)
+ it passes. You can change wirelengths just by changing the length="?" values
+ and changing the number of 1's (or 0's) in the <sb type and <cb type lines to
+ match the number of switch blocks and connection blocks a wire of that length
+ would span. -->
  <segment freq="1.000000" length="4" type="unidir" Rmetal="101" Cmetal="22.5e-15">
  <mux name="0"/>
  <sb type="pattern">1 1 1 1 1</sb>
  <cb type="pattern">1 1 1 1</cb>
  </segment>
  </segmentlist>
+ <!-- Electrical and inter-cluster routing description ends -->
+
+ <!-- Description of the capabilities (number of BLEs, modes) and local interconnect in
+ each type of complex (clustered) block (e.g. LBs) begins
+ -->
  <complexblocklist>
  <!-- Define I/O pads begin -->
- <!-- Capacity is a unique property of I/Os, it is the maximum number of I/Os that can be placed at the same (X,Y) location on the FPGA -->
  <!-- Not sure of the area of an I/O (varies widely), and it's not relevant to the design of the FPGA core, so we're setting it to 0. -->
  <pb_type name="io">
  <input name="outpad" num_pins="1"/>
  <output name="inpad" num_pins="1"/>
  <clock name="clock" num_pins="1"/>
  <!-- IOs can operate as either inputs or outputs.
- Delays below come from Ian Kuon. They are small, so they should be interpreted as
- the delays to and from registers in the I/O (and generally I/Os are registered
- today and that is when you timing analyze them.
+ The delays below are to and from registers in the I/O (and generally I/Os are registered
+ today).
  -->
  <mode name="inpad">
  <pb_type name="inpad" blif_model=".input" num_pb="1">
@@ -156,25 +197,17 @@
  </direct>
  </interconnect>
  </mode>
- <!-- Every input pin is driven by 15% of the tracks in a channel, every output pin is driven by 10% of the tracks in a channel -->
- <!-- IOs go on the periphery of the FPGA, for consistency,
- make it physically equivalent on all sides so that only one definition of I/Os is needed.
- If I do not make a physically equivalent definition, then I need to define 4 different I/Os, one for each side of the FPGA
- -->
- <!-- Place I/Os on the sides of the FPGA -->
+
+ <!-- Not modeling I/O power for now -->
  <power method="ignore"/>
  </pb_type>
  <!-- Define I/O pads ends -->
+
  <!-- Define general purpose logic block (CLB) begin -->
- <!--- Area calculation: Total Stratix IV tile area is about 8100 um^2, and a minimum width transistor
- area is 60 L^2 yields a tile area of 84375 MWTAs.
- Routing at W=300 is 30481 MWTAs, leaving us with a total of 53000 MWTAs for logic block area
- This means that only 37% of our area is in the general routing, and 63% is inside the logic
- block. Note that the crossbar / local interconnect is considered part of the logic block
- area in this analysis. That is a lower proportion of of routing area than most academics
- assume, but note that the total routing area really includes the crossbar, which would push
- routing area up significantly, we estimate into the ~70% range.
- -->
+ <!-- Area below is for everything inside the
+ logic block (LUTs, FFs, intra-cluster
+ routing).
+ -->
  <pb_type name="clb">
  <input name="I" num_pins="40" equivalent="full"/>
  <output name="O" num_pins="10" equivalent="instance"/>
@@ -198,22 +231,16 @@
  <input name="in" num_pins="6" port_class="lut_in"/>
  <output name="out" num_pins="1" port_class="lut_out"/>
  <!-- LUT timing using delay matrix -->
- <!-- These are the physical delay inputs on a Stratix IV LUT but because VPR cannot do LUT rebalancing,
- we instead take the average of these numbers to get more stable results
+ <!-- These are the delay per LUT input on a Stratix IV LUT.
+ The average is 261 ps, and inputs earlier in the mux tree are slower.
+ -->
+ <delay_matrix type="max" in_port="lut6.in" out_port="lut6.out">
  82e-12
  173e-12
  261e-12
  263e-12
  398e-12
  397e-12
- -->
- <delay_matrix type="max" in_port="lut6.in" out_port="lut6.out">
- 261e-12
- 261e-12
- 261e-12
- 261e-12
- 261e-12
- 261e-12
  </delay_matrix>
  </pb_type>
  <!-- Define flip-flop -->
@@ -224,6 +251,10 @@
  <T_setup value="66e-12" port="ff.D" clock="clk"/>
  <T_clock_to_Q max="124e-12" port="ff.Q" clock="clk"/>
  </pb_type>
+
+ <!-- many lines below to describe the interconnect
+ wires, muxes and crossbars inside a cluster.
+ -->
  <interconnect>
  <direct name="direct1" input="ble6.in" output="lut6[0:0].in"/>
  <direct name="direct2" input="lut6.out" output="ff.D">
@@ -262,15 +293,12 @@
  </complete>
  <complete name="clks" input="clb.clk" output="fle[9:0].clk">
  </complete>
- <!-- This way of specifying direct connection to clb outputs is important because this architecture uses automatic spreading of opins.
- By grouping to output pins in this fashion, if a logic block is completely filled by 6-LUTs,
- then the outputs those 6-LUTs take get evenly distributed across all four sides of the CLB instead of clumped on two sides (which is what happens with a more
- naive specification).
+
+ <!-- The BLE outputs are directly connected to the
+ CLB (cluster) outputs.
  -->
  <direct name="clbouts1" input="fle[9:0].out" output="clb.O"/>
  </interconnect>
- <!-- Every input pin is driven by 15% of the tracks in a channel, every output pin is driven by 10% of the tracks in a channel -->
- <!-- Place this general purpose logic block in any unspecified column -->
  </pb_type>
  <!-- Define general purpose logic block (CLB) ends -->
  </complexblocklist>
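
Illustration only, not part of this commit: the new segment comment says wire length can be changed by editing length="?" and matching the number of entries in the sb and cb pattern lines. A minimal sketch of a length-2 variant follows, assuming the same rule that gives 5 switch-box and 4 connection-box entries for length 4 (so 3 and 2 for length 2). Rmetal and Cmetal are carried over unchanged because the comment states they are per logic block passed. Whether the 58 ps Tdel of switch "0" is still a sensible value for a shorter wire is not addressed here; the removed comment notes it was chosen for length-4 wires.

```xml
<segmentlist>
  <!-- Hypothetical length-2 wire: 3 switch-box and 2 connection-box
       positions, assuming the (length + 1) / length rule implied by the
       length-4 segment in this commit. -->
  <segment freq="1.000000" length="2" type="unidir" Rmetal="101" Cmetal="22.5e-15">
    <mux name="0"/>
    <sb type="pattern">1 1 1</sb>
    <cb type="pattern">1 1</cb>
  </segment>
</segmentlist>
```

Setting an entry to 0 instead of 1 would drop the switch at that position along the wire, which is the "(or 0's)" option the comment mentions.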
