Support Equivalent Placement Sites #513
Comments
Hi @kmurray. I have been thinking about this issue and how to proceed with its implementation. Considerations Currently there are three types of How to proceed First thing I would focus on, is to substitute the idea of the top level
Later on there will be the addition of the feature for which different block types can be placed in the same physical location (e.g.
Yep, that's correct.
This makes a lot of sense to me.
Splitting this into two stages seems like a reasonable approach.
I think this is the key challenge. We'll need to think carefully how this is handled. In particular:
My initial tendency would be to keep it as simple as possible initially (it can then be generalized incrementally over time). That probably means:
This would handle the case of equivalent IO block locations (since they should all have the same pin-outs), and probably LAB/MLAB, CLBL/CLBM as well (if the pb_types are written appropriately). |
That looks like you are running into inconsistent pin equivalences between the different |
I've taken a crack at writing up potential ideas for how this could be done. There are basically 3 approaches (described below), each of increasing difficulty. I'd strongly suggest starting with something like (1), which is more restrictive but simpler to implement. I'm very hesitant to even consider (3), as I think it adds significant complexity and blurs the line between packing and placement (and between pb_type and tile). Supporting it in general would break a number of significant assumptions made by packing and placement, which would require substantial effort to fix (and it's not clear the result would actually be any better). As a result, there would need to be an extremely compelling motivation to make the degradation in code maintainability worthwhile.
<!-- Simple Equivalent Placement Sites (Identical pin-out & equivalence)
This approach requires that the pin-out of each grid tile *exactly*
matches the pin-out of each pb_type (including pin equivalence). This
avoids having to specify a mapping from tile to pb_type pins. (Note that
this is not that restrictive, you just make each 'equivalent' pb_type
have the same top-level pin-out. Some of those pins may be unused by
some pb_types but that is OK).
Information such as pin-locations and Fc are moved to the tile
specification (out of the top-level pb_type). Attributes like block
width/height/area are also moved to the tile.
In this formulation capacity is applied to the tile (rather than the
sites) since this keeps the direct 1:1 mapping from tile to pb_type pins.
A key advantage of this approach is that it should not require any
modifications to the RR graph. In particular, since pin outs and equivalence
are enforced to be the same the SOURCE/SINK/IPIN/OPIN nodes match between
the tile and any equivalent pb_types. The only change required in the
RR graph generator would be to drive SOURCE/SINK/IPIN/OPIN creation off
of the tile (rather than block type) description.
-->
<tiles>
<tile name="MLAB_tile" width="1" height="1" capacity="1" area="XXX">
<input name="inputs" num_pins="50" equivalent="full"/>
<input name="clk" num_pins="2"/>
<output name="outputs" num_pins="50"/>
<pinlocations ...>
<fc ...>
<equivalent_sites> <!-- NOTE: both pb_types required to have identical pin outs which match the tile's (this should be error checked) -->
<site pb_type="LAB"/>
<site pb_type="MLAB"/>
</equivalent_sites>
</tile>
<tile name="LAB_tile">
<input name="inputs" num_pins="50" equivalent="full"/>
<input name="clk" num_pins="2"/>
<output name="outputs" num_pins="50"/>
<pinlocations ...>
<fc ...>
<equivalent_sites>
<site pb_type="LAB"/>
</equivalent_sites>
</tile>
<tile name="IOL_tile" capacity="2"> <!-- IO left -->
<input name="inputs" num_pins="1"/>
<input name="clk" num_pins="1"/>
<output name="outputs" num_pins="1"/>
<pinlocations pattern="custom">
<loc side="right">IOL_tile.inputs IOL_tile.clk IOL_tile.outputs</loc>
</pinlocations>
<fc ...>
<equivalent_sites>
<site pb_type="IO"/>
</equivalent_sites>
</tile>
<tile name="IOR_tile" capacity="2"> <!-- IO right -->
<input name="inputs" num_pins="1"/>
<input name="clk" num_pins="1"/>
<output name="outputs" num_pins="1"/>
<pinlocations pattern="custom">
<loc side="left">IOR_tile.inputs IOR_tile.clk IOR_tile.outputs</loc>
</pinlocations>
<fc ...>
<equivalent_sites>
<site pb_type="IO"/>
</equivalent_sites>
</tile>
</tiles>
<complexblocklist>
<pb_type name="IO">
<input name="inputs" num_pins="1"/>
<clock name="clk" num_pins="1"/>
<output name="outputs" num_pins="1"/>
...
</pb_type>
<pb_type name="MLAB">
<input name="inputs" num_pins="50" equivalent="full"/>
<clock name="clk" num_pins="2"/>
<output name="outputs" num_pins="50"/>
...
</pb_type>
<pb_type name="LAB">
<input name="inputs" num_pins="50" equivalent="full"/>
<clock name="clk" num_pins="2"/>
<output name="outputs" num_pins="50"/>
...
</pb_type>
</complexblocklist>
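The note above says the identical pin-out requirement "should be error checked". A minimal C++ sketch of such a check is given here, assuming a simplified `Port` record and a `pinouts_match` helper; both names are hypothetical and are not VPR's actual data structures. It compares name, direction, width and equivalence port by port.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified port description (not VPR's real data structures).
struct Port {
    std::string name;
    bool is_output;
    int num_pins;
    bool fully_equivalent;  // true when the port is marked equivalent="full"
};

// Returns true when both port lists describe the same pin-out: the same ports
// in the same order, with matching direction, width and equivalence.
bool pinouts_match(const std::vector<Port>& tile, const std::vector<Port>& pb_type) {
    if (tile.size() != pb_type.size()) return false;
    for (std::size_t i = 0; i < tile.size(); ++i) {
        const Port& t = tile[i];
        const Port& p = pb_type[i];
        if (t.name != p.name || t.is_output != p.is_output ||
            t.num_pins != p.num_pins || t.fully_equivalent != p.fully_equivalent) {
            return false;
        }
    }
    return true;
}
```

Under this simplified model, an architecture reader could run the check once per `<site>` entry and reject the file on the first mismatch, rather than discovering the inconsistency later during RR graph construction.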
<!-- Flexible Equivalent Placement Sites (Non-identical pin-out & equivalence)
This approach extends the 'Simple Equivalent Placement Sites' approach
such that the pin-outs of each grid tile do *not* need to *exactly*
match the pin-out of each pb_type.
In particular, each site must specify how the tile pins connect to the
associated pb_type's pins. Furthermore, pin equivalence can not be
specified on tile pins, but is controlled on the pb_types.
This formulation still only allows capacity to be specified on *tiles*,
which ensures invalid architectures can not be specified (see comments on
'Complex Equivalent Placement Sites' for details on this issue).
In addition to the additional complexity of specification, this approach
also requires changes to the RR graph. Effectively the tile pins specify
the RR Graph's IPINs and OPINs (and how they connect to wires), while the
pb_types specify the SOURCEs and SINKs.
We would need to build unique SOURCE/SINK nodes for each pin class of
each site's pb_type. The <direct> specifications then become the edges
between the tile's pins (IPINs/OPINs) and the pb_type's SOURCEs/SINKs.
This would require updating the router to ensure it picks the correct
SOURCE/SINK depending on which site a particular cluster was placed at.
-->
<tiles>
<tile name="MLAB_tile" width="1" height="1" capacity="1" area="XXX">
<input name="inputs" num_pins="50"/>
<input name="clk" num_pins="2"/>
<output name="outputs" num_pins="50"/>
<pinlocations ...>
<fc ...>
<equivalent_sites> <!-- NOTE: each site must specify its pin mapping -->
<site pb_type="LAB">
<direct from="MLAB_tile.inputs" to="LAB.inputs"/> <!-- Note LAB inputs are equivalent -->
<direct from="MLAB_tile.clk" to="LAB.clk"/>
<direct from="LAB.outputs" to="MLAB_tile.outputs"/>
</site>
<site pb_type="MLAB">
<direct from="MLAB_tile.inputs[19:0]" to="MLAB.addr"/> <!-- Note MLAB inputs are not equivalent -->
<direct from="MLAB_tile.inputs[29:20]" to="MLAB.data_in"/>
<direct from="MLAB_tile.clk" to="MLAB.clk"/>
<direct from="MLAB.data_out" to="MLAB_tile.outputs[9:0]"/>
</site>
</equivalent_sites>
</tile>
<tile name="LAB_tile">
<input name="inputs" num_pins="50"/>
<input name="clk" num_pins="2"/>
<output name="outputs" num_pins="50"/>
<pinlocations ...>
<fc ...>
<equivalent_sites>
<site pb_type="LAB">
<direct from="LAB_tile.inputs" to="LAB.inputs"/> <!-- Note LAB inputs are equivalent -->
<direct from="LAB_tile.clk" to="LAB.clk"/>
<direct from="LAB.outputs" to="LAB_tile.outputs"/>
</site>
</equivalent_sites>
</tile>
<tile name="IOL_tile" capacity="2"> <!-- IO left -->
<input name="inputs" num_pins="1"/>
<input name="clk" num_pins="1"/>
<output name="outputs" num_pins="1"/>
<pinlocations pattern="custom">
<loc side="right">IOL_tile.inputs IOL_tile.clk IOL_tile.outputs</loc>
</pinlocations>
<fc ...>
<equivalent_sites>
<site pb_type="IO">
<direct from="IOL_tile.inputs" to="IO.inputs"/>
<direct from="IOL_tile.clk" to="IO.clk"/>
<direct from="IO.outputs" to="IOL_tile.outputs"/>
</site>
</equivalent_sites>
</tile>
<tile name="IOR_tile" capacity="2"> <!-- IO right -->
<input name="inputs" num_pins="1"/>
<input name="clk" num_pins="1"/>
<output name="outputs" num_pins="1"/>
<pinlocations pattern="custom">
<loc side="left">IOR_tile.inputs IOR_tile.clk IOR_tile.outputs</loc>
</pinlocations>
<fc ...>
<equivalent_sites>
<site pb_type="IO">
<direct from="IOR_tile.inputs" to="IO.inputs"/>
<direct from="IOR_tile.clk" to="IO.clk"/>
<direct from="IO.outputs" to="IOR_tile.outputs"/>
</site>
</equivalent_sites>
</tile>
</tiles>
<complexblocklist>
<pb_type name="IO">
<input name="inputs" num_pins="1"/>
<clock name="clk" num_pins="1"/>
<output name="outputs" num_pins="2"/>
...
</pb_type>
<pb_type name="MLAB">
<input name="addr" num_pins="20"/>
<input name="data_in" num_pins="10"/>
<clock name="clk" num_pins="2"/>
<output name="data_out" num_pins="10"/>
...
</pb_type>
<pb_type name="LAB">
<input name="inputs" num_pins="50"/>
<clock name="clk" num_pins="2"/>
<output name="outputs" num_pins="50"/>
...
</pb_type>
</complexblocklist>
<!-- Complex Placement Sites (Non-identical pin-out & equivalence, internal capacity)
This approach further generalizes 'Flexible Equivalent Placement Sites' to
allow the placer to make top-level operation mode choices on a more general
(but still restricted) set of architectures. In particular, we no longer
have a single set of equivalent mutually-exclusive *single-slot* sites,
but a set of mutually-exclusive modes which may have *multi-slot* sites.
('Flexible Equivalent Placement Sites' can also be viewed as the placer
making a selection between the mutually exclusive sites, but each of
those sites was independent of the others).
In this formulation capacity is applied as an attribute to the *site*
(rather than the tile) to support multi-slot sites.
The key advantage of this approach is illustrated with the RAM_tile
below, which could allow the placer to choose either a 2xRAM18 or
1xRAM36 mode.
However there is a significant drawback to this approach. The multi-slot
specification capability makes it easy to describe an architecture which will
lead to impossible-to-route placements. The comments in 'RAM_tile_invalid'
illustrate how this could be easily specified.
VPR splits placement into two stages (packing and placement) explicitly to
avoid burdening the placer with having to consider these types of detailed
constraints (which are handled by the packer). As a result the placer
assumes that any block can be placed at any 'site' of matching type,
and that the resulting placement will be free of impossible routing
bottlenecks.
-->
<tiles>
<tile name="RAM_tile">
<input name="inputs" num_pins="50"/>
<input name="clk" num_pins="2"/>
<output name="outputs" num_pins="50"/>
<pinlocations ...>
<fc ...>
<mode>
<!-- NOTE: care must be taken to ensure both sites can be used
completely independently, otherwise you end up with
dependencies between the two sites and you effectively
would need the placer to do clustering to produce a legal
solution. See 'RAM_tile_invalid' as an illustration.
-->
<site pb_type="RAM18" capacity="2">
<direct from="RAM_tile.inputs[24:0]" to="RAM18[0].inputs"/>
<direct from="RAM_tile.clk[0]" to="RAM18[0].clk"/>
<direct from="RAM18[0].outputs" to="RAM_tile.outputs[24:0]"/>
<direct from="RAM_tile.inputs[49:25]" to="RAM18[1].inputs"/>
<direct from="RAM_tile.clk[1]" to="RAM18[1].clk"/>
<direct from="RAM18[1].outputs" to="RAM_tile.outputs[49:25]"/>
</site>
</mode>
<mode>
<site pb_type="RAM36" capacity="1">
<direct from="RAM_tile.inputs" to="RAM36.inputs"/>
<direct from="RAM_tile.clk[0]" to="RAM36.clk"/>
<direct from="RAM36.outputs" to="RAM_tile.outputs"/>
</site>
</mode>
</tile>
<tile name="RAM_tile_invalid">
<input name="inputs" num_pins="50"/>
<input name="clk" num_pins="1"/>
<output name="outputs" num_pins="50"/>
<pinlocations ...>
<fc ...>
<mode>
<!-- NOTE: This tile is *invalid* and could produce unroutable placements
Here, the RAM_tile has only a single clock which is
shared between the two RAM18's. As a result there is an
additional implicit constraint to the placement: only RAM18's
which share the same clock can be placed together in this tile.
Since the placer doesn't know this, it could produce a placement
which violated this constraint. The result would be a routing
failure with the single tile-level clock input pin being congested
due to a routing bottleneck (since the two RAM18's required different
clocks, but only a single pin connects them).
We would need to detect and explicitly reject the specification
of such architectures, to avoid this. Or generalize the idea of
placement macros to enforce this constraint - but it is far from
clear whether that is actually a good idea or not!
-->
<site pb_type="RAM18" capacity="2">
<direct from="RAM_tile.inputs[24:0]" to="RAM18[0].inputs"/>
<direct from="RAM_tile.clk[0]" to="RAM18[0].clk"/>
<direct from="RAM18[0].outputs" to="RAM_tile.outputs[24:0]"/>
<direct from="RAM_tile.inputs[49:25]" to="RAM18[1].inputs"/>
<direct from="RAM_tile.clk[0]" to="RAM18[1].clk"/>
<direct from="RAM18[1].outputs" to="RAM_tile.outputs[49:25]"/>
</site>
</mode>
<mode>
<site pb_type="RAM36" capacity="1">
<direct from="RAM_tile.inputs" to="RAM36.inputs"/>
<direct from="RAM_tile.clk[0]" to="RAM36.clk"/>
<direct from="RAM36.outputs" to="RAM_tile.outputs"/>
</site>
</mode>
</tile>
<tile name="MLAB_tile" width="1" height="1" area="XXX"> <!-- Note: capacity no longer a tile attribute -->
<input name="inputs" num_pins="50"/>
<input name="clk" num_pins="2"/>
<output name="outputs" num_pins="50"/>
<pinlocations ...>
<fc ...>
<mode>
<site pb_type="LAB">
<direct from="MLAB_tile.inputs" to="LAB.inputs"/> <!-- Note LAB inputs are equivalent -->
<direct from="MLAB_tile.clk" to="LAB.clk"/>
<direct from="LAB.outputs" to="MLAB_tile.outputs"/>
</site>
</mode>
<mode>
<site pb_type="MLAB">
<direct from="MLAB_tile.inputs[19:0]" to="MLAB.addr"/> <!-- Note MLAB inputs are not equivalent -->
<direct from="MLAB_tile.inputs[29:20]" to="MLAB.data_in"/>
<direct from="MLAB_tile.clk" to="MLAB.clk"/>
<direct from="MLAB.data_out" to="MLAB_tile.outputs[9:0]"/>
</site>
</mode>
</tile>
<tile name="LAB_tile">
<input name="inputs" num_pins="50"/>
<input name="clk" num_pins="2"/>
<output name="outputs" num_pins="50"/>
<pinlocations ...>
<fc ...>
<mode>
<site pb_type="LAB">
<direct from="LAB_tile.inputs" to="LAB.inputs"/> <!-- Note LAB inputs are equivalent -->
<direct from="LAB_tile.clk" to="LAB.clk"/>
<direct from="LAB.outputs" to="LAB_tile.outputs"/>
</site>
</mode>
</tile>
<tile name="IOL_tile" capacity="2"> <!-- IO left -->
<input name="inputs" num_pins="1"/>
<input name="clk" num_pins="1"/>
<output name="outputs" num_pins="1"/>
<pinlocations pattern="custom">
<loc side="right">IOL_tile.inputs IOL_tile.clk IOL_tile.outputs</loc>
</pinlocations>
<fc ...>
<mode>
<site pb_type="IO">
<direct from="IOL_tile.inputs" to="IO.inputs"/>
<direct from="IOL_tile.clk" to="IO.clk"/>
<direct from="IO.outputs" to="IOL_tile.outputs"/>
</site>
</mode>
</tile>
<tile name="IOR_tile" capacity="2"> <!-- IO right -->
<input name="inputs" num_pins="1"/>
<input name="clk" num_pins="1"/>
<output name="outputs" num_pins="1"/>
<pinlocations pattern="custom">
<loc side="left">IOR_tile.inputs IOR_tile.clk IOR_tile.outputs</loc>
</pinlocations>
<fc ...>
<mode>
<site pb_type="IO">
<direct from="IOR_tile.inputs" to="IO.inputs"/>
<direct from="IOR_tile.clk" to="IO.clk"/>
<direct from="IO.outputs" to="IOR_tile.outputs"/>
</site>
</mode>
</tile>
</tiles>
<complexblocklist>
<pb_type name="IO">
<input name="inputs" num_pins="1"/>
<clock name="clk" num_pins="1"/>
<output name="outputs" num_pins="2"/>
...
</pb_type>
<pb_type name="MLAB">
<input name="addr" num_pins="20"/>
<input name="data_in" num_pins="10"/>
<clock name="clk" num_pins="2"/>
<output name="data_out" num_pins="10"/>
...
</pb_type>
<pb_type name="LAB">
<input name="inputs" num_pins="50"/>
<clock name="clk" num_pins="2"/>
<output name="outputs" num_pins="50"/>
...
</pb_type>
<pb_type name="RAM18">
<input name="inputs" num_pins="25"/>
<clock name="clk" num_pins="1"/>
<output name="outputs" num_pins="25"/>
...
</pb_type>
<pb_type name="RAM36">
<input name="inputs" num_pins="50"/>
<clock name="clk" num_pins="1"/>
<output name="outputs" num_pins="50"/>
...
</pb_type>
</complexblocklist>
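The `RAM_tile_invalid` comment notes that such architectures would need to be detected and rejected. One possible detection rule, sketched in C++ under the assumption that each `<direct>` in a multi-slot site has already been flattened into a (tile pin, slot index) pair, is to flag any tile input pin that feeds more than one slot. `TileToSlotConnection` and `find_shared_tile_pins` are hypothetical names, and this rule only catches the shared-pin case illustrated above, not every possible coupling between slots.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical flattened form of one <direct> inside a multi-slot <site>:
// a single tile input pin feeding one slot (instance) of the site's pb_type.
struct TileToSlotConnection {
    std::string tile_pin;  // e.g. "RAM_tile.clk[0]"
    int slot;              // pb_type instance index, e.g. 0 for RAM18[0]
};

// Returns the tile input pins that feed more than one slot. Any such pin
// couples the slots, so blocks can no longer be placed in them independently.
std::set<std::string> find_shared_tile_pins(const std::vector<TileToSlotConnection>& conns) {
    std::map<std::string, std::set<int>> slots_per_pin;
    for (const auto& c : conns) {
        slots_per_pin[c.tile_pin].insert(c.slot);
    }

    std::set<std::string> shared;
    for (const auto& entry : slots_per_pin) {
        if (entry.second.size() > 1) shared.insert(entry.first);
    }
    return shared;
}
```

Applied to the two RAM18 modes above, the `RAM_tile` connections would produce an empty result, while `RAM_tile_invalid` would report `RAM_tile.clk[0]`.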
An additional wrinkle, which actually affects all 3 of the above approaches, relates to differing Fc specifications at different tiles/sites. The packer currently assumes that it can exit and re-enter the cluster through the general routing (although there are exceptions for Fc=0 pins, e.g. cin/cout of adder chains). Since this approach would decouple the Fc specification from the pb_type, there are some corner cases to consider. For example, what if the packer assumes a path through the general routing that exists at some sites (Fc > 0) but not at others (Fc = 0 on the same pin)? Potential ways to handle this could be:
@kmurray Thanks for the very detailed explanation. I have started a WIP implementation that looks similar to the first approach, which implies a strict equivalence between the tile and the pb_type pins. It differs in the way the XML is defined and consequently read in VPR, but the underlying VPR changes should not change between my first implementation and the I need to think about the Fc problem you have referred to and see how this can be effectively implemented. As far as I understood the issue is that different tiles, even if they may be considered equivalent ( |
In our case, I don't believe Fc matters because the interconnect <-> pin connections are constant. Even considering Fc, in our application Fc is constant between the sites. Can you think of a counterexample where Fc might vary between sites? In a uniform interconnect, equivalent sites should have equivalent Fc.
I also agree that approach 1 (simplest) is best. @acomodi For the Fc (corner case, different Fc for different equivalent tiles) I would just iterate through and use the smallest Fc of the equivalent tiles in the code in the packer that checks if a pin can reach general routing. The common case is all the Fc's are the same anyway, and this handles the corner case of them being different in a simple way, with non-CPU-critical code. |
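The "use the smallest Fc" suggestion can be sketched in a few lines of C++. This assumes the Fc of one pin at each equivalent tile has already been collected into a vector; the function names are hypothetical and do not correspond to existing packer code.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical input: the Fc value of one logical pin at each equivalent tile
// where the block could be placed (one entry per tile).
// Returns the most pessimistic (smallest) Fc across those tiles.
double conservative_fc(const std::vector<double>& fc_per_tile) {
    if (fc_per_tile.empty()) return 0.0;  // no tile known: assume no routing access
    return *std::min_element(fc_per_tile.begin(), fc_per_tile.end());
}

// The packer may assume the pin reaches general routing only if that is true
// at every tile the block might later be placed on.
bool pin_reaches_general_routing(const std::vector<double>& fc_per_tile) {
    return conservative_fc(fc_per_tile) > 0.0;
}
```

Taking the minimum is deliberately conservative: a pin with Fc = 0 at any one equivalent tile is treated as having no general routing access everywhere, which costs some packing flexibility but cannot produce a cluster that is only routable at some of its legal locations.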
@kmurray Here is a first implementation of the equivalent tiles: #559. I have done it in a slightly different way, which I think can still easily be changed to follow your suggestion. As it is right now the XML description looks like this:
<tiles>
<tile name="CLBM" capacity="1" width="...">
<fc ...>
<pinlocation ...>
</tile>
<tile name="CLBL">
<equivalent_tiles>
<mode="CLBM">
<direct from="CLBL.A" to="CLBM.A" num_pins="1">
<direct from="CLBL.D" to="CLBM.D" num_pins="1">
<direct from="CLBL.DX" to="CLBM.DX1" num_pins="1">
...
</mode>
</equivalent_tiles>
<fc ...>
<pinlocation ...>
</tile>
<tile>
<mode ... />
...
</tile>
...
</tiles>
<complexblocklist>
<pb_type name="CLBM">
<inputs />
<outputs />
<pb_type>
...
</pb_type>
<interconnect/>
</pb_type>
...
</complexblocklist>
...
Inputs and outputs are already defined in the pb_type, I don't think it is necessary to specify them also in the tile. What I have done is actually to invert the idea: equivalent tiles are specified only for those tiles which have equivalent ones (e.g. CLBL has CLBM) instead of specifying a
I have checked PR #559 with the SymbiFlow tests and VPR produced valid designs. Another thing to consider is that I did not yet take into account the
@kmurray, Probably the first thing I could do to make my approach more similar to yours is to move all the tile equivalence information to the
#include <map>
#include <vector>
struct t_pb_type;        // forward declarations, defined elsewhere
struct t_physical_type;
struct t_logical_type {
    t_pb_type* pb_type;
    // std::vector / std::map stand in for the container types left unspecified in the sketch
    std::vector<t_physical_type*> equivalent_physical_type;
    std::map<t_physical_type*, std::map<int, int>> pin_mapping_with_equivalent_physical_types;
    std::map<t_physical_type*, std::map<int, int>> inverse_pin_mapping_with_equivalent_physical_types;
    // Something else I forgot about
};
Then the I do not yet foresee how the split of the Does this make sense? |
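The struct above keeps both a pin mapping and its inverse for each equivalent physical type. As a rough illustration, the inverse could be derived from the forward map instead of being specified separately. The sketch below assumes pins are identified by plain integer indices; `PinMap` and `invert_pin_mapping` are hypothetical names, not taken from the PR.

```cpp
#include <map>
#include <stdexcept>

// Pin index on one side of the mapping -> pin index on the other side.
using PinMap = std::map<int, int>;

// Builds the inverse of a pin mapping. Throws if two source pins map to the
// same destination pin, since such a mapping has no well-defined inverse.
PinMap invert_pin_mapping(const PinMap& forward) {
    PinMap inverse;
    for (const auto& entry : forward) {
        const bool inserted = inverse.emplace(entry.second, entry.first).second;
        if (!inserted) {
            throw std::invalid_argument("two pins map to the same target pin");
        }
    }
    return inverse;
}
```

Deriving one map from the other keeps the two in sync by construction, and the duplicate check doubles as a validation step for the `<direct>` entries read from the architecture file.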
@kmurray I have been working on supporting equivalent sites after #941. For instance, there are more LAB instances than physical locations, but LABs could potentially be placed in LABM locations, so VPR should not exit with a failure. My question is: what would be the best approach to take this issue into account after the packing stage and before placement?
@acomodi: I'm not sure of the specific code failure you're seeing, but here are a few thoughts. |
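As an illustration only (not taken from the thread), the kind of check being asked about can be sketched in C++: for each logical block type, compare the number of packed instances against the total number of sites across all tile types it is compatible with. All type aliases and the function name are hypothetical. This is a necessary condition rather than a sufficient one, because several logical types may compete for the same tiles.

```cpp
#include <map>
#include <set>
#include <string>

// All names here are hypothetical and only serve the illustration.
using Counts = std::map<std::string, int>;
// Logical block type -> tile types it may be placed on.
using Compatibility = std::map<std::string, std::set<std::string>>;

// Necessary (not sufficient) feasibility check: every logical block type must
// fit within the combined site count of its compatible tile types.
bool fits_in_compatible_tiles(const Counts& num_blocks,
                              const Counts& num_sites,
                              const Compatibility& compatible) {
    for (const auto& block : num_blocks) {
        int available = 0;
        const auto tiles = compatible.find(block.first);
        if (tiles != compatible.end()) {
            for (const auto& tile : tiles->second) {
                const auto site = num_sites.find(tile);
                if (site != num_sites.end()) available += site->second;
            }
        }
        if (block.second > available) return false;
    }
    return true;
}
```

With 12 LAB instances, 10 LAB sites and 4 LABM sites, the check fails when LAB is only compatible with LAB tiles and passes once LABM is added as a compatible tile.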
I believe this is now complete with #988. |
Proposed Behaviour
It is possible that a single logical block type could be mapped to multiple potential physical grid tiles in the FPGA.
Examples of this include
In general it should be possible to list a set of equivalent placement locations/sites for each block type.
Current Behaviour
VPR assumes that each top-level pb_type can only be placed at a placement location of exactly the same type.
Possible Solution
We should probably decouple the packing decisions (i.e. logical block types exist and can be created), from the placer's decisions (what grid tiles the logical block types can be placed in).
Currently there is no distinction.
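A minimal C++ sketch of what this decoupling could look like: the packer only deals with logical block types, while the placer consults a separate table listing which physical tile types each logical type may occupy. The `EquivalentSites` class and its methods are hypothetical and are not VPR's API.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical registry: for each logical block type, the set of physical
// tile types the placer is allowed to put it on.
class EquivalentSites {
  public:
    // Records that blocks of `logical_type` may be placed on `tile_type`.
    void allow(const std::string& logical_type, const std::string& tile_type) {
        sites_[logical_type].insert(tile_type);
    }

    // Placement legality test; unknown logical types are never placeable.
    bool can_place(const std::string& logical_type, const std::string& tile_type) const {
        const auto it = sites_.find(logical_type);
        return it != sites_.end() && it->second.count(tile_type) > 0;
    }

  private:
    std::map<std::string, std::set<std::string>> sites_;
};
```

The current behaviour corresponds to the special case where every logical type is allowed on exactly one tile type of the same name.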
Context
This will help improve the generality of the flow.
It will also fix non-intuitive/confusing behaviour such as #268, #512, #349.