-
Notifications
You must be signed in to change notification settings - Fork 415
Equivalent Tiles placement #559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @kmurray @vaughnbetz, I have added a python script to modify all the test architecture to move the top level-only pb_type tags/attribute to the new |
@acomodi There is actually already a script we use for upgrading architecture files. Can you add your steps to upgrade as another pass there? |
@kmurray Sure, I can add the procedure in there. Should I also run the script and push all the modified architectures as well? What they do is generating the architecture with the |
Hi @kmurray, I have run two test designs with and without Equivalent Placement (EP). Without EP (SymbiFlow@fa93ec2):
With EP (antmicro@635a591):
A note to be said is that there was timing placement has not been used, only bounding box has been adopted. |
That certainly looks promising. |
@vaughnbetz - Just FYI, I created two issues about getting the two benchmarks that @acomodi mentioned into the normal VtR benchmarks. You can find them at #584 and #582 . |
Hi @vaughnbetz @kmurray,
Placement time is increased by ~3% even without equivalent tiles. I think it can be easily taken down to be even with master. For what regards routing I am unsure why there is an increase in the time. I need to investigate whether my changes do affect placement even if there are no equivalent tiles. If you could provide some initial feedback with a review it would be helpful to see if this PR is going in the right direction. Comparison file: comparison-place-vtr-qor.xlsx |
For the QoR results: Placement time is up a bit as you noted (+3.8%). Would be good to get back if possible. The route time does look to be up consistently. I'm not sure why -- I also wouldn't have expected this change to affect route time. Given the quality (critical path delay, routed wirelength) matches, it isn't that the router is facing a more difficult problem -- it is just taking longer. The router doesn't look at the grid very much so hopefully profiling the code would show where the new hot spot is. I'd also run the Titan benchmarks to see if they show the same behaviour. I haven't looked at the code yet and won't have time to for a few days anyway. |
Hi @vaughnbetz, |
945cf51
to
f5ef6c4
Compare
@vaughnbetz, I have ran the TItan benchmarks and got the results. I will paste also the parsed results as there were some failures on 2 tests:
Both of them seemed to be failing during placement. I need to check if my changes are somehow affecting non-equivalent tiles, but the fact that LU230 passed with my changes and failed on baseline is curious. I need to investigate the reasons behind this. Comparison file: comparison-titan.xlsx parse_results Run-times (excluding the two failed experiments) seems to be even between changes and baseline |
The QoR results look good, except for the two failures which are both placement failures. |
QoR looks good; thanks.
|
f5ef6c4
to
e1fbaef
Compare
@kmurray Ping |
bbc3140
to
2c4fb1e
Compare
@acomodi Can you update this PR to be in sync with master? I'll try to give a full review soon. |
This commit introduces two major features: - introduce the tile concept in the architecture XML. Top level pb_type is discarded and all the top level pb_type information are moved into the tile tags. This will cause CI build to fail as all the architectures do not include the tiles tag right now. - introduce the possibility to place blocks in equivalent tiles (SLICEL blocks into SLICEM ones). According to the XML architecture description there could be tiles equivalent to others that can be used during the placement step (this can bring to better placement solutions) Signed-off-by: Alessandro Comodi <[email protected]>
I have also changes travis.yml to install the lxml python package needed by the script Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
2c4fb1e
to
2daf153
Compare
@kmurray Done, CI is currently failing as I removed my changes in the |
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
c40cb50
to
4f59266
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @acomodi for putting this together. I think this is an important feature to get into VPR (its been requested by a variety of users over the years).
I've included detailed inline comments below. However I think there are some high-level issues which will need to be reworked before this can be merged, which I outline below. The main one is the distinction between the logical and physical type of a clustered block.
If you have thoughts/comments/questions we can discuss them to come up with a plan to move forward on this.
Logical vs Physical Types
I think we need to make a much clearer distinction between the 'logical block type', and the 'physical tile type'. At the moment they are both combined in t_type_ptr
which I think is quite confusing and difficult to understand, particularly now that this PR defines equivalencies between them.
This is somewhat a historical artifact of how this code has evolved. Before this PR there was no distinction between these (logical block type and physical tile were the same). However now that we are making them non-identical I think it is important to distinguish between them explicitly.
More concretely, we need to split out the physical tile information into a separate data structure:
- Pin-out to RR graph/Pin locations/Fc/switchblock locations
- width/height/capacity/area
We should probably call this something liket_physical_tile
.
That would leave t_type_ptr
to represent the 'logical' type of a block produced by the packer, containing:
- pb_type
- pb_graph_head
The current approach of keeping everything in the t_type_ptr
and making some t_type_ptr
s 'equivalent' to others is confusing, and results in some complexity when asking what the 'type' of a block in the netlist is. I think separating them out makes the dependencies between placement locations and logical types much clearer.
Location of equivalent type state
Currently in this PR, which tile type is physically implementing a particular block is stored in the ClusteredNetlist. The general goal of the ClusteredNetlist is to represent the logical state of the clustered netlist, and not to store information about placement decisions (i.e. which tile type is implementing a particular logical block). This is related to the logical/physical distinction discussed above.
If we split out the logical/physical types into separate data structures (as discussed above), I think we should keep clb_nlist.block_type() returning the logical block type of a block in the netlist.
The information about what physical tile type is currently used to implement that block should be moved to the placement context.
XML Format
As discussed here we probably want to re-use <direct>
to describe the pin mappings between tiles and types. Currently you use <map>
, which is functionally the same as <direct>
but with a different name. I think we should just use <direct>
to reduce the cognitive load on end users and re-use that existing concept.
Also (as discussed in the comments), I think using <site>
instead of <mode>
more clearly communicates what is being modelled.
|
||
""" | ||
This script is intended to modify the architecture description file to be compliant with | ||
the new format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is actually already a script for upgrading architecture files (vtr_flow/script/upgrade_arch.py). This should be moved into that script (as another pass).
arch_name=$(basename ${arches%.*}) | ||
|
||
TEST_FULL_REF="${bench_name}/${circuit_name}/${arch_name}" | ||
DIR="${NEW_RUN_DIR}/${TEST_FULL_REF}" | ||
mkdir -p $DIR | ||
|
||
arch_cmd="" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted below, we should put this upgrade step in upgrade_arch.py and then upgrade all of the architecture files in the VTR source tree. That should avoid the need to do this type of on-the-fly architecture file modification (which would appear rather mysterious to end users).
int num_equivalent_tiles = 0; | ||
std::unordered_map<int, t_type_descriptor*> equivalent_tiles; /* [0..num_equivalent_tiles-1] */ | ||
std::unordered_map<int, std::unordered_map<int, int>> equivalent_tile_pin_mapping; /* [0..num_equivalent_tiles-1][0..num_pins-1] */ | ||
std::unordered_map<int, std::unordered_map<int, int>> equivalent_tile_inverse_pin_mapping; /* [0..num_equivalent_tiles-1][0..num_pins-1] */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping equivalent bi-directional mappings like this (i.e. equivalent_tile_pin_mapping
and equivalent_tile_inverse_pin_mapping
) in sync requires special care. A common source of future bug is if the mapping is updated in one direction but not the other.
We should consider using vtr::bimap for this purpose as it provides an interface which ensures the forward and inverse mappings are always consistent (you can see its usage for this purpose here for atom related lookups).
@@ -595,6 +609,13 @@ struct t_type_descriptor /* TODO rename this. maybe physical type descriptor or | |||
t_pb_type* pb_type = nullptr; | |||
t_pb_graph_node* pb_graph_head = nullptr; | |||
|
|||
/* Equivalent tiles information */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial intuition is that the equivalency information should really be part of a different data structure. That is we should separate out the physical 'tile' and logical block 'type' parts.
For instance a 'tile' would specify:
- A set of pins
- Fc
- Capacity
- height/width
- area
A block type woud specify:
- The top-level pb_type/pb_graph_head
/* Returns the indices of pins that contain a clock for this physical logic block */ | ||
std::vector<int> get_clock_pins_indices() const; | ||
|
||
/* Returns a boolean set to True if the input index belongs to an available tile, False otherwise */ | ||
bool is_available_tile_index(int index_to_check) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_available
is a confusing name (e.g. available in current placement? legal? not clear). I think what you probably mean is is_valid
or is_legal
.
</tile> | ||
<tile name="BLK_IG-SLICEL"> | ||
<equivalent_tiles> | ||
<mode name="BLK_IG-SLICEM"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would much prefer the naming and structure suggested in #513:
<tile name="MLAB_tile" width="1" height="1" capacity="1" area="XXX">
<input name="inputs" num_pins="50"/>
<input name="clk" num_pins="2"/>
<output name="outputs" num_pins="50"/>
<pinlocations ...>
<fc ...>
<equivalent_sites> <!-- NOTE: each site must specify it's pin mapping -->
<site pb_type="LAB">
<direct from="MLAB_tile.inputs" to="LAB.inputs"/> <!-- Note LAB inputs are equivalent -->
<direct from="MLAB_tile.clk" to="LAB.clk"/>
<direct from="LAB.outputs" to="MLAB_tile.outpus"/>
</site>
<site pb_type="MLAB">
<direct from="MLAB_tile.inputs[19:0]" to="MLAB.addr"/> <!-- Note MLAB inputs are not equivalent -->
<direct from="MLAB_tile.inputs[29:20]" to="MLAB.data_in"/>
<direct from="MLAB_tile.clk" to="MLAB.clk"/>
<direct from="MLAB.data_out" to="MLAB_tile.outputs[9:0]"/>
</site>
</equivalent_sites>
</tile>
The main points are as follows:
-
Explicit pin listing
By explicitly listing the set of pins we explicitly define the external routing interface of the tile (this aligns with what I discussed earlier about making the logical block type and physical tile types distinct, otherwise calling things 'equivalent when they really may not be is confusing). It also makes it straight forward to specify a superset of pins which may not exist on any particular pb_type. -
Use
site
instead ofmode
.
Mode has a very specific meaning within the<pb_type>
hierarchy that I don't think we want to bring into<tiles>
(e.g. no hierarchical modes). I think 'site' captures how we are modelling this (i.e. a location the placer can put something), rather than a particular mode (which as packing connotations). While there may not always be a perfect distinction between packing/placement decisions in the actual implementation, it is a distinction we make in the code, so I think it makes sense to have the reflected here. -
Use
direct
instead ofmap
Functionally your<map>
tags are acting as<direct>
s. To avoid overloading the user writing these files with yet another concept its probably better to re-use the<direct>
concept and syntax. However we would only want to support<direct>
(not<complete>
or<mux>
).
@@ -430,7 +430,10 @@ | |||
#system "cp $odin2_base_config" | |||
|
|||
my $architecture_file_path_new = "$temp_dir$architecture_file_name"; | |||
copy( $architecture_file_path, $architecture_file_path_new ); | |||
my $ret = `$vtr_flow_path/scripts/add_tiles.py --arch_xml $architecture_file_path > $architecture_file_path_new`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned elsewhere this kind of on the fly architecture conversion will appear quite mysterious to end users.
We should put the architecture file upgrade code in upgrade_arch.py and then upgrade all the checked-in architectures.
|
||
# Script parameters | ||
#script_params="" | ||
script_params = -track_memory_usage -lut_size 1 -starting_stage vpr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason for using -lut_size 1
?
Type->height = get_attribute(Node, "height", loc_data, OPTIONAL).as_uint(1); | ||
Type->area = get_attribute(Node, "area", loc_data, OPTIONAL).as_float(UNDEFINED); | ||
|
||
if (atof(Prop) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be Type->area < 0
?
@@ -2638,7 +2656,7 @@ static void ProcessComplexBlocks(pugi::xml_node Node, | |||
/* Alloc the type list. Need one additional t_type_desctiptors: | |||
* 1: empty psuedo-type | |||
*/ | |||
*NumTypes = count_children(Node, "pb_type", loc_data) + 1; | |||
*NumTypes = count_children(Node, "tile", loc_data) + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably also want a call to expect_children
since only 'tile' tags are expected
Thanks for the detailed feedback @kmurray. To do so, I believe that the first thing to do is to forget for a while about equivalent placement, and start digging into the code and well-separate the two entities ( I think that this could be done by initially creating an identical structure to Once this first step is achieved, the equivalence of the tiles becomes the new focus and will be applied to the physical tiles only. I am going to first tackle your inline review comments and then proceed with the physical/logical separation. Do you think this is the right way to go? |
I agree focusing on this first is the right approach, and your proposed approach of splitting the types incrementally seems reasonable. I'd probably use the terms 'physical tile', and 'logical block' to make the distinctions even clearer and add 'type' to both to clarify that they reflect types of things rather than individual tile/block instances (e.g.
I'll leave this up to you judgment. I think the re-achitecting of the code will obsolete some of the comments so your call whether to fix them first or after. Others could be addressed either now or later. |
@kmurray I have a question on the RR graph Pin-out: Where and how is this information processed? |
@acomodi Thats a good question, and gets to the heart of the logical vs physical discussion. Current During RR graph construction, pin class information (t_type_descriptor::class_inf) is used to construct the appropriate SOURCE/SINK RR nodes. Which are then hooked up to IPIN/OPIN RR nodes according to their location. See for instance the creation of connections between SOURCE/SINK and IPIN/OPINs here. Future
In terms of how best to do that, here are my thoughts:
With that RR graph structure we can build a single RR graph and control whether the router is routing to one site (e.g LAB) or the other (e.g. MLAB) simply by changing the net's source/sink RR node targets, depending on which site was picked during placement (i.e. by changing device_ctx.net_rr_terminals). |
@kmurray Thanks for the comment. I think I understood the strategy that we are going to adopt, I still need to figure out how to best implement it practically. At the moment I am still struggling with the first step of splitting One more question is: does it sound good to you to add the |
Thanks for following up @acomodi. I think it makes the most sense to tackle this incrementally. 1. Functionality preserving refactoring (no new capabilities) struct t_logical_block_type {
//Like t_type_descriptor but without pinlocation etc info
//...
//Index into device_ctx.tile_types (one-to-one mapping for now)
int physical_tile_type_index;
};
struct t_physical_tile_type {
//Contains the 'interface' data like pinlocations etc. from t_type_descriptor
//...
//Index into device_ctx.block_types (one-to-one mapping for now)
int logical_block_type_index
};
struct DeviceContext {
std::vector<t_logical_block_type> block_types;
std::vector<t_physical_tile_type> tile_types;
}; It is then straightforward to look-up the associated tile/block types: //Initialize physical_tile_a and block_type_b somehow (e.g. from netlist, or from grid)
t_physical_tile_type* tile_type_a = device_ctx.grid[x][y].type
t_logical_block_type* block_type_b = clb_nlist.block_type(blk_b);
//The logical block type of tile_type_a
t_logical_block_type* block_type_a = &device_ctx.block_types[tile_type_a.logical_block_type_index];
//The physical tile type of block_type_b
t_physical_tile_type* tile_type_b = &device_ctx.tile_types[block_type_b.physical_tile_type_index]; Of course it would probably be better to hide the details of this data structure tracing in subroutines (to avoid scattering it all through the code-base): t_logical_block_type* block_type_a = logical_block_type(tile_type_a);
t_physical_tile_type* tile_type_b = physical_tile_type(block_type_b); Since a common case would be going from a t_physical_tile_type* physical_tile_type(ClusterBlockId blk) {
t_logical_block_type* blk_type = cluster_ctx.clb_nlist.block_type(blk);
return physical_tile_type(blk_type);
} At this point we would have seperated the logical/physical information, but from an external perspective the behaviour of VPR should be identical (no changes to architecture format, algorithms, result quality etc.). 2. Extension to equivalent placement sites t_physical_tile_type* physical_tile_type(ClusterBlockId blk) {
//Use the placement_ctx to find out in which physical tile type
//blk has been placed, and return its type.
auto loc = place_ctx.block_loc[blk];
return place_ctx.grid[loc.x][loc.y].type;
} |
@kmurray Sounds very good to me. IMHO, even being the two changes (equivalent placement and physical/logical separation) highly related, it would be better to separate them as the first one does not depend on the equivalent tiles placement, while the opposite is not true. |
Yep that makes sense to me. It'll help keep things more contained and also make it easier to review. |
Closing this PR as it is superseded by #988 |
This PR introduces two main changes:
<complexblocklist>
of two types: normal and primitive.<tile>
tag in the XML specifies possible equivalent tiles, these can be used during placement.These changes still need refinement and a review stage to check the direction of this work.
Description
Tile tag
The change related to the tile tags is needed to clearly separate the concept of pb_type and top level pb_type. A top level pb_type, in fact, is a physical configurable block of the FPGA that has to be treated differently from the lower level pb_types.
In order to do so, a new tag has been added to the XML, namely the
tiles
tag. It contains all the information relative to theex-top level
pb_types (e.g. fc, pinlocations, switchblock_locations).This change is addressed in the
read_xml_arch_file.cpp
andphysical_types.h
.The current PR is going to fail on CI as the current test architectures do not include the
<tiles>
tag.Equivalent Tile placement
The second change addresses the Equivalent Tiles placement.
What happens is that the placement stage now does not rely strictly on the clusterer decisions: each block belongs to a
top level
pb_type (now called justtile
), which can be swapped with a corresponding equivalent during placement (if equivalent tiles are present).There are checks that do ensure that, during the swap stage, blocks are placed only in correct locations (same tiles, or equivalent tiles).
When the placement has finished, the clustering information need to change, as it could happen that a CLB (Clustered Block) that refers to a specific type (e.g. SLICEL), is placed in an equivalent tile (e.g. SLICEM). This situation generates conflicts during the routing stage, as, even being equivalent, the two tiles have different pin configurations.
To address this issue, the clustering context has to be changed with the information about the new pin mappings for the blocks that have been swapped in equivalent tiles. This happens in the background, therefore there was no need to apply changes at routing stage.
Related Issue
The issue that is addressed with this PR is: #513
Motivation and Context
The motivation of moving the top level pb_type definition to another location in the architecture definition is to have a clearer separation on the two concepts of top level pb_type and normal/primitive pb_type.
Instead, the Equivalent Tiles feature is an improvement in VPR. With the ability to use equivalent locations, placer can choose for a better solution when trying to find the best placement, reducing its final cost.
Moreover, this solution can overcome the fact that the packer is not aware of the possibility to use an equivalent tile (and it should correctly not be aware of that) and the job is left to the placer. In fact, the packer may always choose to implement a cluster using a certain tile instead of another, even though there are other tiles that could be used.
In this case, there could be designs which would consume more resources (tiles) than the available ones and the placer, if allowed, could place the exceeding blocks in the equivalent locations.
How Has This Been Tested?
There is a very initial implementation of a test which was needed during development. It should be made more robust.
Types of changes
Checklist: