-
Notifications
You must be signed in to change notification settings - Fork 415
Add Heterogeneous Tiles capability #1208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Heterogeneous Tiles capability #1208
Conversation
@litghost @kmurray @vaughnbetz FYI. |
Thanks @acomodi for the diagrams. They are helpful for grasping the issue, although I still don't think I fully comprehend the current limitations. It seems to me like these options should simply be encoded as options within the pb_type at that X/Y location. For example, in case 1, can you comment on why putting the 16 bufg intances within a single pb_type and using custom pin locations is insufficient? <pb_type name="bufg_16">
<input name="bufg1_in"/>
<input name="bufg2_in"/>
....
<input name="bufg16_in"/>
<!--similarly for outputs-->
<pb_type name="bufg" num_pb="16">
...
</pb_type>
....
</pb_type>
<tile name="bufg_tile" width="1" height="4">
...
<pinlocations pattern="custom">
<loc side="left" y_offset="0"> bufg_16.bufg1_in bufg_16.bufg2_in bufg_16.bufg3_in bufg_16.bufg4_in</loc>
<loc side="left" y_offset="1"> bufg_16.bufg5_in bufg_16.bufg6_in bufg_16.bufg7_in bufg_16.bufg8_in</loc>
...
<loc side="left" y_offset="3"> bufg_16.bufg13_in bufg_16.bufg14_in bufg_16.bufg15_in bufg_16.bufg16_in</loc>
</pinlocations>
</tile> |
@kmurray Sure, probably I should have explained a bit more on the BUFG issue. The fact is that we need to have the freedom to choose where a certain BUFG instance is going to be placed. This because the 7series for instance require special treatment of clock resources (e.g. BUFG and PLL must be in the same clock region). |
This is because that defines one cluster, rather than sixteen clusters. In the 7-series arch, there are two BUFGCTRL tiles, a top and a bottom. Some BUFG's must be in the top or bottom, some BUFG can be in either. In order to allow for the placer (or fixed placement constraints) to work, each BUFGCTRL must be in its own cluster, so they can be placed independently. |
OK, so the desire is to support different pb_types (which may possibly be equivalent with other pb_types) at each 'capacity' location within a specific tile (i.e. physical routing interface). Is that correct? |
Yes. |
Discussion March 12:
|
@kmurray @vaughnbetz Ping |
582300b
to
1cbd2df
Compare
@litghost @kmurray @vaughnbetz FYI, I have started implementing the Heterogeneous tiles following the document's proposal. DONE:
TODO:
This still does not require a review as it is a WIP still, I pushed this draft in case you want to have a look on the progress. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a couple comments below.
@@ -582,21 +583,15 @@ struct t_physical_tile_type { | |||
int width = 0; | |||
int height = 0; | |||
|
|||
bool**** pinloc = nullptr; /* [0..width-1][0..height-1][0..3][0..num_pins-1] */ | |||
std::vector<std::vector<std::vector<std::vector<bool>>>> pinloc; /* [0..width-1][0..height-1][0..3][0..num_pins-1] */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might consider using vtr::Matrix<std::vector> here (since the block width/height should be known when building the tile).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I have actually used an NdMatrix
of dimension 3 as also the number of sides is known.
@@ -611,6 +606,8 @@ struct t_physical_tile_type { | |||
|
|||
int index = -1; /* index of type descriptor in array (allows for index referencing) */ | |||
|
|||
std::vector<t_sub_tile> sub_tiles; | |||
|
|||
std::vector<t_logical_block_type_ptr> equivalent_sites; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably on the TODO list, but it would seem that this should be removed since it's now in the sub-tiles.
vpr/src/util/vpr_utils.cpp
Outdated
int sub_tile_index = 0; | ||
auto logical_pin = type->tile_block_pin_directs_map(physical_pin(pin_index)); | ||
sub_tile_index = logical_pin.sub_tile_index; | ||
pin_index = logical_pin.pin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code shows up in a couple places. We should probably put it into a re-usable function which does this lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that this piece of code actually won't work. The directs_map
needs a logical block, which is not provided.
The problem here is to calculate the correct sub_tile based on a physical pin index. I think an additional member to the sub_tile defining the pin ranges so to easily calculate in which sub_tile the pin_index is located (and also change the math needed to get the pin index in the capacity instance.
324d7fa
to
681c32f
Compare
@kmurray I should have solved all the strong and basic regression tests issues (they do pass on my local machine). I believe this PR is now ready for a first review pass. What is still needed though is correctly picking a sub tile coordinate that is compatible with the logic block type during placement. |
ecd6737
to
e90b2af
Compare
@kmurray I believe all the issues have been addressed and solved. I have added a strong regression test as well to verify the sub tiles correctness. Currently what is left to do to have a complete PR is to add the relative documentation for the sub tiles usage (maybe even a tutorial similar to the |
5ace6f1
to
7a7333c
Compare
@kmurray I have also temporarily reverted the change on the architecture XMLs so to make code review easier. I will remove the revert commit once all the future review comments have been addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks pretty good. I've included various comments below.
I agree adding a tutorial to the documentation would be good.
I think we'll also need to evaluate this for QoR/Run-time (e.g. on VTR and Titan benchmarks) to see if there are any performance issues (I noted a potential one in the comments below).
vpr/src/place/move_utils.cpp
Outdated
//Each x/y location possibly contains multiple sub tiles, so we need to pick | ||
//a z location within a compatible sub tile. | ||
std::vector<int> compatible_sub_tiles; | ||
for (int capacity = 0; capacity < to_type->capacity; capacity++) { | ||
t_pl_loc loc(to.x, to.y, capacity); | ||
|
||
if (is_tile_compatible(to_type, type, loc)) { | ||
compatible_sub_tiles.push_back(capacity); | ||
} | ||
} | ||
to.sub_tile = compatible_sub_tiles[vtr::irand((int)compatible_sub_tiles.size() - 1)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is allocating a a std::vector deep in the placer's inner loop, which is likely to be a significant performance issue.
Probably we'll need to pull this out and pre-determine what sub-tile locations are compatible with each logical type. I expect this will end-up in the compressed grid data structures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As suggested I have moved the creation of the vector during create_compressed_block_grids.
To do so an unordered map is necessary, as it may be possible that a logical block is can be placed in different tiles with different capacities, hence, the sub tile index depends not only on the logical block, but also on the physical tile chosen for the swap.
This should reduce the impact on run time, and I will run the qor benchmarks to verify this.
If the performance hit is too much, we could just let the move generator produce a random sub tile index which is checked at a later stage out of the inner loop.
vpr/src/util/vpr_utils.h
Outdated
@@ -146,6 +148,8 @@ int get_max_num_pins(t_logical_block_type_ptr logical_block); | |||
|
|||
bool is_tile_compatible(t_physical_tile_type_ptr physical_tile, t_logical_block_type_ptr logical_block); | |||
|
|||
bool is_tile_compatible(t_physical_tile_type_ptr physical_tile, t_logical_block_type_ptr logical_block, t_pl_loc loc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect the t_pl_loc is needed to see which capacity location the logical block is in.
If that is the case, we probably don't need the physical_tile as well (as it is also determined by the loc).
It may make sense to factor this out so there is a:
bool is_subtile_compatible(t_sub_tile sub_tile, t_logical_block_type_ptr logical_block);
which is_tile_compatible() calls internally.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function was actually thought to be used in cases where the placement is not yet completed, to check whether specific locations were compatible (e.g. read_place.cpp, move_utils.cpp).
It works the other way around, meaning that, first, there is a check to test whether a specific sub tile index is compatible within a physical tile, and then, the logic block itself is checked as well, to see if that is compatible with the physical tile (by calling the original is_tile_compatible
function.
vpr/src/util/vpr_utils.cpp
Outdated
int get_physical_pin(int sub_tile_index, | ||
t_physical_tile_type_ptr physical_tile, | ||
t_logical_block_type_ptr logical_block, | ||
int pin) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between these two versions of get_physical_pins()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, one is the superset of the other.
I have modified as follows;
- The one with the sub_tile_index parameter is the one that performs the lookup to get the physical pin.
- The other homonym function, as first thing gets the sub tile index based on which sub tile is compatible with the logical block, then it calls the
get_physical_pin
with the sub_tile_index parameter.
This was done mainly because now, in the logical pin <-> physical pin
mapping there is the sub tile concept introduced. The logical pin now has a sub tile index.
During clustering there is the need to access the physical pin related to a logical one, but it is not important which exact sub tile is needed, as the physical pin is needed only to check the type of pin for instance.
The only important place where the sub tile index needs to be the correct one, is during the block sync at the end of placement, where the "real" physical pins are mapped against the logical pins.
vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_sub_tiles/config/config.txt
Show resolved
Hide resolved
cf0d77b
to
9d5d4c9
Compare
There still are two regression tests which are not passing Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
This commit also increases the netlist size of the sub tiles test Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
0df965d
to
763528e
Compare
Thanks @acomodi for putting this together! This should be a great enhancement to the placer and our modelling capabilities. |
Description
This PR is an initial development towards the solution to have heterogeneous tiles capability.
There are currently two types of situations that VPR cannot handle:
z
location in which the tile is placed. As a concrete example, the BUFG tile in the Series7 comprehends 16 BUFGCTRL sites, each with the same I/Os and functionality. They could be expressed with a single tile, with capacity of 16. The issue though is that the pin_locations of some of the BUFGCTRL is not equal to the others (as visible in the image below).This PR is an initial step towards the solution of this issue.
The final solution would ideally solve both of the above mentioned situations, as we could treat the BUFGCTRL sites not homogeneously as it happens now, but as an heterogeneous set, composed, with each element of the set having different pin_locations
Related Issue
This PR is to resume the discussion started here: #1063
Motivation and Context
How Has This Been Tested?
Types of changes
Checklist: