Skip to content

Add Heterogeneous Tiles capability #1208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Apr 13, 2020

Conversation

acomodi
Copy link
Collaborator

@acomodi acomodi commented Mar 11, 2020

Description

This PR is an initial development towards the solution to have heterogeneous tiles capability.
There are currently two types of situations that VPR cannot handle:

  1. Capacity tiles that need specific pin_locations for each of the z location in which the tile is placed. As a concrete example, the BUFG tile in the Series7 comprehends 16 BUFGCTRL sites, each with the same I/Os and functionality. They could be expressed with a single tile, with capacity of 16. The issue though is that the pin_locations of some of the BUFGCTRL is not equal to the others (as visible in the image below).
    This PR is an initial step towards the solution of this issue.

Untitled Diagram (1)

  1. Heterogeneous tiles: there are a few kind of special tiles (still in the 7series) that comprehend different kinds of sites with different types of I/Os and different functionalities. A simple example of tiles composed by heterogeneous resources is the following:

Untitled Diagram

The final solution would ideally solve both of the above mentioned situations, as we could treat the BUFGCTRL sites not homogeneously as it happens now, but as an heterogeneous set, composed, with each element of the set having different pin_locations

Related Issue

This PR is to resume the discussion started here: #1063

Motivation and Context

How Has This Been Tested?

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@probot-autolabeler probot-autolabeler bot added lang-cpp C/C++ code libarchfpga Library for handling FPGA Architecture descriptions VPR VPR FPGA Placement & Routing Tool labels Mar 11, 2020
@acomodi
Copy link
Collaborator Author

acomodi commented Mar 11, 2020

@litghost @kmurray @vaughnbetz FYI.
I have some doubts on what the direction forward to achieve the solution to this issue would likely be. Before getting hands on the code it would be better to have a clear idea on how to tackle the problem.

@kmurray
Copy link
Contributor

kmurray commented Mar 11, 2020

Thanks @acomodi for the diagrams. They are helpful for grasping the issue, although I still don't think I fully comprehend the current limitations.

It seems to me like these options should simply be encoded as options within the pb_type at that X/Y location.

For example, in case 1, can you comment on why putting the 16 bufg intances within a single pb_type and using custom pin locations is insufficient?

<pb_type name="bufg_16">
    <input name="bufg1_in"/>
    <input name="bufg2_in"/>
    ....
    <input name="bufg16_in"/>
    <!--similarly for outputs-->


    <pb_type name="bufg" num_pb="16">
        ...
    </pb_type>
    ....
</pb_type>

<tile name="bufg_tile"  width="1" height="4">
...
    <pinlocations pattern="custom">
        <loc side="left" y_offset="0"> bufg_16.bufg1_in bufg_16.bufg2_in bufg_16.bufg3_in bufg_16.bufg4_in</loc>
        <loc side="left" y_offset="1"> bufg_16.bufg5_in bufg_16.bufg6_in bufg_16.bufg7_in bufg_16.bufg8_in</loc>
        ...
        <loc side="left" y_offset="3"> bufg_16.bufg13_in bufg_16.bufg14_in bufg_16.bufg15_in bufg_16.bufg16_in</loc>
    </pinlocations>
</tile>

@acomodi
Copy link
Collaborator Author

acomodi commented Mar 11, 2020

@kmurray Sure, probably I should have explained a bit more on the BUFG issue.

The fact is that we need to have the freedom to choose where a certain BUFG instance is going to be placed. This because the 7series for instance require special treatment of clock resources (e.g. BUFG and PLL must be in the same clock region).
To achieve this, each BUFG needs to be a separate entity that can be placed in one of the 16 placement locations.

Untitled Diagram (2)

@litghost
Copy link
Collaborator

For example, in case 1, can you comment on why putting the 16 bufg intances within a single pb_type and using custom pin locations is insufficient?

This is because that defines one cluster, rather than sixteen clusters. In the 7-series arch, there are two BUFGCTRL tiles, a top and a bottom. Some BUFG's must be in the top or bottom, some BUFG can be in either. In order to allow for the placer (or fixed placement constraints) to work, each BUFGCTRL must be in its own cluster, so they can be placed independently.

@kmurray
Copy link
Contributor

kmurray commented Mar 11, 2020

OK, so the desire is to support different pb_types (which may possibly be equivalent with other pb_types) at each 'capacity' location within a specific tile (i.e. physical routing interface). Is that correct?

@litghost
Copy link
Collaborator

OK, so the desire is to support different pb_types (which may possibly be equivalent with other pb_types) at each 'capacity' location within a specific tile (i.e. physical routing interface). Is that correct?

Yes.

@vaughnbetz
Copy link
Contributor

Discussion March 12:

  • General agreement that we want to be able to store heterogenous blocks at an (x,y). Alessandro to make a document proposing the architecture file representation and the grid representation. Vaughn's preference:
  • grid has a type / physical data for (x,y,index) values, where index at a certain x,y goes from 0 to capacity-1. This lets us have arbitrary heterogeneous blocks.
  • We also want to rename z to something else (blk_index? sub_block?) but that can be a separate update or merged into this discussion as appropriate.

@acomodi
Copy link
Collaborator Author

acomodi commented Mar 13, 2020

@acomodi
Copy link
Collaborator Author

acomodi commented Mar 17, 2020

@kmurray @vaughnbetz Ping

@acomodi acomodi force-pushed the heterogeneous-tiles branch from 582300b to 1cbd2df Compare March 25, 2020 20:18
@acomodi
Copy link
Collaborator Author

acomodi commented Mar 25, 2020

@litghost @kmurray @vaughnbetz FYI, I have started implementing the Heterogeneous tiles following the document's proposal.

DONE:

  • Pin Locations and Pin Classes have been refactored to have a more C++ style. I have changed it them to use vectors instead of plain arrays. This results in fixing things in multiple places.
  • Added SubTile parsing capability from the XML

TODO:

  • Fix compilation errors due to the pin locations and pin classes refactor
  • Remove equivalent_sites from the physical_tile_type
  • Change regression tests architecture to reflect the sub tile addition
  • Fix regression tests
  • Adjust placement to use sub tiles locations
  • Test against symbiflow architectures

This still does not require a review as it is a WIP still, I pushed this draft in case you want to have a look on the progress.

Copy link
Contributor

@kmurray kmurray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a couple comments below.

@@ -582,21 +583,15 @@ struct t_physical_tile_type {
int width = 0;
int height = 0;

bool**** pinloc = nullptr; /* [0..width-1][0..height-1][0..3][0..num_pins-1] */
std::vector<std::vector<std::vector<std::vector<bool>>>> pinloc; /* [0..width-1][0..height-1][0..3][0..num_pins-1] */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might consider using vtr::Matrix<std::vector> here (since the block width/height should be known when building the tile).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I have actually used an NdMatrix of dimension 3 as also the number of sides is known.

@@ -611,6 +606,8 @@ struct t_physical_tile_type {

int index = -1; /* index of type descriptor in array (allows for index referencing) */

std::vector<t_sub_tile> sub_tiles;

std::vector<t_logical_block_type_ptr> equivalent_sites;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably on the TODO list, but it would seem that this should be removed since it's now in the sub-tiles.

Comment on lines 226 to 229
int sub_tile_index = 0;
auto logical_pin = type->tile_block_pin_directs_map(physical_pin(pin_index));
sub_tile_index = logical_pin.sub_tile_index;
pin_index = logical_pin.pin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code shows up in a couple places. We should probably put it into a re-usable function which does this lookup.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that this piece of code actually won't work. The directs_map needs a logical block, which is not provided.

The problem here is to calculate the correct sub_tile based on a physical pin index. I think an additional member to the sub_tile defining the pin ranges so to easily calculate in which sub_tile the pin_index is located (and also change the math needed to get the pin index in the capacity instance.

@acomodi acomodi force-pushed the heterogeneous-tiles branch 2 times, most recently from 324d7fa to 681c32f Compare March 27, 2020 19:40
@probot-autolabeler probot-autolabeler bot added tests VTR Flow VTR Design Flow (scripts/benchmarks/architectures) labels Mar 27, 2020
@acomodi
Copy link
Collaborator Author

acomodi commented Mar 27, 2020

@kmurray I should have solved all the strong and basic regression tests issues (they do pass on my local machine). I believe this PR is now ready for a first review pass.

What is still needed though is correctly picking a sub tile coordinate that is compatible with the logic block type during placement.

@acomodi acomodi changed the title WIP: Add heterogeneous tiles capability Add Heterogeneous Tiles capability Mar 28, 2020
@acomodi acomodi force-pushed the heterogeneous-tiles branch from ecd6737 to e90b2af Compare March 31, 2020 14:54
@acomodi
Copy link
Collaborator Author

acomodi commented Mar 31, 2020

@kmurray I believe all the issues have been addressed and solved.

I have added a strong regression test as well to verify the sub tiles correctness.

Currently what is left to do to have a complete PR is to add the relative documentation for the sub tiles usage (maybe even a tutorial similar to the equivalent sites one).

@probot-autolabeler probot-autolabeler bot added the docs Documentation label Mar 31, 2020
@acomodi acomodi requested a review from kmurray March 31, 2020 15:30
@acomodi acomodi force-pushed the heterogeneous-tiles branch from 5ace6f1 to 7a7333c Compare March 31, 2020 15:41
@acomodi
Copy link
Collaborator Author

acomodi commented Mar 31, 2020

@kmurray I have also temporarily reverted the change on the architecture XMLs so to make code review easier. I will remove the revert commit once all the future review comments have been addressed.

Copy link
Contributor

@kmurray kmurray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks pretty good. I've included various comments below.

I agree adding a tutorial to the documentation would be good.

I think we'll also need to evaluate this for QoR/Run-time (e.g. on VTR and Titan benchmarks) to see if there are any performance issues (I noted a potential one in the comments below).

Comment on lines 640 to 643
//Each x/y location possibly contains multiple sub tiles, so we need to pick
//a z location within a compatible sub tile.
std::vector<int> compatible_sub_tiles;
for (int capacity = 0; capacity < to_type->capacity; capacity++) {
t_pl_loc loc(to.x, to.y, capacity);

if (is_tile_compatible(to_type, type, loc)) {
compatible_sub_tiles.push_back(capacity);
}
}
to.sub_tile = compatible_sub_tiles[vtr::irand((int)compatible_sub_tiles.size() - 1)];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is allocating a a std::vector deep in the placer's inner loop, which is likely to be a significant performance issue.

Probably we'll need to pull this out and pre-determine what sub-tile locations are compatible with each logical type. I expect this will end-up in the compressed grid data structures.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested I have moved the creation of the vector during create_compressed_block_grids.

To do so an unordered map is necessary, as it may be possible that a logical block is can be placed in different tiles with different capacities, hence, the sub tile index depends not only on the logical block, but also on the physical tile chosen for the swap.

This should reduce the impact on run time, and I will run the qor benchmarks to verify this.
If the performance hit is too much, we could just let the move generator produce a random sub tile index which is checked at a later stage out of the inner loop.

@@ -146,6 +148,8 @@ int get_max_num_pins(t_logical_block_type_ptr logical_block);

bool is_tile_compatible(t_physical_tile_type_ptr physical_tile, t_logical_block_type_ptr logical_block);

bool is_tile_compatible(t_physical_tile_type_ptr physical_tile, t_logical_block_type_ptr logical_block, t_pl_loc loc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect the t_pl_loc is needed to see which capacity location the logical block is in.

If that is the case, we probably don't need the physical_tile as well (as it is also determined by the loc).

It may make sense to factor this out so there is a:

bool is_subtile_compatible(t_sub_tile sub_tile, t_logical_block_type_ptr logical_block);

which is_tile_compatible() calls internally.

Thoughts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function was actually thought to be used in cases where the placement is not yet completed, to check whether specific locations were compatible (e.g. read_place.cpp, move_utils.cpp).

It works the other way around, meaning that, first, there is a check to test whether a specific sub tile index is compatible within a physical tile, and then, the logic block itself is checked as well, to see if that is compatible with the physical tile (by calling the original is_tile_compatible function.

Comment on lines 2220 to 2224
int get_physical_pin(int sub_tile_index,
t_physical_tile_type_ptr physical_tile,
t_logical_block_type_ptr logical_block,
int pin) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between these two versions of get_physical_pins()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, one is the superset of the other.

I have modified as follows;

  • The one with the sub_tile_index parameter is the one that performs the lookup to get the physical pin.
  • The other homonym function, as first thing gets the sub tile index based on which sub tile is compatible with the logical block, then it calls the get_physical_pin with the sub_tile_index parameter.

This was done mainly because now, in the logical pin <-> physical pin mapping there is the sub tile concept introduced. The logical pin now has a sub tile index.

During clustering there is the need to access the physical pin related to a logical one, but it is not important which exact sub tile is needed, as the physical pin is needed only to check the type of pin for instance.

The only important place where the sub tile index needs to be the correct one, is during the block sync at the end of placement, where the "real" physical pins are mapped against the logical pins.

@acomodi acomodi force-pushed the heterogeneous-tiles branch from cf0d77b to 9d5d4c9 Compare April 2, 2020 11:53
acomodi added 27 commits April 10, 2020 00:05
There still are two regression tests which are not passing

Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
This commit also increases the netlist size of the sub tiles test

Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
Signed-off-by: Alessandro Comodi <[email protected]>
@acomodi acomodi force-pushed the heterogeneous-tiles branch from 0df965d to 763528e Compare April 9, 2020 22:10
@kmurray kmurray merged commit 307047e into verilog-to-routing:master Apr 13, 2020
@kmurray
Copy link
Contributor

kmurray commented Apr 13, 2020

Thanks @acomodi for putting this together! This should be a great enhancement to the placer and our modelling capabilities.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation lang-cpp C/C++ code libarchfpga Library for handling FPGA Architecture descriptions tests VPR VPR FPGA Placement & Routing Tool VTR Flow VTR Design Flow (scripts/benchmarks/architectures)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants