Chan z prefix sum #2781

Merged
merged 21 commits into from
Nov 6, 2024
Changes from 11 commits
Commits
400b5a1
[vpr][place] add acc_tile_num_inter_die_conn
amin1377 Oct 18, 2024
4018188
[vpr][place] use prefix sum to populate chanz_place_cost_fac_
amin1377 Oct 18, 2024
0432740
Merge branch 'master' of https://github.com/verilog-to-routing/vtr-ve…
amin1377 Oct 30, 2024
11ee10f
[vpr][place] initialize other entried of acc_tile_num_inter_die_conn
amin1377 Oct 30, 2024
8bead96
[vpr][place][net_cost] add get_chanz_cost_factor signiture and acc_ti…
amin1377 Oct 30, 2024
bea400e
[vpr][place][net_cost] get_chanz_cost_factor impl
amin1377 Oct 30, 2024
01b9cd6
[vpr][place][net_cost] remove place_cost_exp from functions arguments
amin1377 Oct 30, 2024
af5659d
[vpr][place][net_cost] fix num_inter_dir_conn corner cases
amin1377 Oct 30, 2024
736d826
[vpr][place][net_cost] call get_chanz_cost_factor instead of chanz_pl…
amin1377 Oct 30, 2024
1a3e56d
[vpr][place][net_cost] remove chanz_place_cost_fac_ calculation
amin1377 Oct 30, 2024
54fe8d7
[vpr][place][net_cost] recomment on how to calculate acc_tile_num_int…
amin1377 Oct 30, 2024
df4159b
[vpr][place] fix typos + unify edge loops
amin1377 Nov 5, 2024
21c62d8
[vpr][place]make get_chanz_cost_factor a private function
amin1377 Nov 5, 2024
da92578
[vpr][place] add is_multi_layer_ to net cost handler fields
amin1377 Nov 5, 2024
052d6b9
[vpr][place] factor out crossing multiplication
amin1377 Nov 5, 2024
e15a796
[vpr][place][net_cost] use bb
amin1377 Nov 5, 2024
f2939b1
[vpr][place][net_cost] fix typos
amin1377 Nov 5, 2024
e213fa1
Merge branch 'master' of https://github.com/verilog-to-routing/vtr-ve…
amin1377 Nov 5, 2024
a0b2c06
[ci] update golden
amin1377 Nov 6, 2024
6ea8921
[ci] update odin golden
amin1377 Nov 6, 2024
596ddaf
Merge branch 'master' of https://github.com/verilog-to-routing/vtr-ve…
amin1377 Nov 6, 2024
97 changes: 73 additions & 24 deletions vpr/src/place/net_cost_handler.cpp
@@ -149,10 +149,11 @@ NetCostHandler::NetCostHandler(const t_placer_opts& placer_opts,
* been recomputed. */
bb_update_status_.resize(num_nets, NetUpdateState::NOT_UPDATED_YET);

alloc_and_load_chan_w_factors_for_place_cost_(placer_opts_.place_cost_exp);
alloc_and_load_chan_w_factors_for_place_cost_();
}

void NetCostHandler::alloc_and_load_chan_w_factors_for_place_cost_(float place_cost_exp) {
void NetCostHandler::alloc_and_load_chan_w_factors_for_place_cost_() {
float place_cost_exp = placer_opts_.place_cost_exp;
auto& device_ctx = g_vpr_ctx.device();

const int grid_height = device_ctx.grid.height();
@@ -229,22 +230,32 @@ void NetCostHandler::alloc_and_load_chan_w_factors_for_place_cost_(float place_c
}

if (device_ctx.grid.get_num_layers() > 1) {
alloc_and_load_for_fast_vertical_cost_update_(place_cost_exp);
alloc_and_load_for_fast_vertical_cost_update_();
}
}

void NetCostHandler::alloc_and_load_for_fast_vertical_cost_update_(float place_cost_exp) {
void NetCostHandler::alloc_and_load_for_fast_vertical_cost_update_() {
const auto& device_ctx = g_vpr_ctx.device();
const auto& rr_graph = device_ctx.rr_graph;

const size_t grid_height = device_ctx.grid.height();
const size_t grid_width = device_ctx.grid.width();


chanz_place_cost_fac_ = vtr::NdMatrix<float, 4>({grid_width, grid_height, grid_width, grid_height}, 0.);
acc_tile_num_inter_die_conn_ = vtr::NdMatrix<int, 2>({grid_width, grid_height}, 0.);

vtr::NdMatrix<float, 2> tile_num_inter_die_conn({grid_width, grid_height}, 0.);
Contributor

Does tile_num_inter_die_conn need to store float values?

Contributor

It's a count, so int is probably better. Float could have some round-off issues with big chips, as this matrix is filled in by adding small numbers to a running count, which can get problematic if the big number ever becomes more than 16 million times or so bigger than the small number (the small number then gets thrown away by round off).
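
The round-off concern above can be reproduced in isolation. The following is a minimal sketch, not code from this PR; the function names are hypothetical. It assumes IEEE 754 single-precision `float`, whose 24-bit significand cannot represent every integer above 2^24 (about 16.7 million), so adding 1 to a running total at that magnitude leaves the total unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: adds `increment` to a float running total `steps` times.
// Once the total is much larger than the increment, each addition is rounded
// back to the old total and the increment is lost.
float accumulate_float(float start, float increment, std::int64_t steps) {
    assert(steps >= 0);
    float total = start;
    for (std::int64_t i = 0; i < steps; ++i) {
        total += increment;
    }
    return total;
}

// Hypothetical helper: the same accumulation with an integer counter,
// which stays exact as long as it does not overflow.
std::int64_t accumulate_int(std::int64_t start, std::int64_t increment, std::int64_t steps) {
    assert(steps >= 0);
    std::int64_t total = start;
    for (std::int64_t i = 0; i < steps; ++i) {
        total += increment;
    }
    return total;
}
```

Starting both accumulators at 16777216 (2^24) and adding 1 ten times leaves the float total where it started, while the integer total advances by ten. Below that threshold the two agree.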


/*
* To calculate the cumulative number of inter-die connections, we first need the number of
* inter-die connections at each location. So that this also works when the RR graph is read from a file
* instead of being built from the architecture file, we obtain this number by iterating over the RR graph. Once
* tile_num_inter_die_conn is populated, we can populate acc_tile_num_inter_die_conn_. First,
* we populate the first row and column. Then, for each remaining location, the cumulative count is the
* number of inter-die connections at that location plus the cumulative counts of the locations directly
* below it and directly to its left. Since the cumulative count of the location diagonally below and to
* the left is included in both of those terms, it is subtracted once.
*/
for (const auto& src_rr_node : rr_graph.nodes()) {
for (const auto& rr_edge_idx : rr_graph.configurable_edges(src_rr_node)) {
const auto& sink_rr_node = rr_graph.edge_sink_node(src_rr_node, rr_edge_idx);
@@ -271,24 +282,24 @@ void NetCostHandler::alloc_and_load_for_fast_vertical_cost_update_(float place_c
}
}

for (int x_high = 0; x_high < (int)device_ctx.grid.width(); x_high++) {
for (int y_high = 0; y_high < (int)device_ctx.grid.height(); y_high++) {
for (int x_low = 0; x_low <= x_high; x_low++) {
for (int y_low = 0; y_low <= y_high; y_low++) {
int num_inter_die_conn = 0;
for (int x = x_low; x <= x_high; x++) {
for (int y = y_low; y <= y_high; y++) {
num_inter_die_conn += tile_num_inter_die_conn[x][y];
}
}
int seen_num_tiles = (x_high - x_low + 1) * (y_high - y_low + 1);
chanz_place_cost_fac_[x_high][y_high][x_low][y_low] = seen_num_tiles / static_cast<float>(num_inter_die_conn);

chanz_place_cost_fac_[x_high][y_high][x_low][y_low] = pow(
(double)chanz_place_cost_fac_[x_high][y_high][x_low][y_low],
(double)place_cost_exp);
}
}
acc_tile_num_inter_die_conn_[0][0] = tile_num_inter_die_conn[0][0];
// Initialize the first row and column
for (size_t x = 1; x < device_ctx.grid.width(); x++) {
acc_tile_num_inter_die_conn_[x][0] = acc_tile_num_inter_die_conn_[x-1][0] + \
tile_num_inter_die_conn[x][0];
}

for (size_t y = 1; y < device_ctx.grid.height(); y++) {
acc_tile_num_inter_die_conn_[0][y] = acc_tile_num_inter_die_conn_[0][y-1] + \
tile_num_inter_die_conn[0][y];
}

for (size_t x_high = 1; x_high < device_ctx.grid.width(); x_high++) {
for (size_t y_high = 1; y_high < device_ctx.grid.height(); y_high++) {
acc_tile_num_inter_die_conn_[x_high][y_high] = acc_tile_num_inter_die_conn_[x_high-1][y_high] + \
acc_tile_num_inter_die_conn_[x_high][y_high-1] + \
tile_num_inter_die_conn[x_high][y_high] - \
acc_tile_num_inter_die_conn_[x_high-1][y_high-1];
}
}
}
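
The accumulation performed in the hunk above is a standard 2D prefix sum. Here is a self-contained sketch of the same recurrence on plain `std::vector` storage; the `Grid` alias and `build_prefix_sum` are illustrative names that do not appear in VPR, and a rectangular, non-empty grid indexed as `[x][y]` is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-tile and cumulative matrices, indexed [x][y].
using Grid = std::vector<std::vector<int>>;

// Returns acc where acc[x][y] is the sum of counts[i][j] for all i <= x, j <= y.
Grid build_prefix_sum(const Grid& counts) {
    const std::size_t width = counts.size();
    assert(width > 0);
    const std::size_t height = counts[0].size();
    assert(height > 0);

    Grid acc(width, std::vector<int>(height, 0));
    acc[0][0] = counts[0][0];

    // First row (y == 0) and first column (x == 0) are 1D running sums.
    for (std::size_t x = 1; x < width; ++x) {
        acc[x][0] = acc[x - 1][0] + counts[x][0];
    }
    for (std::size_t y = 1; y < height; ++y) {
        acc[0][y] = acc[0][y - 1] + counts[0][y];
    }

    // Interior: left + below + own count, minus the diagonal term counted twice.
    for (std::size_t x = 1; x < width; ++x) {
        for (std::size_t y = 1; y < height; ++y) {
            acc[x][y] = acc[x - 1][y] + acc[x][y - 1] + counts[x][y] - acc[x - 1][y - 1];
        }
    }
    return acc;
}
```

Each entry is computed in constant time from three neighbours, so the table is filled in time proportional to the number of tiles, in contrast to the removed six-deep loop nest that summed every bounding box from scratch.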
@@ -1478,7 +1489,7 @@ double NetCostHandler::get_net_cube_bb_cost_(ClusterNetId net_id, bool use_ts) {
ncost = (bb.xmax - bb.xmin + 1) * crossing * chanx_place_cost_fac_[bb.ymax][bb.ymin - 1];
ncost += (bb.ymax - bb.ymin + 1) * crossing * chany_place_cost_fac_[bb.xmax][bb.xmin - 1];
if (is_multi_layer) {
ncost += (bb.layer_max - bb.layer_min) * crossing * chanz_place_cost_fac_[bb.xmax][bb.ymax][bb.xmin][bb.ymin];
ncost += (bb.layer_max - bb.layer_min) * crossing * get_chanz_cost_factor(bb);
}

return ncost;
@@ -1581,6 +1592,44 @@ double NetCostHandler::get_net_wirelength_from_layer_bb_(ClusterNetId net_id) {
return ncost;
}

float NetCostHandler::get_chanz_cost_factor(const t_bb& bounding_box) {
float place_cost_exp = placer_opts_.place_cost_exp;
int x_high = bounding_box.xmax;
int x_low = bounding_box.xmin;
int y_high = bounding_box.ymax;
int y_low = bounding_box.ymin;

int num_inter_dir_conn;

if (x_low == 0 && y_low == 0) {
num_inter_dir_conn = acc_tile_num_inter_die_conn_[x_high][y_high];
} else if (x_low == 0) {
num_inter_dir_conn = acc_tile_num_inter_die_conn_[x_high][y_high] - \
acc_tile_num_inter_die_conn_[x_high][y_low-1];
} else if (y_low == 0) {
num_inter_dir_conn = acc_tile_num_inter_die_conn_[x_high][y_high] - \
acc_tile_num_inter_die_conn_[x_low-1][y_high];
} else {
num_inter_dir_conn = acc_tile_num_inter_die_conn_[x_high][y_high] - \
acc_tile_num_inter_die_conn_[x_low-1][y_high] - \
acc_tile_num_inter_die_conn_[x_high][y_low-1] + \
acc_tile_num_inter_die_conn_[x_low-1][y_low-1];
}
Contributor

Using vtr::NdOffsetMatrix will be consistent with the rest of the file and will get rid of this if statement.

Contributor

Yes, that would be cleaner.

Contributor Author

@soheilshahrouz, if you remember, my initial attempt was to use this data structure, but I encountered some memory issues with it. I’ll create an issue for this and, in another PR, attempt to replace this data structure with vtr::NdOffsetMatrix.

Contributor

I think I figured out what the problem was.
Check if 68ecb55 solves the memory issue.

Contributor Author

Sounds good. I'll try it once the PR is merged.
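
To illustrate why an offset matrix removes the four-way branch discussed in this thread, here is a rough sketch that emulates an index of -1 by shifting a plain `std::vector` table by one in each dimension. It does not use `vtr::NdOffsetMatrix` and makes no claim about that class's interface; `PaddedPrefixSum` and its members are hypothetical names, and a rectangular grid indexed as `[x][y]` is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical padded prefix-sum table. Entry acc_[x + 1][y + 1] holds the sum
// over tiles [0..x][0..y]; row 0 and column 0 stay zero and stand in for index -1.
class PaddedPrefixSum {
  public:
    explicit PaddedPrefixSum(const std::vector<std::vector<int>>& counts)
        : width_(counts.size())
        , height_(counts.empty() ? 0 : counts[0].size())
        , acc_(width_ + 1, std::vector<int>(height_ + 1, 0)) {
        for (std::size_t x = 0; x < width_; ++x) {
            for (std::size_t y = 0; y < height_; ++y) {
                acc_[x + 1][y + 1] = counts[x][y] + acc_[x][y + 1] + acc_[x + 1][y] - acc_[x][y];
            }
        }
    }

    // Sum over the inclusive rectangle [x_low..x_high] x [y_low..y_high].
    // The zero padding makes the same four-term formula valid when x_low or y_low is 0.
    int sum(int x_low, int y_low, int x_high, int y_high) const {
        assert(0 <= x_low && x_low <= x_high && x_high < static_cast<int>(width_));
        assert(0 <= y_low && y_low <= y_high && y_high < static_cast<int>(height_));
        return acc_[x_high + 1][y_high + 1] - acc_[x_low][y_high + 1]
               - acc_[x_high + 1][y_low] + acc_[x_low][y_low];
    }

  private:
    std::size_t width_;
    std::size_t height_;
    std::vector<std::vector<int>> acc_;
};
```

The cost of this layout is one extra row and column of zeros. In exchange, the query needs no special cases for bounding boxes that touch the left or bottom edge of the grid.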


int bb_num_tiles = (x_high - x_low + 1) * (y_high - y_low + 1);

float z_cost_factor;
if (num_inter_dir_conn == 0) {
return 1.0f;
} else {
z_cost_factor = bb_num_tiles / static_cast<float>(num_inter_dir_conn);
z_cost_factor = pow((double)z_cost_factor, (double)place_cost_exp);
}

return z_cost_factor;

}

double NetCostHandler::recompute_bb_cost_() {
double cost = 0;

30 changes: 17 additions & 13 deletions vpr/src/place/net_cost_handler.h
@@ -196,12 +196,12 @@ class NetCostHandler {
vtr::NdOffsetMatrix<float, 2> chanx_place_cost_fac_; // [-1...device_ctx.grid.width()-1]
vtr::NdOffsetMatrix<float, 2> chany_place_cost_fac_; // [-1...device_ctx.grid.height()-1]
/**
@brief This data structure functions similarly to the matrices described above
but is applied to 3D connections linking different FPGA layers. It is used in the
placement cost function calculation, where the height of the bounding box is divided
by the average number of inter-die connections within the bounding box.
@brief This data structure stores the cumulative number of inter-die connections from the lower-left corner.
* It is later used to calculate the chanZ factor, which functions similarly to chanx_place_cost_fac_ and chany_place_cost_fac_,
* but applies to the height of the bounding box. The chanZ factor is calculated during block placement because storing it in the
* same way as the X and Y cost factors would require a 4D array and populating it is an O(n^2) operation.
*/
vtr::NdMatrix<float, 4> chanz_place_cost_fac_; // [0...device_ctx.grid.width()-1][0...device_ctx.grid.height()-1][0...device_ctx.grid.width()-1][0...device_ctx.grid.height()-1]
vtr::NdMatrix<int, 2> acc_tile_num_inter_die_conn_;


private:
@@ -250,23 +250,17 @@ class NetCostHandler {
* have to bother calling this routine; when using the cost function described above, however, you must always
* call this routine before you do any placement cost determination. The place_cost_exp factor specifies to
* what power the width of the channel should be taken -- larger numbers make narrower channels more expensive.
*
* @param place_cost_exp It is an exponent to which you take the average inverse channel capacity;
* a higher value would favour wider channels more over narrower channels during placement (usually we use 1).
*/
void alloc_and_load_chan_w_factors_for_place_cost_(float place_cost_exp);
void alloc_and_load_chan_w_factors_for_place_cost_();

/**
* @brief Allocates and loads the chanz_place_cost_fac array with the inverse of
* the average number of inter-die connections between [subhigh] and [sublow].
*
* @details This is only useful for multi-die FPGAs. The place_cost_exp factor specifies to
* what power the average number of inter-die connections should be take -- larger numbers make narrower channels more expensive.
*
* @param place_cost_exp It is an exponent to which you take the average number of inter-die connections;
* a higher value would favour areas with more inter-die connections over areas with less of those during placement (usually we use 1).
*/
void alloc_and_load_for_fast_vertical_cost_update_(float place_cost_exp);
void alloc_and_load_for_fast_vertical_cost_update_();

/**
* @brief Calculate the new connection delay and timing cost of all the
@@ -511,4 +505,14 @@ class NetCostHandler {
*/
double get_net_wirelength_from_layer_bb_(ClusterNetId net_id);

/**
* @brief Calculate the chanz cost factor based on the inverse of the average number of inter-die connections
* in the given bounding box. This cost factor increases the placement cost for blocks that require inter-layer
* connections in areas with, on average, fewer inter-die connections. If inter-die connections are evenly
* distributed across tiles, the cost factor will be the same for all bounding boxes.
* @param bounding_box Bounding box of the net whose chanz cost factor is to be calculated
* @return ChanZ cost factor
*/
float get_chanz_cost_factor(const t_bb& bounding_box);

};
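
The doc comment for get_chanz_cost_factor states that evenly distributed inter-die connections yield the same factor for every bounding box. The sketch below isolates that arithmetic so the claim can be checked. `chanz_cost_factor_sketch` is a hypothetical free function that takes the tile count and connection count directly rather than a `t_bb`; it mirrors the formula in the diff (tiles divided by connections, raised to `place_cost_exp`, with 1.0 returned when there are no connections) but is not the VPR member function.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-alone version of the chanZ factor arithmetic.
// num_tiles: number of tiles covered by the bounding box.
// num_inter_die_conn: inter-die connections inside the bounding box.
float chanz_cost_factor_sketch(int num_tiles, int num_inter_die_conn, float place_cost_exp) {
    assert(num_tiles > 0 && num_inter_die_conn >= 0);
    if (num_inter_die_conn == 0) {
        // Matches the diff: avoid dividing by zero and fall back to a neutral factor.
        return 1.0f;
    }
    const float tiles_per_conn = static_cast<float>(num_tiles) / static_cast<float>(num_inter_die_conn);
    return std::pow(tiles_per_conn, place_cost_exp);
}
```

With two connections per tile everywhere, a 4-tile box (8 connections) and a 9-tile box (18 connections) both give a factor of 0.5 at an exponent of 1. A box with fewer connections for the same area gives a larger factor, which is what penalizes nets spanning layers in sparsely connected regions.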