Skip to content

Refactor rt node #1508

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Aug 31, 2020
Merged

Refactor rt node #1508

merged 15 commits into from
Aug 31, 2020

Conversation

helen-23
Copy link
Contributor

@helen-23 helen-23 commented Aug 27, 2020

[Edited]

In this pull request, I have refactored the rt_node and traceback structures, adding a new member in the structure to keep track of the net pin index (ranging from 1 to fanout) of sinks. I've also updated all parts of the code to properly store and access this new member.

Description

  • Fixed traceback_to_route_tree() function to create and connect separate rt_nodes if the same rr node index is hit more than once walking the traceback.
  • Added a new member in the rt_node and traceback structures to keep track of the net pin index of sinks (ranging from 1 to fanout).
  • When a new rt_node is created, made sure to add the node's associated net pin index to the new member if it is a sink node. If it's not a sink, -1 is stored. Same applies for traceback instances.
  • Updated various other functions to use this new member instead of using the rr_sink_node_to_pin mapping,
  • Modified print_route routine to print out sink net pin index to the router file (.route).
  • Modified load_routing routine to read in sink net pin index from the router file (.route) and store it in the traceback.

Related Issue

#1475

Motivation and Context

As part of a Stratix IV architecture experiment, we tried forcing a greedy and false pin-equivalence to the M9K, M144K, and DSP inputs (meaning that although equivalence is set to "full" on the top level input pins, the local routing of these blocks is still partially fully connected).

When running benchmarks with this experiment, VPR failed in routing because it was attempting to delete the same node twice from the route tree. This happened because the the clustered netlist builder for logic blocks mapped the same net to multiple pins in a port, resulting in multiple rt nodes being created with the same rr node index. However, the code is inconsistent: when building up the route tree from traceback, only one rt_node is created for each rr node index found in the traceback, so edges are created between the same two nodes multiple times; this is problematic during deletion.

This never happened in the past because pin equivalence had always been applied to the LABs. The LABs have proper pin equivalence set up in the architecture capture. In a fully equivalent port, each pin should be mapped to a unique port, so the clustered netlist builder doesn't connect a net more than once to a port if it goes to multiple equivalent inputs in this case.

While investigating the case, I found multiple other issues that would arise if multiple sink-type rt nodes share the same rr node index. Several parts of the routing code need to use the net pin index of a node, but the rt_node structure only retains the node's rr node index value, so the rr node index is translated into its corresponding net pin index through the rr_sink_node_to_pin mapping, which could only map a rr-node index to ONE of its corresponding net pin indices.

If the architecture capture is modified properly with true equivalent ports to the DSP and ram blocks, this problem can be avoided. However, this is still a bug that we should address. If we one day take different speed paths to the same sink for different target_pins, we want to associate the route tree branch with the target_pin we were trying to route to (and hence make sure we get the branch with the timing we expect).

One way to fix this problem is to make rr_sink_node_to_pin a multimap, but that still makes it difficult to determine which pin a particular rt_node corresponds to if its node index maps to several net pin indices.

The best way to fix this once and for all is to have net pin index stored in the rt_node structure as well, so no more mapping is required.

How Has This Been Tested?

  • Tried running large ML designs (e.g. DLA variants) through VPR with greedy and false pin-equivalence to the M9K, M144K, and DSP inputs, which succeeded with results that make sense.
  • Tried running large ML designs (e.g. DLA variants) through VPR with the existing SIV architecture capture, which succeeded with the same resource usage results as before. Fix did not increase VPR runtime.
  • Passed regression test and QoR checks

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@helen-23 helen-23 requested a review from vaughnbetz August 27, 2020 05:24
@probot-autolabeler probot-autolabeler bot added lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool labels Aug 27, 2020
@helen-23 helen-23 linked an issue Aug 27, 2020 that may be closed by this pull request
@helen-23
Copy link
Contributor Author

Hi Vaughn,
I have this branch ready for review. The only Travis CI failure is the sporadic Python Lint issue. Did not change any python scripts in this pull request. Thank you.

if (tokens[8 + offset] == "Net_pin_index:") {
net_pin_id = atoi(tokens[9 + offset].c_str());
} else {
vpr_throw(VPR_ERROR_ROUTE, filename, lineno,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To help users with old .route files, please add something like: "Is this an old .route file without that information? If so, re-generate the routing."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added message in the vpr_throw.

* different pins, node index cannot uniquely identify *
* each sink, so the net pin index guarentees an unique *
* identification for each sink-type node. For non-sink- *
* type nodes and for sink-type nodes with no associated *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should say what SINK node wouldn't have such an index (special SINKs like the source of a clock tree which don't correspond to an actual netlist connection).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment. Also added the same comments for rt_node.

@@ -190,8 +190,7 @@ static void check_sink(int inode, ClusterNetId net_id, bool* pin_done) {
int ptc_num = device_ctx.rr_nodes[inode].ptc_num();
int ifound = 0;

for (int iblk = 0; iblk < type->capacity; iblk++) {
ClusterBlockId bnum = place_ctx.grid_blocks[i][j].blocks[iblk]; /* Hardcoded to one cluster_ctx block*/
for (auto bnum : place_ctx.grid_blocks[i][j].blocks) {
unsigned int ipin = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still baffled by the need for this code. It degenerates to a count of the pins connected, along with a check that some block on the net is at the (i,j) specified.
If you check out https://docs.verilogtorouting.org/en/latest/api/vpr/netlist/ you can see how the netlist works. We are storing a net_pin_index on each connection (SINK in the traceback). So if you pass that net_pin_index (which you now store) into check_sink, we can use it here. I think the code is basically:


pin_id = cluster_ctx.clb_nlist.net_pin (net_id, net_pin_index);   // Gets a pin on the netlist corresponding to this connection on the netlist.
for (auto bnum: place_ctr.grid_blocks[i][j].blocks) {   // Look through all blocks at the sink (i,j)
    if (cluster_ctx.clb_nlist.pin_block(pin_id) == bnum) {    // One should match the block at the end of this netlist connection
       VPR_ASSERT (!pin_done[net_pin_index]);    // Shouldn't have found a routed connection to it before
       pin_done[net_pin_index] = true;                 // Mark that now we have found a routed connection to it
       break;
   }
}

if (!pin_done[net_pin_index]) {
   VPR_FATAL_ERROR (... same message ...)
}

if

@@ -19,6 +19,10 @@ class Connection_based_routing_resources {
// each net maps SINK node index -> PIN index for net
// only need to be built once at the start since the SINK nodes never change
// the reverse lookup of route_ctx.net_rr_terminals
// be careful: it is possible for multiple sinks to share the same node index in some cases.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably say a single SINK can map to multiple PIN index on a net (multiple connections) in some unusual cases. Hence use of this data structure is discouraged (instead use the pin index on a net stored in the rt_node) and this data structure will likely be removed in the future.

[What still uses this? Just the breadth-first router? If so, we should note that in the comment and it's a good reason for deleting the breadth-first router in the future.]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, nothing uses this anymore, not even the BF router. There is just one function that does a sanity check of the map, called by an assertion in route_timing.cpp, which no longer uses this anyways.

I can go ahead and remove this altogether.

@@ -86,7 +86,7 @@ static int num_linked_f_pointer_allocated = 0;
* */

/******************** Subroutines local to route_common.c *******************/
static t_trace_branch traceback_branch(int node, std::unordered_set<int>& main_branch_visited);
static t_trace_branch traceback_branch(int node, int target_pin, std::unordered_set<int>& main_branch_visited);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use a more descriptive variable name: target_net_pin_index would be better I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, while I'm at it, I also changed the parameter name in update_traceback, which is the routine a level above this.

@@ -316,6 +331,7 @@ add_subtree_to_route_tree(t_heap* hptr, t_rt_node** sink_rt_node_ptr) {

rt_node->u.child_list = linked_rt_edge;
rt_node->inode = inode;
rt_node->ipin = OPEN;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain why: not valid for non-SINK nodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -347,6 +363,7 @@ add_subtree_to_route_tree(t_heap* hptr, t_rt_node** sink_rt_node_ptr) {

//Expand (recursively) each of the main-branch nodes adding any
//non-configurably connected nodes
//Sink is not included, so no need to pass in the node's ipin value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any pin index passed in, so I don't understand the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My apologies, I sort of put this in as a reminder for myself that there is no need to pass in any net pin info to this function. But I agree it's confusing. I am taking it out.

@@ -19,7 +19,7 @@ void free_route_tree(t_rt_node* rt_node);
void print_route_tree(const t_rt_node* rt_node);
void print_route_tree(const t_rt_node* rt_node, int depth);

t_rt_node* update_route_tree(t_heap* hptr, SpatialRouteTreeLookup* spatial_rt_lookup);
t_rt_node* update_route_tree(t_heap* hptr, int target_pin, SpatialRouteTreeLookup* spatial_rt_lookup);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename target_pin to target_net_pin_index, and if you know what the routine does, write a comment above the function name.

Copy link
Contributor Author

@helen-23 helen-23 Aug 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never added a comment because the description for these functions are in the .cpp. But I've just added to the comment to describe what the target_net_pin_index parameter means before the function definition in the .cpp. If you think it's a good convention to add it in the .h as well, I can do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your call; you can keep following the .cpp convention for these files if you like.

@@ -27,6 +27,13 @@ struct t_linked_rt_edge {
* parent_switch: Index of the switch type driving this node (by its *
* parent). *
* inode: index (ID) of the rr_node that corresponds to this rt_node. *
* ipin: Net pin index associated with the rt_node. This value ranges from *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

net_pin_index would be a better name than ipin, both here and in t_trace, in retrospect. The comment is good; you could change the member name too (here and in t_trace) if you have the energy for it :).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed and done.

inode_to_Tdel_map[node->inode] = node->Tdel; // add to the map, process current node
/* This routine recursively traverses the route tree, and copies the Tdel of the sink_type nodes *
* into the map. */
if (node->ipin != OPEN) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment that this means non-SINK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, done.

@helen-23
Copy link
Contributor Author

Hi Vaughn,
I marked most of the above as done just so I can keep track of which ones I've changed locally, though I do not currently have anything new committed yet. I also replied to some o your comments with questions as well. Thank you.

@helen-23
Copy link
Contributor Author

helen-23 commented Aug 28, 2020

The refactored code increased memory usage by less than 1% for all tests conducted. VPR runtime was not effected (placement time, pack time, and route time fluctuates in every run depending on the machine's load). The following figures are measured in MiB.

image

@vaughnbetz
Copy link
Contributor

Code format failed Helen; can you run make format?
pylint failed too (unrelated, Shad is fixing that so hopefully works after rebase but not your issue anyway).

@vaughnbetz
Copy link
Contributor

Thanks for the QoR checks; those look good.

@helen-23
Copy link
Contributor Author

Experiment: re-run routing (just routing and analysis) with astar_fac=0.5
This did improve Fmax for the DLA variants as it does not reach for the bottom-left corner as rigorously, especially for DLA_BSC, which had the lowest Fmax. I think it is reasonable to hypothesize that once the sink coordinates are refactored, it would improve routing decisions. Here is a comparison between the Fmax results:
image

@vaughnbetz
Copy link
Contributor

Great news Helen -- thanks.
The Travis CI failure is an unrelated python lint failure (Shad already fixed that on the master).
Once kokoro finishes (assuming all goes well) I'll merge this.

@vaughnbetz vaughnbetz merged commit c93941d into master Aug 31, 2020
@vaughnbetz vaughnbetz deleted the refactor_rt_node branch August 31, 2020 16:14
@vaughnbetz
Copy link
Contributor

Everything looks good (except the spurious pylint failure. Merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VPR Memory Error when input pin equivalence applied to DSPs and RAMs
2 participants