Skip to content

Remove t_traceback and update route_tree_timing #2212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 12, 2023
Merged

Conversation

duck2
Copy link
Contributor

@duck2 duck2 commented Nov 29, 2022

Currently there are two data structures to keep track of the routing: traceback (a linked list representing a tree) and route_tree_timing (a tree). This PR changes the router to use the route tree only and also make some updates to the route tree.

Motivation and Context

Both traceback and route_tree_timing are using global free lists, and my attempts to make them work with multiple threads are resulting in some really bad quality code. The main goal is to fix that memory management issue, but there will be a lot of code cleanup in the process as well.

How Has This Been Tested?

ran titan_quick_qor once so far

(todo: comments, attach QoR, pass CI)

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@duck2 duck2 changed the title Remove t_traceback and update route_tree_timing [WIP] Remove t_traceback and update route_tree_timing Nov 29, 2022
@duck2 duck2 marked this pull request as draft November 29, 2022 17:28
@github-actions github-actions bot added build Build system lang-make CMake/Make code VPR VPR FPGA Placement & Routing Tool labels Nov 29, 2022
@duck2 duck2 removed build Build system lang-make CMake/Make code labels Nov 29, 2022
@duck2 duck2 force-pushed the no_traceback branch 2 times, most recently from 8261385 to bed49ed Compare April 24, 2023 07:40
@duck2 duck2 changed the title [WIP] Remove t_traceback and update route_tree_timing Remove t_traceback and update route_tree_timing May 1, 2023
@duck2 duck2 marked this pull request as ready for review May 1, 2023 19:25
@duck2 duck2 requested review from amin1377 and vaughnbetz May 2, 2023 15:24
@amin1377
Copy link
Contributor

amin1377 commented May 9, 2023

I have a question, but I haven't finished reviewing the code yet. Is there a reason why you opted for tl:optional instead of std::optional?

* therefore store the last rt_node created of all the SINK nodes with the same
* index "inode". This is okay because the mapping is only used in this file to
* quickly figure out where rt_nodes that we are branching off of (for nets with
* fanout > 1) are, and we will never branch off a SINK. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it may be helpful to add to comments why the same SINK may be added to RT several times (for example, in the case that multiple connections of a net is connected to the same LUT)

@duck2
Copy link
Contributor Author

duck2 commented May 10, 2023

I have a question, but I haven't finished reviewing the code yet. Is there a reason why you opted for tl:optional instead of std::optional?

std::optional doesn't permit optional references due to a disagreement about how assignment would work. tl::optional works with references and rebinds on assignment in that case, similar to how pointers behave.

So optional<T&> is more or less a pointer with two big signs on it: "I can be null" and "I'm an alias to an existing object and not an output parameter or array or something you can free". This helped a lot when refactoring the route tree code.

@duck2 duck2 force-pushed the no_traceback branch 2 times, most recently from a6acfbd to 129ecc5 Compare May 30, 2023 06:02
@duck2
Copy link
Contributor Author

duck2 commented Jun 8, 2023

This should be mergeable if CI is green: added docs, updated golden results, and from my experience clicking around in the GUI, the patch doesn't make it crash right away (:

titan_quick_qor results (routing only, with the same placement): master_rt2_9.xlsx

Relevant rows from the summary:

Metric latest no_traceback
vtr_flow_elapsed_time 1 0,97745
max_vpr_mem 1 0,99438
routed_wirelength 1 1,00005
critical_path_delay 1 0,99941
geomean_nonvirtual_intradomain_critical_path_delay 1 0,99936
crit_path_route_time 1 0,96881

@duck2
Copy link
Contributor Author

duck2 commented Jun 8, 2023

The failures are due to min_chan_width_route_time (14.75 vs 14.0). Shouldn't be an issue? @vaughnbetz

@vaughnbetz
Copy link
Contributor

Titan Quick QoR looks good. I'm not worried about the one failure:

k4_n4_v7_bidir.xml/styr.blif/common min_chan_width relative value 1.3076923076923077 outside of range [0.25,1.3] and not equal to golden value: 13.0

That circuit is failing sometimes and it's a tiny one. I don't actually see the min_chan_width route time failure, but it wouldn't be a concern anyway.

Can you also run the VTR benchmarks just to be very safe and ensure no degradation on them?

@duck2
Copy link
Contributor Author

duck2 commented Jun 9, 2023

vtr_reg_qor_chain results: master_rt2_vtr_1.xlsx

My abc took 3x longer for some reason, but number of blocks seem to be equal.

Not sure what causes the fluctuations in min_chan_width_route_time. crit_path_route_time seems to be always close.

Metric latest no_traceback
vtr_flow_elapsed_time 1 1,19899
abc_depth 1 1
abc_synth_time 1 3,04310
num_clb 1 1
num_memories 1 1
num_mult 1 1
max_vpr_mem 1 1,03332
num_pre_packed_blocks 1 1
num_post_packed_blocks 1 1
device_grid_tiles 1 1
pack_time 1 0,97867
placed_wirelength_est 1 1
place_time 1 0,92791
placed_CPD_est 1 1
min_chan_width 1 1,00565
routed_wirelength 1 0,99736
min_chan_width_route_time 1 1,05809
crit_path_routed_wirelength 1 0,99712
critical_path_delay 1 0,99898
geomean_nonvirtual_intradomain_critical_path_delay 1 0,99898
crit_path_route_time 1 0,98386

@vaughnbetz
Copy link
Contributor

Looks good, thanks. Maybe abc was built in debug mode? In any case no related to your changes so OK.

min_chan_width route time is a bit more chaotic than the crit_path_route time as you get variation in exactly when/if certain routings fail. Routings that barely fail or barely succeed tend to be the slowest, and dominate run time. Increases the noise margin.

@duck2
Copy link
Contributor Author

duck2 commented Jun 9, 2023

OK to merge in that case?

Copy link
Contributor

@vaughnbetz vaughnbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nicely commented! (And I don't say that very often on a big review like this :).

@@ -96,6 +96,7 @@
#include "iostream"

#ifdef VPR_USE_TBB
# define TBB_PREVIEW_GLOBAL_CONTROL 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should comment why we need this (even if it is your name and statement that this was added to avoid compiler errors).

@@ -24,7 +24,7 @@ class Connection_based_routing_resources {
// contains rt_nodes representing sinks reached legally while pruning the route tree
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment slightly out of date? Storing rr_node not rt_nodes? Update comment or delete this or something. Traceback comment also doesn't work. Might also be a TODO to get rid of this and use your new lookup?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work? If it doesn't serve any purpose we should fix or delete.

void pathfinder_update_single_node_occupancy(int inode, int add_or_sub);

void pathfinder_update_acc_cost_and_overuse_info(float acc_fac, OveruseInfo& overuse_info);

float update_pres_fac(float new_pres_fac);
/** Update pathfinder cost of all nodes rooted at rt_node, including rt_node itself */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rt_node -> change to root

t_rt_node* rt_root,
t_rt_node** rt_node_of_sink,
RouteTree& tree,
std::vector<vtr::optional<const RouteTreeNode&>>& rt_node_of_sink,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if the rt_node_of_sink lookup actually does anything ; fix or delete.

NetPinsMatrix<float>& net_delay,
std::vector<float>& pin_criticality,
std::vector<vtr::optional<const RouteTreeNode&>>& rt_node_of_sink,
ClbNetPinsMatrix<float>& net_delay,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClbNetPinsMatrix --> NetPinsMatrix (rename)? Due to flat router.

*
* Congested paths in a tree can be pruned using RouteTree::prune(). Note that updates to a tree require an update to the global occupancy state via
* pathfinder_update_cost_from_route_tree(). In addition, RouteTree::prune() depends on this global data to find congestions, so the flow to
* prune a tree looks like this:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"prune a tree to keep only only the legally routed part of the tree looks like this:"

* }
*
* RouteTree::find_by_rr_id() finds the RouteTreeNode for a given RRNodeId. Note that RRNodeId isn't a unique
* key for SINK nodes and therefore an external lookup (generated from sink nodes returned by RouteTree::update_from_heap())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Note that if a single net routes to the same RR_SINK multiple times (which can happen in some architectures) this find_by_rr_id() lookup will just return the last RouteTree node that connected to this RR_SINK. If that isn't good enough, an external lookup generated from the sink nodes returned by RouteTree::update_from_heap() will be needed or you can search through the whole RouteTree and look at the inode and net_pin_index of each RouteTreeNode to find the RouteTreeNode that matches."

* Note that update_from_heap already does this, but prune() doesn't */
void reload_timing(vtr::optional<RouteTreeNode&> from_node = vtr::nullopt);

/** Get the RouteTreeNode corresponding to the RRNodeId. Returns nullopt if not found. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this rr_node is in the tree multiple times, it returns the last one that was added to the tree.

@vaughnbetz
Copy link
Contributor

nightly1: small design, cpu time out of bounds slightly for min chan width (not a real problem).
Same for strong_odin : min chan width route is out of bounds slightly. OK, but weird it sometimes fails and sometimes doesn't (non-determinism).
OK to merge, update comments after.
Also @duck2 wants to move rr_node_to_rt_node_of_sink mapping into the RouteTree. Do that in a separate PR.

@vaughnbetz vaughnbetz merged commit 2f9b456 into master Jun 12, 2023
@vaughnbetz vaughnbetz deleted the no_traceback branch June 12, 2023 17:17
void free_trace_data(t_trace* tptr);

/* Builds a skeleton route tree from a traceback
* does not calculate R_upstream, C_downstream, or Tdel (left uninitialized)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does. should update that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libvtrutil VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants