-
Notifications
You must be signed in to change notification settings - Fork 415
Merging RCV branch into master #1395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ing much better performance
…ll as extra debugging helps
Thanks David. |
vpr/src/route/route_path_manager.h
Outdated
/* Extra path data needed by RCV, seperated from t_heap struct for performance reasons | ||
* Can be accessed by a pointer, won't be initialized unless needed | ||
* | ||
* path_rr: The entire partial path up until the route tree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explain what this is. Entire partial path until we reach either the net SOURCE or a part of the routing tree that has already been created for other connections for this net.
vpr/src/base/read_options.cpp
Outdated
@@ -2009,7 +2013,7 @@ argparse::ArgumentParser create_arg_parser(std::string prog_name, t_options& arg | |||
" * criticality: Sets the minimum budgets to 0 and the maximum budgets as a function of delay and criticality (net delay/ pin criticality) [EXPERIMENTAL]\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This help on options doesn't seem to match the option values below. Can you update it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, a simple next PR could be to give RCV it's own distinct option in the router, but for now I'll update this.
@@ -1017,6 +1017,7 @@ enum e_routing_failure_predictor { | |||
}; | |||
enum e_routing_budgets_algorithm { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment here on what this controls. Either describe the options or refer back to command option help.
PathManager(); | ||
~PathManager(); | ||
|
||
// Checks if the target node exists in the route tree, or current backwards path variable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a comment that explains how RCV can create loops, and that two tests are necessary to prevent loops (check the partial route created for other connections on the net (route_tree_set) and check the prior nodes on the path that is being expanded/explored for a given heap element (partial path) for the current connection. This latter part needs to be stored for every partial path (heap element).
If you don't have such an overview comment, you should add it, probably near the top of route_path_manager.h.
Did a quick review of the code, which generally looks good. I added a few requests for comments / command line help above. |
@vaughnbetz Hi, as per the Thold values for RCV on and off, they are in the spreadsheet under the Formatted Raw Data section. I included both total hold and worst case hold slacks for each test. Sorry that wasn't noted somewhere! As per the failures on larger circuits, I can look into whats causing them for a future pull request. Thanks for looking over this, I'll make the changes you suggested right now. |
vpr/src/base/vpr_types.h
Outdated
MINIMAX, | ||
YOYO, | ||
MINIMAX, // Use MINIMAX-PERT algorithm to allocate budgets | ||
YOYO, // Use MINIMAX as above, and enable RCV algorithm to resolve negative hold slack | ||
SCALE_DELAY, | ||
DISABLE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should comment DISABLE too as it's an important one: currently turns off RCV and delay budgets and uses default router.
* Having these in t_heap creates significant performance issues when RCV is disabled | ||
* A t_heap_path pointer is instead stored in t_heap, which is selectively allocated only when RCV is enabled | ||
* | ||
* If the _is_enabled flag is true, alloc_path_struct allocates t_heap_path structures, otherwise will be a NOOP */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice comment -- thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, new comments look good. One more request embedded above.
Do you know if the circuits that fail with RCV run out of memory or unroute? Good to describe that here, as well as the experiment conditions (e.g. I think the vtr circuits channel width was 300 or 400 with RCV, but am not sure of the value).
@vaughnbetz For the QOR reg tests, the two circuits that failed for RCV ON were LU8PEEng and LU32PEEng, which didn't have a memory issue. They failed because there was still negative hold slack, and RCV didn't detect that this was probably being caused by global/ignored nets. This is a tunable fix, because I do have a function that is supposed to detect this. For the Titan tests the following circuits are reported as failed for the following reasons: Gaussian blur: This failed because I stopped it early as it was taking all weekend, I did keep it running and it actually passed. I can update the doc to include this. LU_Network: This failed the following assertion:
I haven't encountered this before. This appears to fail during the route budgeting stage with a weird call to tatum. This seems like it's probably an easy fix... I will look into this right now By the way I'm going to update the golden results for strong_bidir tests. |
Thanks David.
|
This looks ready to merge to me. I have some qor questions above, but none seem to be merge gating. |
@vaughnbetz To address your 4 concerns:
|
Thanks David. 3 is probably resolvable: basically if the routing is legal we should report success, whether Th is met or not. It's fine to have RCV keep trying to resolve hold, but the final condition of whether this routing is legal or not should understand it is legal and just move on to reporting. Hopefully the code for that is simple / clean. I agree 4 sounds more involved; sounds like something that can wait. |
Should be clear: 3 and 4 should both be left to future PRs. I just think 3 is a lot easier than 4. |
@vaughnbetz thanks for getting back to me, sounds good I'll note these two issues in the document if I don't get around to fixing them soon. I might move to patch 3 right after this. |
I updated the golden results for the vtr_reg_nightly/vtr_bidir regression test because it was performing differently. Let me know if you if you disagree with this |
Updating that QoR test looks OK to me. |
Read the docs failed but I can't see how that can possibly be related as no documentation files were changed in this PR. Keith confirmed in the meeting today that he's happy with the new code, as am I. Nice work pulling this together David! |
Description
This PR is a work in progress and still has a significant overhead on top of the regular VPR flow
Related Issue
Motivation and Context
How Has This Been Tested?
I tested this in a variety of scenarios on both the titan benchmark suite, as well as qor reg chain. Results of the data are located here:
UPDATED TITAN: https://docs.google.com/spreadsheets/d/10omM-uNtj-8o-3DnEfGh7C6fznX8va63Y85yTmocNqE/edit?usp=sharing
UPDATED QOR CHAIN: https://docs.google.com/spreadsheets/d/1pJdYBgbMMMudm0-0STI7htba0uHOeQ7VXBu1NcMzAME/edit?usp=sharing
As well I analyzed the results of which in a document here:
https://docs.google.com/document/d/1mZoHqO3H5mYvD44dt2-Ai-eIc4rlrHm4dhSMOlLKmqo/edit?usp=sharing
In summary, when RCV is turned off there should be minimal runtime impact due to the extra if statements in the core router code. I've tried to minimize this in a number of ways.
When RCV is turned on, there is a more significant impact on runtime, which is mostly caused by the route budgeting. However even in a lot of beyond worst case hold cases, it still is able to resolve intercluster hold violations mostly. There is still work to be done for larger circuits, as well as possibly dealing with intra cluster hold violations.
Types of changes
Checklist: