-
Notifications
You must be signed in to change notification settings - Fork 415
Print out info/report on overused rr-nodes at the end of a failed routing iteration #1455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Print out info/report on overused rr-nodes at the end of a failed routing iteration #1455
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Bill-hbrhbr @vaughnbetz
I have gone through the codes. It looks clean to me but it requires some fine-tuning.
On the other sides, I am thinking about if we should also generate a report file for these overused nodes?
- If the number of overused nodes is far more than the max value (default is 20), we have limited access to the information. Then we have to rerun routing to get more information, which could be very time consuming (think about the complexity of routing algorithms). Therefore, a report file may work well in this scenario.
- Should we also report which nets caused the overuse on the nodes? If I remember correctly, the routing traces are stored whatever routing is successful or not. We can walk through the routing traces and find which nets are mapped to the same node. Not sure if this may help debug or not. I used this when debugging the routing results in packer.
vpr/src/route/route_timing.cpp
Outdated
@@ -1669,6 +1675,107 @@ static void print_route_status(int itry, double elapsed_sec, float pres_fac, int | |||
fflush(stdout); | |||
} | |||
|
|||
static void print_overused_nodes_status(const t_router_opts& router_opts, const OveruseInfo& overuse_info) { | |||
//Display upper limit | |||
if (int(overuse_info.overused_nodes) > router_opts.max_reported_overused_rr_nodes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that if the number of overused nodes is larger than the threshold, we will do nothing.
Should we still report the overuse status and leave a message that the listed nodes are limited by the thresholds set by users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll log the overuse info if the number is small as well as produce an overuse info report file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! That would be perfect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you want all the nets at the congested node as well? Do you want the report to be indexed by the congested nets or the overused RR nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Showing all the congested nets will be helpful. I prefer to be indexed by the overused RR Nodes, where we can see what nets share the same the node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the nets to the report. You can checkout the report by specifying --generate_rr_node_overuse_report on. I will deal with the strong Id issue soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the congested nets is a good idea. In addition to printing their ids, I think we should also print their names and the type of driving block. E.g.
net #83 (processor|alu|reset) driven by a block of type CLB
as that can help locate them in the netlist and see patterns faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I forgot about printing more detailed info on the nets. Working on this right now
… Also print a message if the # of overused nodes is too large for logging.
…rt on overused nodes
Good comments Xifan, thanks. |
Hi @vaughnbetz |
If there are only a few overused rr-nodes in a binary search it can be useful to print them out (they can occur repeatedly, indicating a routing pattern problem). But otherwise we should only print it on the final failed routing. So we could do:
|
@tangxifan How many overused nodes are you trying to analyze in your case? Also, if you only care about the final routing, I can generate the report only if the final routing fails, regardless of how many nodes/nets there are. |
@Bill-hbrhbr I would focus on the final routing and a full report of overused nodes only in that routing result. Normally, the titan benchmarks failed with less than 20 overused nodes. Therefore I think what you did is good enough. |
@tangxifan If you are mainly working with titan benchmarks, then I don’t think we need to worry too much about the binary search on minimum width_fac. I don’t recall seeing binary search being used on Titan. |
@Bill-hbrhbr I would suggest thinking beyond titan benchmarks. In general, we do not need any node overuse reports during the binary search. In my opinion, it is also easy to do so. Your report generation is called after routing is done, e.g., somewhere in the function |
Another modification you may consider is to keep your functions in a separate source file, as your functions have little dependency on the functions in route_timing etc. As such, you can call your function freely in any source file with a header file included. |
@tangxifan Ok thats indeed the easy way to do it. Thanks for the suggestion! |
… the final routing attempt has failed. Also fixed a bug where a single net counts a overused node for multiple times in the report
…ong ids). Also added new accessors for the rr_graph_storage interface
@tangxifan Hi Xifan, I think I've basically finished improving the PR according to your comments. If you need other changes, feel free to point them out. |
Hi @Bill-hbrhbr Thanks for the effort. It looks good for me. |
@vaughnbetz I've added documentation for the new command-line options (one for logging and one for the separate report). If you think they are a little minor for the developers' page, then I'll revert the last commit. Otherwise, this PR is ready to merge. Update: working on printing more detailed info on the nets. |
Hi @Bill-hbrhbr Sorry. I forgot to check the nets. Would you mind uploading an example report as well as the print-out in vpr.out log after the modification? It will be much more straightforward to check. You can pick an architecture and a benchmark and specify a fixed route channel width, which is slightly smaller than the minimum routable channel width. Thanks! |
@tangxifan This is the current format of the report file. It only serves as an example. I generated this report based on a failed attempt during one of the binary search routing iterations for diffeq2.v+flagship_arch, but normally this report would only be printed out if the final routing attempt still fails, and the --generate_rr_node_overuse_report option is on. |
@Bill-hbrhbr This looks almost perfect on my side. The only comment I have is to add the overuse id for each node and a total number of overused nodes to the report. |
report_overused_nodes.rpt.txt @tangxifan The info in two files match each other. The former is the updated report, and the latter is an excerpt from the vpr.out. (I used Notepad++ to open the latter one and it would display the format nicely.) |
@Bill-hbrhbr I am satisfied with the log file and report file. Just a question, did you set a threshold of 230 or more when generating the log file? I was wondering by default, it should only output 20 overused nodes. |
Yeah, I set it to 1000 for demonstration only. Default is 20. |
@Bill-hbrhbr I just double checked the codes. It seems that if we set the threshold to 20 which is smaller than the actual number of overused nodes, nothing will be reported than a warning. Should we still report the first 20 overused nodes? Otherwise, users have to re-run the flow and find the correct threshold? |
@tangxifan In the vpr.out log file, you can see that there is a table of routing status info (printed for each routing iteration/trial with a single channel width) above the overuse info table. The user will be able to tell how many nodes are left overused from the last line entry in that table (in this case, it is 229). Regarding printing overused nodes up to the threshold number, I can make that happen if you think it would be helpful. |
@Bill-hbrhbr I personally lean to be user-friendly. If a user set a threshold while the total is like 229, we just output the given number of overused nodes in the log and leave a message telling the user to check the report file for a complete information. Does that make sense to you? If it is o.k., can you also provide a log file in this test case? Thanks |
Ok. But the report will only be generated if the last routing attempt fails. In this case, this routing iteration produced 229 overused nodes, but it is not the final attempt from the binary search strategy, so the report wouldn't be generated. In this case, it would be misleading to leave a message and tell the users to use the report option to generate a report for the complete info. |
I think I'll just make the change to print the # of nodes towards the threshold and leave a message if the whole routing process fails and the report option is off. |
@Bill-hbrhbr I understand what you mean now. I thought your log file was showing the final routing attempt. It is reasonable not to show anything overused details and reports for non-final routing attempts. Would you mind showing me complete log files? I would like to see the two test cases:
If the log file matches our expectation, I think this pull request is in a good shape. |
1st case: VPR failed, report generation is off. Logfile: vpr_fail_no_report.out 2nd case: VPR failed, report generation is on. Logfile: vpr_fail_yes_report.out 3rd case: VPR succeeded. The report is not generated. Logfile: vpr_succeed.out Search Failed routing attempt # to quickly locate the overuse table in the logfiles above. The # of entries printed is the minimum between # of overused nodes and the threshold number, with special messages printed if the threshold is the smaller one. |
…the threshold number if the threshold is exceeded. Add reminder for the report generation option at the end of vpr.out logfile should the router fails to implement the circuit.
@Bill-hbrhbr I just checked the log files and reports. I am satisfied by your efforts. |
@tangxifan Thanks! Do you want it to be merged ASAP? I can do that after the travis-CI has been passed. |
I am fine with the codes actually. It depends on you and @vaughnbetz when it is a good time to merge. It will surely help my pull request. |
@tangxifan Ok. I think this PR can be brought up in the meeting tomorrow. All the checks should pass by then, and @vaughnbetz will get a chance to take a look at it. |
Thanks Bill and thanks for the detailed review Xifan. |
The reports look great. Xifan, if you can run with the detailed reporting on the rr-graph changes (with no arch file changes) on the 3 unroutable designs that nearly route and post the resulting reports that would be great. |
Description
Add the option --max_logged_overused_rr_nodes N, which is set to N=20 by default.
When each routing attempt fails, if the # of overused nodes is less than or equal to N, then print out the diagnostics of the overused nodes.
Add the option --generate_rr_node_overuse_report on/off, which generates a detailed report on overused RR nodes if the final routing attempt fails (i.e. the routing process has failed).
Related Issue
#1453
Types of changes
Checklist: