Print out info/report on overused rr-nodes at the end of a failed routing iteration #1455

Bill-hbrhbr · 2020-07-28T10:49:24Z

Description

Add the option --max_logged_overused_rr_nodes N, which is set to N=20 by default.
When each routing attempt fails, if the # of overused nodes is less than or equal to N, then print out the diagnostics of the overused nodes.

Add the option --generate_rr_node_overuse_report on/off, which generates a detailed report on overused RR nodes if the final routing attempt fails (i.e. the routing process has failed).

Related Issue

#1453

Types of changes

Bug fix (change which fixes an issue)
New feature (change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My change requires a change to the documentation
I have updated the documentation accordingly
I have added tests to cover my changes
All new and existing tests passed

…odes info body

tangxifan

Hi @Bill-hbrhbr @vaughnbetz
I have gone through the codes. It looks clean to me but it requires some fine-tuning.
On the other sides, I am thinking about if we should also generate a report file for these overused nodes?

If the number of overused nodes is far more than the max value (default is 20), we have limited access to the information. Then we have to rerun routing to get more information, which could be very time consuming (think about the complexity of routing algorithms). Therefore, a report file may work well in this scenario.
Should we also report which nets caused the overuse on the nodes? If I remember correctly, the routing traces are stored whatever routing is successful or not. We can walk through the routing traces and find which nets are mapped to the same node. Not sure if this may help debug or not. I used this when debugging the routing results in packer.

vpr/src/route/route_timing.cpp

tangxifan · 2020-07-28T16:37:38Z

vpr/src/route/route_timing.cpp

@@ -1669,6 +1675,107 @@ static void print_route_status(int itry, double elapsed_sec, float pres_fac, int
    fflush(stdout);
 }

+static void print_overused_nodes_status(const t_router_opts& router_opts, const OveruseInfo& overuse_info) {
+    //Display upper limit
+    if (int(overuse_info.overused_nodes) > router_opts.max_reported_overused_rr_nodes) {


It seems that if the number of overused nodes is larger than the threshold, we will do nothing.
Should we still report the overuse status and leave a message that the listed nodes are limited by the thresholds set by users.

I think we'll log the overuse info if the number is small as well as produce an overuse info report file.

Thanks! That would be perfect.

I assume you want all the nets at the congested node as well? Do you want the report to be indexed by the congested nets or the overused RR nodes?

Yes. Showing all the congested nets will be helpful. I prefer to be indexed by the overused RR Nodes, where we can see what nets share the same the node.

I've added the nets to the report. You can checkout the report by specifying --generate_rr_node_overuse_report on. I will deal with the strong Id issue soon.

Adding the congested nets is a good idea. In addition to printing their ids, I think we should also print their names and the type of driving block. E.g.
net #83 (processor|alu|reset) driven by a block of type CLB
as that can help locate them in the netlist and see patterns faster.

Sorry, I forgot about printing more detailed info on the nets. Working on this right now

… Also print a message if the # of overused nodes is too large for logging.

…rt on overused nodes

vaughnbetz · 2020-07-28T21:20:23Z

Good comments Xifan, thanks.
I think we should not print the congested report when there are too many overused nodes though, as it will lead to a lot of huge files in a binary search for minimum channel width route.

tangxifan · 2020-07-28T21:26:26Z

Hi @vaughnbetz
Indeed, we should avoid printing reports after each try during the binary search phase. Should we just generate a report at the end of routing phase? For instance, when the final routing of binary search failed, we can have a report for the final try? In most cases, the binary search will end with success unless the rr_graph is not routable.

vaughnbetz · 2020-07-28T21:33:02Z

If there are only a few overused rr-nodes in a binary search it can be useful to print them out (they can occur repeatedly, indicating a routing pattern problem). But otherwise we should only print it on the final failed routing.

So we could do:

always print to the log file if the overused rr-nodes are small enough
print the separate report only on the final (binary search or regular) routing, if it fails.

Bill-hbrhbr · 2020-07-28T22:38:28Z

@tangxifan How many overused nodes are you trying to analyze in your case? Also, if you only care about the final routing, I can generate the report only if the final routing fails, regardless of how many nodes/nets there are.

tangxifan · 2020-07-28T22:42:38Z

@Bill-hbrhbr I would focus on the final routing and a full report of overused nodes only in that routing result. Normally, the titan benchmarks failed with less than 20 overused nodes. Therefore I think what you did is good enough.

Bill-hbrhbr · 2020-07-28T22:51:08Z

@tangxifan If you are mainly working with titan benchmarks, then I don’t think we need to worry too much about the binary search on minimum width_fac. I don’t recall seeing binary search being used on Titan.
Trying to add detection for the last failed routing attempt would require more drastic code refactoring.

tangxifan · 2020-07-28T23:04:20Z

@Bill-hbrhbr I would suggest thinking beyond titan benchmarks. In general, we do not need any node overuse reports during the binary search. In my opinion, it is also easy to do so. Your report generation is called after routing is done, e.g., somewhere in the function vpr_route_flow(), where you just do reporting when routing status flags a failure. By doing so, you can skip the binary search part.

tangxifan · 2020-07-28T23:08:10Z

Another modification you may consider is to keep your functions in a separate source file, as your functions have little dependency on the functions in route_timing etc. As such, you can call your function freely in any source file with a header file included.

Bill-hbrhbr · 2020-07-28T23:08:22Z

@tangxifan Ok thats indeed the easy way to do it. Thanks for the suggestion!

… the final routing attempt has failed. Also fixed a bug where a single net counts a overused node for multiple times in the report

…ong ids). Also added new accessors for the rr_graph_storage interface

Bill-hbrhbr · 2020-07-29T19:11:21Z

@tangxifan Hi Xifan, I think I've basically finished improving the PR according to your comments. If you need other changes, feel free to point them out.

tangxifan · 2020-07-29T19:54:45Z

Hi @Bill-hbrhbr Thanks for the effort. It looks good for me.

Bill-hbrhbr · 2020-07-29T20:30:25Z

@vaughnbetz I've added documentation for the new command-line options (one for logging and one for the separate report). If you think they are a little minor for the developers' page, then I'll revert the last commit. Otherwise, this PR is ready to merge.

Update: working on printing more detailed info on the nets.

tangxifan · 2020-07-29T20:55:31Z

Hi @Bill-hbrhbr Sorry. I forgot to check the nets. Would you mind uploading an example report as well as the print-out in vpr.out log after the modification? It will be much more straightforward to check. You can pick an architecture and a benchmark and specify a fixed route channel width, which is slightly smaller than the minimum routable channel width. Thanks!

Bill-hbrhbr · 2020-07-29T21:49:27Z

report_overused_nodes.rpt.txt

@tangxifan This is the current format of the report file. It only serves as an example. I generated this report based on a failed attempt during one of the binary search routing iterations for diffeq2.v+flagship_arch, but normally this report would only be printed out if the final routing attempt still fails, and the --generate_rr_node_overuse_report option is on.

tangxifan · 2020-07-29T21:55:53Z

@Bill-hbrhbr This looks almost perfect on my side. The only comment I have is to add the overuse id for each node and a total number of overused nodes to the report.

Bill-hbrhbr · 2020-07-29T22:15:32Z

report_overused_nodes.rpt.txt
example_logging.txt

@tangxifan The info in two files match each other. The former is the updated report, and the latter is an excerpt from the vpr.out. (I used Notepad++ to open the latter one and it would display the format nicely.)

tangxifan · 2020-07-29T22:20:45Z

@Bill-hbrhbr I am satisfied with the log file and report file. Just a question, did you set a threshold of 230 or more when generating the log file? I was wondering by default, it should only output 20 overused nodes.

Bill-hbrhbr · 2020-07-29T22:21:48Z

Yeah, I set it to 1000 for demonstration only. Default is 20.

tangxifan · 2020-07-29T22:26:35Z

@Bill-hbrhbr I just double checked the codes. It seems that if we set the threshold to 20 which is smaller than the actual number of overused nodes, nothing will be reported than a warning. Should we still report the first 20 overused nodes? Otherwise, users have to re-run the flow and find the correct threshold?

Bill-hbrhbr · 2020-07-29T22:32:14Z

@tangxifan In the vpr.out log file, you can see that there is a table of routing status info (printed for each routing iteration/trial with a single channel width) above the overuse info table. The user will be able to tell how many nodes are left overused from the last line entry in that table (in this case, it is 229).

Regarding printing overused nodes up to the threshold number, I can make that happen if you think it would be helpful.

tangxifan · 2020-07-29T22:40:21Z

@Bill-hbrhbr I personally lean to be user-friendly. If a user set a threshold while the total is like 229, we just output the given number of overused nodes in the log and leave a message telling the user to check the report file for a complete information. Does that make sense to you? If it is o.k., can you also provide a log file in this test case? Thanks

Bill-hbrhbr · 2020-07-29T22:45:44Z

Ok. But the report will only be generated if the last routing attempt fails. In this case, this routing iteration produced 229 overused nodes, but it is not the final attempt from the binary search strategy, so the report wouldn't be generated.

In this case, it would be misleading to leave a message and tell the users to use the report option to generate a report for the complete info.

Bill-hbrhbr · 2020-07-29T22:53:18Z

I think I'll just make the change to print the # of nodes towards the threshold and leave a message if the whole routing process fails and the report option is off.

tangxifan · 2020-07-29T22:57:30Z

@Bill-hbrhbr I understand what you mean now. I thought your log file was showing the final routing attempt. It is reasonable not to show anything overused details and reports for non-final routing attempts.
I also agree with you that if user does not require reporting, we just leave a short message.

Would you mind showing me complete log files? I would like to see the two test cases:

VPR succeeds in final routing attempt. We see a short message as you show in the previous log file during the intermediate routing attempts and no report file will be generated.
VPR fails in the final routing attempt. We will see the overused nodes printed out and there is a report file created as well.

If the log file matches our expectation, I think this pull request is in a good shape.

Bill-hbrhbr · 2020-07-29T23:55:14Z

@tangxifan

1st case: VPR failed, report generation is off.
Command (from the VTR directory): ./vtr_flow/scripts/run_vtr_flow.pl ./vtr_flow/benchmarks/verilog/diffeq2.v ./vtr_flow/arch/timing/k6_frac_N10_frac_chain_mem32K_40nm.xml -crit_path_router_iterations 100 --max_logged_overused_rr_nodes 50 --route_chan_width 32

Logfile: vpr_fail_no_report.out
Report: None
Near the end of the logfile, there is a reminder for the report generation option. I fixed the channel width so that the routing will fail (no binary search).

2nd case: VPR failed, report generation is on.
Command: ./vtr_flow/scripts/run_vtr_flow.pl ./vtr_flow/benchmarks/verilog/diffeq2.v ./vtr_flow/arch/timing/k6_frac_N10_frac_chain_mem32K_40nm.xml -crit_path_router_iterations 100 --max_logged_overused_rr_nodes 50 --route_chan_width 32 --generate_rr_node_overuse_report on

Logfile: vpr_fail_yes_report.out
Report: report_overused_nodes.rpt
Near the end of the logfile, there is a reminder for the user to check out the report.

3rd case: VPR succeeded. The report is not generated.
Command: ./vtr_flow/scripts/run_vtr_flow.pl ./vtr_flow/benchmarks/verilog/diffeq2.v ./vtr_flow/arch/timing/k6_frac_N10_frac_chain_mem32K_40nm.xml -crit_path_router_iterations 100 --max_logged_overused_rr_nodes 50 --generate_rr_node_overuse_report on

Logfile: vpr_succeed.out
Report: None
No fixing channel width, so the binary search takes place, which makes VPR succeed.

Search Failed routing attempt # to quickly locate the overuse table in the logfiles above. The # of entries printed is the minimum between # of overused nodes and the threshold number, with special messages printed if the threshold is the smaller one.

…the threshold number if the threshold is exceeded. Add reminder for the report generation option at the end of vpr.out logfile should the router fails to implement the circuit.

tangxifan · 2020-07-30T00:03:25Z

@Bill-hbrhbr I just checked the log files and reports. I am satisfied by your efforts.

Bill-hbrhbr · 2020-07-30T00:04:35Z

@tangxifan Thanks! Do you want it to be merged ASAP? I can do that after the travis-CI has been passed.

tangxifan · 2020-07-30T00:13:07Z

I am fine with the codes actually. It depends on you and @vaughnbetz when it is a good time to merge. It will surely help my pull request.

Bill-hbrhbr · 2020-07-30T00:15:33Z

@tangxifan Ok. I think this PR can be brought up in the meeting tomorrow. All the checks should pass by then, and @vaughnbetz will get a chance to take a look at it.

vaughnbetz · 2020-07-30T20:04:22Z

Thanks Bill and thanks for the detailed review Xifan.

vaughnbetz · 2020-07-30T20:09:13Z

The reports look great. Xifan, if you can run with the detailed reporting on the rr-graph changes (with no arch file changes) on the 3 unroutable designs that nearly route and post the resulting reports that would be great.

Bill-hbrhbr added 3 commits July 28, 2020 04:21

Print overused info if routing fails. Formatted header and overused n…

add9833

…odes info body

Minor fix to display format

1a3f77e

Added option to set the limit for max reported overused RR nodes

ef508c7

probot-autolabeler bot added lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool labels Jul 28, 2020

Bill-hbrhbr requested review from vaughnbetz and tangxifan July 28, 2020 10:50

tangxifan reviewed Jul 28, 2020

View reviewed changes

Bill-hbrhbr added 3 commits July 28, 2020 14:39

Changed the command line option to --generate_rr_node_overuse_report.…

f1b134c

… Also print a message if the # of overused nodes is too large for logging.

Added option --generate_rr_node_overuse_report on for a detailed repo…

7c095ac

…rt on overused nodes

Added congested nets to the overuse report

7d4b338

Bill-hbrhbr added 3 commits July 29, 2020 13:49

Moved overuse info routines to separate files

b15edcf

Changed the overused nodes report generation to take effect only when…

442203e

… the final routing attempt has failed. Also fixed a bug where a single net counts a overused node for multiple times in the report

Changed indexing of the report functions from size_t to RRNodeId (str…

c20257a

…ong ids). Also added new accessors for the rr_graph_storage interface

Added documentation for the two new command line options

5fc5c27

probot-autolabeler bot added the docs Documentation label Jul 29, 2020

Added detailed congested net info to the overused RR node report

db47f5d

Improved the logging file: will display overused nodes entries up to …

0844f65

…the threshold number if the threshold is exceeded. Add reminder for the report generation option at the end of vpr.out logfile should the router fails to implement the circuit.

Update documentation according to the latest change to the logfile

4204c90

Bill-hbrhbr merged commit 6b8fdbe into verilog-to-routing:master Jul 30, 2020

Bill-hbrhbr self-assigned this Aug 26, 2020

Print out info/report on overused rr-nodes at the end of a failed routing iteration #1455

Print out info/report on overused rr-nodes at the end of a failed routing iteration #1455

Uh oh!

Conversation

Bill-hbrhbr commented Jul 28, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issue

Types of changes

Checklist:

Uh oh!

tangxifan left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vaughnbetz commented Jul 28, 2020

Uh oh!

tangxifan commented Jul 28, 2020

Uh oh!

vaughnbetz commented Jul 28, 2020

Uh oh!

Bill-hbrhbr commented Jul 28, 2020

Uh oh!

tangxifan commented Jul 28, 2020

Uh oh!

Bill-hbrhbr commented Jul 28, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tangxifan commented Jul 28, 2020

Uh oh!

tangxifan commented Jul 28, 2020

Uh oh!

Bill-hbrhbr commented Jul 28, 2020

Uh oh!

Bill-hbrhbr commented Jul 29, 2020

Uh oh!

tangxifan commented Jul 29, 2020

Uh oh!

Bill-hbrhbr commented Jul 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tangxifan commented Jul 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Bill-hbrhbr commented Jul 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tangxifan commented Jul 29, 2020

Uh oh!

Bill-hbrhbr commented Jul 29, 2020

Uh oh!

tangxifan commented Jul 29, 2020

Uh oh!

Bill-hbrhbr commented Jul 29, 2020

Uh oh!

tangxifan commented Jul 29, 2020

Uh oh!

Bill-hbrhbr commented Jul 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tangxifan commented Jul 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bill-hbrhbr commented Jul 28, 2020 •

edited

Loading

tangxifan left a comment •

edited

Loading

Bill-hbrhbr commented Jul 28, 2020 •

edited

Loading

Bill-hbrhbr commented Jul 29, 2020 •

edited

Loading

tangxifan commented Jul 29, 2020 •

edited

Loading

Bill-hbrhbr commented Jul 29, 2020 •

edited

Loading

Bill-hbrhbr commented Jul 29, 2020 •

edited

Loading

tangxifan commented Jul 29, 2020 •

edited

Loading

Bill-hbrhbr commented Jul 29, 2020 •

edited

Loading

Bill-hbrhbr commented Jul 30, 2020 •

edited

Loading