Skip to content

Print out info/report on overused rr-nodes at the end of a failed routing iteration #1455

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jul 30, 2020

Conversation

Bill-hbrhbr
Copy link
Contributor

@Bill-hbrhbr Bill-hbrhbr commented Jul 28, 2020

Description

Add the option --max_logged_overused_rr_nodes N, which is set to N=20 by default.
When each routing attempt fails, if the # of overused nodes is less than or equal to N, then print out the diagnostics of the overused nodes.

Add the option --generate_rr_node_overuse_report on/off, which generates a detailed report on overused RR nodes if the final routing attempt fails (i.e. the routing process has failed).

Related Issue

#1453

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@probot-autolabeler probot-autolabeler bot added lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool labels Jul 28, 2020
Copy link
Contributor

@tangxifan tangxifan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Bill-hbrhbr @vaughnbetz
I have gone through the codes. It looks clean to me but it requires some fine-tuning.
On the other sides, I am thinking about if we should also generate a report file for these overused nodes?

  1. If the number of overused nodes is far more than the max value (default is 20), we have limited access to the information. Then we have to rerun routing to get more information, which could be very time consuming (think about the complexity of routing algorithms). Therefore, a report file may work well in this scenario.
  2. Should we also report which nets caused the overuse on the nodes? If I remember correctly, the routing traces are stored whatever routing is successful or not. We can walk through the routing traces and find which nets are mapped to the same node. Not sure if this may help debug or not. I used this when debugging the routing results in packer.

@@ -1669,6 +1675,107 @@ static void print_route_status(int itry, double elapsed_sec, float pres_fac, int
fflush(stdout);
}

static void print_overused_nodes_status(const t_router_opts& router_opts, const OveruseInfo& overuse_info) {
//Display upper limit
if (int(overuse_info.overused_nodes) > router_opts.max_reported_overused_rr_nodes) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that if the number of overused nodes is larger than the threshold, we will do nothing.
Should we still report the overuse status and leave a message that the listed nodes are limited by the thresholds set by users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll log the overuse info if the number is small as well as produce an overuse info report file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! That would be perfect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you want all the nets at the congested node as well? Do you want the report to be indexed by the congested nets or the overused RR nodes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Showing all the congested nets will be helpful. I prefer to be indexed by the overused RR Nodes, where we can see what nets share the same the node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the nets to the report. You can checkout the report by specifying --generate_rr_node_overuse_report on. I will deal with the strong Id issue soon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the congested nets is a good idea. In addition to printing their ids, I think we should also print their names and the type of driving block. E.g.
net #83 (processor|alu|reset) driven by a block of type CLB
as that can help locate them in the netlist and see patterns faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I forgot about printing more detailed info on the nets. Working on this right now

@vaughnbetz
Copy link
Contributor

Good comments Xifan, thanks.
I think we should not print the congested report when there are too many overused nodes though, as it will lead to a lot of huge files in a binary search for minimum channel width route.

@tangxifan
Copy link
Contributor

Hi @vaughnbetz
Indeed, we should avoid printing reports after each try during the binary search phase. Should we just generate a report at the end of routing phase? For instance, when the final routing of binary search failed, we can have a report for the final try? In most cases, the binary search will end with success unless the rr_graph is not routable.

@vaughnbetz
Copy link
Contributor

If there are only a few overused rr-nodes in a binary search it can be useful to print them out (they can occur repeatedly, indicating a routing pattern problem). But otherwise we should only print it on the final failed routing.

So we could do:

  • always print to the log file if the overused rr-nodes are small enough
  • print the separate report only on the final (binary search or regular) routing, if it fails.

@Bill-hbrhbr
Copy link
Contributor Author

@tangxifan How many overused nodes are you trying to analyze in your case? Also, if you only care about the final routing, I can generate the report only if the final routing fails, regardless of how many nodes/nets there are.

@tangxifan
Copy link
Contributor

@Bill-hbrhbr I would focus on the final routing and a full report of overused nodes only in that routing result. Normally, the titan benchmarks failed with less than 20 overused nodes. Therefore I think what you did is good enough.

@Bill-hbrhbr
Copy link
Contributor Author

Bill-hbrhbr commented Jul 28, 2020

@tangxifan If you are mainly working with titan benchmarks, then I don’t think we need to worry too much about the binary search on minimum width_fac. I don’t recall seeing binary search being used on Titan.
Trying to add detection for the last failed routing attempt would require more drastic code refactoring.

@tangxifan
Copy link
Contributor

@Bill-hbrhbr I would suggest thinking beyond titan benchmarks. In general, we do not need any node overuse reports during the binary search. In my opinion, it is also easy to do so. Your report generation is called after routing is done, e.g., somewhere in the function vpr_route_flow(), where you just do reporting when routing status flags a failure. By doing so, you can skip the binary search part.

@tangxifan
Copy link
Contributor

Another modification you may consider is to keep your functions in a separate source file, as your functions have little dependency on the functions in route_timing etc. As such, you can call your function freely in any source file with a header file included.

@Bill-hbrhbr
Copy link
Contributor Author

@tangxifan Ok thats indeed the easy way to do it. Thanks for the suggestion!

… the final routing attempt has failed. Also fixed a bug where a single net counts a overused node for multiple times in the report
…ong ids). Also added new accessors for the rr_graph_storage interface
@Bill-hbrhbr
Copy link
Contributor Author

@tangxifan Hi Xifan, I think I've basically finished improving the PR according to your comments. If you need other changes, feel free to point them out.

@probot-autolabeler probot-autolabeler bot added the docs Documentation label Jul 29, 2020
@tangxifan
Copy link
Contributor

Hi @Bill-hbrhbr Thanks for the effort. It looks good for me.

@Bill-hbrhbr
Copy link
Contributor Author

Bill-hbrhbr commented Jul 29, 2020

@vaughnbetz I've added documentation for the new command-line options (one for logging and one for the separate report). If you think they are a little minor for the developers' page, then I'll revert the last commit. Otherwise, this PR is ready to merge.

Update: working on printing more detailed info on the nets.

@tangxifan
Copy link
Contributor

tangxifan commented Jul 29, 2020

Hi @Bill-hbrhbr Sorry. I forgot to check the nets. Would you mind uploading an example report as well as the print-out in vpr.out log after the modification? It will be much more straightforward to check. You can pick an architecture and a benchmark and specify a fixed route channel width, which is slightly smaller than the minimum routable channel width. Thanks!

@Bill-hbrhbr
Copy link
Contributor Author

Bill-hbrhbr commented Jul 29, 2020

report_overused_nodes.rpt.txt

@tangxifan This is the current format of the report file. It only serves as an example. I generated this report based on a failed attempt during one of the binary search routing iterations for diffeq2.v+flagship_arch, but normally this report would only be printed out if the final routing attempt still fails, and the --generate_rr_node_overuse_report option is on.

@tangxifan
Copy link
Contributor

@Bill-hbrhbr This looks almost perfect on my side. The only comment I have is to add the overuse id for each node and a total number of overused nodes to the report.

@Bill-hbrhbr
Copy link
Contributor Author

report_overused_nodes.rpt.txt
example_logging.txt

@tangxifan The info in two files match each other. The former is the updated report, and the latter is an excerpt from the vpr.out. (I used Notepad++ to open the latter one and it would display the format nicely.)

@tangxifan
Copy link
Contributor

@Bill-hbrhbr I am satisfied with the log file and report file. Just a question, did you set a threshold of 230 or more when generating the log file? I was wondering by default, it should only output 20 overused nodes.

@Bill-hbrhbr
Copy link
Contributor Author

Yeah, I set it to 1000 for demonstration only. Default is 20.

@tangxifan
Copy link
Contributor

@Bill-hbrhbr I just double checked the codes. It seems that if we set the threshold to 20 which is smaller than the actual number of overused nodes, nothing will be reported than a warning. Should we still report the first 20 overused nodes? Otherwise, users have to re-run the flow and find the correct threshold?

@Bill-hbrhbr
Copy link
Contributor Author

Bill-hbrhbr commented Jul 29, 2020

@tangxifan In the vpr.out log file, you can see that there is a table of routing status info (printed for each routing iteration/trial with a single channel width) above the overuse info table. The user will be able to tell how many nodes are left overused from the last line entry in that table (in this case, it is 229).

Regarding printing overused nodes up to the threshold number, I can make that happen if you think it would be helpful.

@tangxifan
Copy link
Contributor

tangxifan commented Jul 29, 2020

@Bill-hbrhbr I personally lean to be user-friendly. If a user set a threshold while the total is like 229, we just output the given number of overused nodes in the log and leave a message telling the user to check the report file for a complete information. Does that make sense to you? If it is o.k., can you also provide a log file in this test case? Thanks

@Bill-hbrhbr
Copy link
Contributor Author

Ok. But the report will only be generated if the last routing attempt fails. In this case, this routing iteration produced 229 overused nodes, but it is not the final attempt from the binary search strategy, so the report wouldn't be generated.

In this case, it would be misleading to leave a message and tell the users to use the report option to generate a report for the complete info.

@Bill-hbrhbr
Copy link
Contributor Author

I think I'll just make the change to print the # of nodes towards the threshold and leave a message if the whole routing process fails and the report option is off.

@tangxifan
Copy link
Contributor

@Bill-hbrhbr I understand what you mean now. I thought your log file was showing the final routing attempt. It is reasonable not to show anything overused details and reports for non-final routing attempts.
I also agree with you that if user does not require reporting, we just leave a short message.

Would you mind showing me complete log files? I would like to see the two test cases:

  • VPR succeeds in final routing attempt. We see a short message as you show in the previous log file during the intermediate routing attempts and no report file will be generated.
  • VPR fails in the final routing attempt. We will see the overused nodes printed out and there is a report file created as well.

If the log file matches our expectation, I think this pull request is in a good shape.

@Bill-hbrhbr
Copy link
Contributor Author

Bill-hbrhbr commented Jul 29, 2020

@tangxifan

1st case: VPR failed, report generation is off.
Command (from the VTR directory): ./vtr_flow/scripts/run_vtr_flow.pl ./vtr_flow/benchmarks/verilog/diffeq2.v ./vtr_flow/arch/timing/k6_frac_N10_frac_chain_mem32K_40nm.xml -crit_path_router_iterations 100 --max_logged_overused_rr_nodes 50 --route_chan_width 32

Logfile: vpr_fail_no_report.out
Report: None
Near the end of the logfile, there is a reminder for the report generation option. I fixed the channel width so that the routing will fail (no binary search).

2nd case: VPR failed, report generation is on.
Command: ./vtr_flow/scripts/run_vtr_flow.pl ./vtr_flow/benchmarks/verilog/diffeq2.v ./vtr_flow/arch/timing/k6_frac_N10_frac_chain_mem32K_40nm.xml -crit_path_router_iterations 100 --max_logged_overused_rr_nodes 50 --route_chan_width 32 --generate_rr_node_overuse_report on

Logfile: vpr_fail_yes_report.out
Report: report_overused_nodes.rpt
Near the end of the logfile, there is a reminder for the user to check out the report.

3rd case: VPR succeeded. The report is not generated.
Command: ./vtr_flow/scripts/run_vtr_flow.pl ./vtr_flow/benchmarks/verilog/diffeq2.v ./vtr_flow/arch/timing/k6_frac_N10_frac_chain_mem32K_40nm.xml -crit_path_router_iterations 100 --max_logged_overused_rr_nodes 50 --generate_rr_node_overuse_report on

Logfile: vpr_succeed.out
Report: None
No fixing channel width, so the binary search takes place, which makes VPR succeed.

Search Failed routing attempt # to quickly locate the overuse table in the logfiles above. The # of entries printed is the minimum between # of overused nodes and the threshold number, with special messages printed if the threshold is the smaller one.

…the threshold number if the threshold is exceeded. Add reminder for the report generation option at the end of vpr.out logfile should the router fails to implement the circuit.
@tangxifan
Copy link
Contributor

@Bill-hbrhbr I just checked the log files and reports. I am satisfied by your efforts.

@Bill-hbrhbr
Copy link
Contributor Author

@tangxifan Thanks! Do you want it to be merged ASAP? I can do that after the travis-CI has been passed.

@tangxifan
Copy link
Contributor

I am fine with the codes actually. It depends on you and @vaughnbetz when it is a good time to merge. It will surely help my pull request.

@Bill-hbrhbr
Copy link
Contributor Author

Bill-hbrhbr commented Jul 30, 2020

@tangxifan Ok. I think this PR can be brought up in the meeting tomorrow. All the checks should pass by then, and @vaughnbetz will get a chance to take a look at it.

@Bill-hbrhbr Bill-hbrhbr merged commit 6b8fdbe into verilog-to-routing:master Jul 30, 2020
@vaughnbetz
Copy link
Contributor

Thanks Bill and thanks for the detailed review Xifan.

@vaughnbetz
Copy link
Contributor

The reports look great. Xifan, if you can run with the detailed reporting on the rr-graph changes (with no arch file changes) on the 3 unroutable designs that nearly route and post the resulting reports that would be great.

@Bill-hbrhbr Bill-hbrhbr self-assigned this Aug 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants