-
Notifications
You must be signed in to change notification settings - Fork 415
[Router] Enhanced the Connection Router by Optimizing the Heap Structure #2720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Router] Enhanced the Connection Router by Optimizing the Heap Structure #2720
Conversation
- The heap node structure in the connection router is simplified by only maintaining the node index (int32) and node total cost (fp32). - Eliminated dynamic memory allocation for the heap node structure. - The above optimization leads to an algorithmic change for path search; specifically, we need to update node states right before pushing and modify the prune functions. The RCV feature also works when using `--routing_budgets_algorithm yoyo` for the VPR launch.
Simplified the RCV `path_data` pointer in `t_heap` data structure by replacing it with a float value, which is the only variable used after `t_heap` is returned following a successful connection route.
The `R_upstream` field added to the `t_rr_node_route_inf` structure for convenience in commit b11b2 is removed and kept inside the connection router class instead. This allows the size of each `t_rr_node_route_inf` to be as minimal as possible, which tends to be more cache-friendly when accessing the large RR node route info data, leading to higher performance.
- Changed APIs in the `HeapInterface` class. - Removed the `HeapStorage` class (used for dynamic memory allocations). - Removed the deprecated bucket heap implementation. - Replaced the prior k-ary heap implementation with the enhanced d-ary heap introduced in commit b11b2e.
The `t_heap` data structure is no longer needed as the "heap node structure" due to the algorithmic changes in the connection router in commit b11b2eb. However, `t_heap` is used in many places as a holder for the return values of the route routines. This commit refactored the `t_heap`, renamed it to `RTExploredNode`, and combined it with the other similar node info data structure, `node_t`, in the connection router to fit the changes since commit b11b2eb.
The option to turn on the YOYO routing budgets algorithm (i.e., using RCV) was previously missing in both the VPR CLI and the documents, and this commit fixed the issues. Additionally, this commit added regression tests for the YOYO option.
Strange CI QoR failures that seem to indicate the critical path routing time isn't parsed properly. There is an extra . at the start of the line in the output -- try removing that (although it shouldn't cause this, something is!). |
@vaughnbetz : need to review. |
If you have the energy to create an RCV test:
|
Fixed the output of the SSSP time of the connection router, which caused VTR flow log parser issues and made CI mad. Reverted the changes for RCV tests, which are not working and appear to be too simple to be useful. It needs more time and effort to make a comprehensive set of regression tests for the RCV technique. We tend to postpone this task since it is not the main purpose of this PR.
Replaced the pointer (dynamic memory allocations) of the priority queue in d_ary_heap.h with the priority queue object in order to make the coarse-grained parallel netlist router happy. Previously, when using pointers, as in commit 084fb54, the parallel netlist router would always throw weird double-free errors. After debugging with GDB and Valgrind, it appeared that the issues came from the Intel OneTBB deallocation process. However, there is no convincing evidence of the problems.
The weird double-free errors I encounter during the parallel netlist router regression tests:When running the strong regression tests on 084fb54, all tests/tasks using
If I temporarily delete
The workaround to fix it:The way I made this PR compatible with parallel netlist router is by replacing the pointer (to heap) used inside DAryHeap with an object; no changes were made to the parallel router. The details are on 599e06e. It doesn't make sense to me (since the previous connection router and the heap it used actively use pointers and dynamic memory allocations and no harm to the parallel router), but it works. @duck2 Hi, Fahrican. Did you encounter this before? |
Updated the golden results of the following three regression tests for PR verilog-to-routing#2720: - vtr_reg_strong/strong_bidir - vtr_reg_strong/strong_place_delay_model - vtr_reg_nightly_test1/arithmetic_tasks/multless_consts
Updated the golden results of the following three regression tests for PR verilog-to-routing#2720: - vtr_reg_strong/strong_bidir - vtr_reg_strong/strong_place_delay_model - vtr_reg_nightly_test1/arithmetic_tasks/multless_consts
Interesting... should check |
Looks like having a pointer in there could break vtr-verilog-to-routing/vpr/src/route/ParallelNetlistRouter.h Lines 36 to 38 in 3e53a93
It's possible the copy didn't work with the pointer and the refactored code? but it makes sense it would work with an object, I guess that fixes the default constructor or whatever TBB uses to copy the ConnectionRouter |
@duck2 Hey Fahrican, thank you for your explanation! It totally makes sense. I have confirmed that the
The proof of concept is here ueqri/oneTBB@78fcb05. I added a pointer to a class used by git clone https://github.com/ueqri/oneTBB.git && cd oneTBB
git checkout verify-enumerable-thread-specific-issues
cd examples/parallel_for/polygon_overlay
mkdir build && cd build
cmake .. && make light_test_polygon_overlay |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good @ueqri ! A few commenting and code cleanup suggestions.
Resolved the comments in the PR code reviews, mostly addressing the code comments and coding style issues. Fixed the RCV pre-push prune condition such that when RCV is on, the connection router prunes based on the RCV-specific total path cost to allow detours in order to achieve better QoR.
@vaughnbetz Hi Vaughn, the code has been cleaned according to your suggestions and is ready for review again! Before merging this PR, there are still a few things left for me to be done. For the new result table, I am still waiting for my local run to be finished. I will update the result table when it is done. Also, I will wait for the CI to complete to see if it is needed to update the golden results. |
Sounds good, thanks. It doesn't gate this PR, but as time permits it would be good to make an RCV test and try the two different pruning conditions. |
Added Vaughn's detailed comments explaining the RCV technique and how the post-pop prune function in the connection router works for RCV.
After changing the API of `evaluate_timing_driven_node_costs`, there was a typo (or bug) left inside an ifdef block. This commit fixed the bug and changed the typo to the correct variable name.
`strong_place_delay_calc_method` and `strong_place_delay_model` tests had QoR failures after rebasing to master at commit 5925e69. This commit updated the corresponding golden results to make the regression tests happy.
The only failure is a spurious assert that @AlexandreSinger just fixed. Updating the PR so CI re-runs; it should pass this time (except nightly*, which are down). All my code comments have been addressed. @duck2 : this PR is a significant update to the core connection router and heap. It may conflict some with your parallel changes (although not in any fundamental way I don't think). It removes dynamic memory allocation of the items placed on the heap, so that may simplify your life / improve things. It also speeds up the path search by ~1.4x (route time by ~1.24x), which may cut your parallel speedups a bit. We should discuss what order to merge these in during the VTR meeting today. Since this one is ready to go I think I should merge it very soon, but wanted to give you a heads up and get your thoughts. |
Yes. The QoR results in the #2720 (comment) are the latest. (The previous data is also kept in the comment revision history.) I am currently re-running all the nightly tests locally, and found (unrouteable) issues causing the tests to fail. (I think these issues were introduced at some point before when I rebased to master; all nightly tests did pass at c360f71). I am verifying these nightly test failures and will update here soon. |
Thanks Hang. If you can add in log files with the failures we could potentially discuss / brainstorm them in the VTR meeting. I'll forward the invite in case you want to attend (at 1 pm today). |
Thanks, Vaughn! Discussing in VTR meeting will be really helpful. Notes of the nightly test failures (except these, all other nightly tests, including
Issues:
|
The bug was introduced by me in commit 2966d66; I merged two adjacent if-bodies with conditions that appeared the same, but were in fact not. A simplified example is: `if (A == 0) { ... A might be modified ... } if (A == 0) { return ; }` Merging the if-bodies would be problematic if A is modified in the first if-body and is no longer 0. This mistake caused tons of nightly test failures, as mentioned in the comments in PR#2720. So glad that the nightly tests CAUGHT this bug, which was not detected by the VTR strong regression tests.
Let's talk about this today in our meeting @ueqri as I'd like to get it merged sooner rather than later. If you've found a bug and fixed it, I suggest re-launching the nightly tests to start gathering data on them if you haven't already. |
No problem! I have fixed the bug at commit d9f18cd. The re-launched nightly tests are almost done; I will summarize the results in our meeting today (and paste here afterwards). The conflicts with Amin's renaming has been resolved. |
If you can paste the nightly test results in here that would be great @ueqri . |
Hi Vaughn. The summarized results (including the links to the logs) have been updated here. I ran a clean nightly test on wintermute after this PR was merged. The logs and the fixes (applying your suggestions) for the test failures are included in the new PR #2785. |
The RCV feature also works when using
--routing_budgets_algorithm yoyo
for the VPR launch.Where does the Speedup Come from
Summary of the Results
Result Tables
The baseline is built on commit 21d01. Both implementations use 4-ary heap for path search in the connection router. The following results are collected on the wintemute server.
RCV feature disabled (using default CLI settings for both runs)
RCV feature enabled
Use
--routing_budgets_algorithm yoyo
to enable RCV feature. All others CLI settings are set by default.