Skip to content

Update BinaryHeap to FourAryHeap #2627

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Jun 28, 2024

Conversation

nedsels
Copy link
Contributor

@nedsels nedsels commented Jun 21, 2024

A new FourAryHeap has been added, and this has been made the default VPR heap implementation.

Description

A new abstract class KAryHeap has been created. This class is like the previous BinaryHeap, except it doesn't implement is_valid(), get_heap_head(), or sift_down(). Also, a new struct heap_elem has been added, which contains a t_heap pointer and a float cost for a certain node. This is now what the heap_ vector contains, instead of just t_heap pointers. In KAryHeap's methods, instead of dereferencing a t_heap pointer to get its cost for node comparisons, it is stored in heap_elem.

BinaryHeap and FourAryHeap now inherit from KAryHeap, and implement its pure virtual methods (all other methods are not specific to either of these two heap types). FOUR_ARY_HEAP has been added to the e_heap_type enum, and has been made the default heap implementation.

Motivation and Context

  1. Previously, BinaryHeap's heap_ vector contains t_heap pointers for each node in the heap. To get a node's cost, the respective pointer would be dereferenced, which increase runtime since this would occur whenever there was a comparison between two nodes. Now, instead of heap_ being made of these pointers, it is of a struct heap_elem which contains both the pointer and the node's cost. This cost is attained by dereferencing the pointer once in add_to_heap(), eliminating the need for dereferencing in future comparisons.
  2. This heap_elem is 12 bytes (pointer and float), which means 5 can fit on a 64-byte cache line; so, using a 4-ary heap instead of a binary heap is more cache-friendly, and improves runtime even with more comparisons (between 4 nodes vs 2).

For the future: if, instead of using t_heap pointers, a 4-byte ID can be used, heap_elem can be reduced to 8 bytes. Then, an 8-ary heap might be even better, even with many more comparisons. An 8-ary heap is fastest when using t_heap pointers in heap_ (8 can fit on a cache line) and this heap implementation as VPR's heap.

How Has This Been Tested?

Three changes were considered:

  1. Using heap_elem instead of t_heap pointers in the heap_ vector to eliminate unnecessary dereferencing
  2. Using a 4-ary heap instead of a binary heap
  3. Using memory alignment (8 bytes for when change (1.) is off, 16 bytes for when (1.) is on)

The 8 combinations of these changes were tested on the koios_medium benchmarks, and some combinations on titan benchmarks, on desktop. For smaller circuits, the results are inconsistent, but on titan, the both changes (1.) and (2.) result in runtime improvements. Change (3.) seems to have a negligible effect on runtime. Using changes (1.) and (2.) together results in a 4.2% reduction in runtime.

On titan benchmarks (all values are geomeans, averaged across 8 runs):

Total Routing Runtime (s) Runtime Improvement CPD (ns) Wirelength Memory Footprint (max_rss)
Old VPR Heap 186.8157 0.00% 17.1627 3200614 5501.6439
New VPR Heap 178.8911 4.24% 17.1809 3201187 5503.7495

Full Results

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

Nathan Shreve and others added 16 commits May 23, 2024 18:09
…ich contains node cost to prevent unnecessary dereferecing, and turn on/off alignment for that struct
@github-actions github-actions bot added VPR VPR FPGA Placement & Routing Tool lang-cpp C/C++ code labels Jun 21, 2024
@nedsels nedsels marked this pull request as ready for review June 21, 2024 19:41
@nedsels nedsels requested a review from vaughnbetz June 21, 2024 19:43
@nedsels
Copy link
Contributor Author

nedsels commented Jun 21, 2024

@vaughnbetz I'm wondering about two things:

  1. Is it worth adding regression tests that test both BinaryHeap and FourAryHeap?
  2. There are separate .h and .cpp files for BinaryHeap and FourAryHeap, but they are fairly small. Should I put these new classes in k_ary_heap.h and .cpp instead?

Copy link
Contributor

@vaughnbetz vaughnbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I suggest some more comments in various places, as listed in the detailed feedback.

@vaughnbetz
Copy link
Contributor

You'll have to check out the CI results. I look at vtr_nightly_test1_odin, and it is OK. A few small circuits have QoR changes that are beyond the tolerances, but for these small circuits they are not concerning. One has too good a minimum channel width, while the others have ones that are a bit worse but not concerning for small designs. You update the golden results and check them in (see the developer guide) to make those pass. Below are the tests from vtr_nightly_test1_odin as an example.

Amin was also going to widen the error tolerance for some of these tests; I'll check on that PR as rebasing after it is merged may save you some work.

You should also attach your QoR spreadsheets so we can see the results (which you have summarized) on big circuits. We'd want to see a speedup and no overall critical path increase or wirelength increase (or sub-1% so it is within noise).

I'm fine with the current code and the small binary heap file -- more smaller files is fine.

2024-06-21T20:47:50.7138774Z �[32;1m20:47:50�[0m | Calculating QoR results...
2024-06-21T20:47:50.7139378Z �[32;1m20:47:50�[0m |
2024-06-21T20:47:50.7140262Z �[32;1m20:47:50�[0m | regression_tests/vtr_reg_nightly_test1_odin/vpr_reg_mcnc_equiv...[Fail]
2024-06-21T20:47:50.7141338Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7142816Z �[32;1m20:47:50�[0m | k6_N10_40nm.xml/tseng.pre-vpr.blif/common min_chan_width relative value 0.17582417582417584 outside of range [0.25,1.3] and not equal to golden value: 182.0
2024-06-21T20:47:50.7144133Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7145671Z �[32;1m20:47:50�[0m | k6_N10_40nm.xml/tseng.pre-vpr.blif/common min_chan_width_routing_area_total relative value 0.2212341391243292 outside of range [0.7,1.3] and not equal to golden value: 1595120.0
2024-06-21T20:47:50.7147087Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7148618Z �[32;1m20:47:50�[0m | k6_N10_40nm.xml/tseng.pre-vpr.blif/common min_chan_width_routing_area_per_tile relative value 0.22123501890118832 outside of range [0.7,1.3] and not equal to golden value: 9438.56
2024-06-21T20:47:50.7150057Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7151603Z �[32;1m20:47:50�[0m | k6_N10_40nm.xml/tseng.pre-vpr.blif/common crit_path_routing_area_total relative value 0.2147835400787127 outside of range [0.7,1.3] and not equal to golden value: 2007300.0
2024-06-21T20:47:50.7152993Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7154521Z �[32;1m20:47:50�[0m | k6_N10_40nm.xml/tseng.pre-vpr.blif/common crit_path_routing_area_per_tile relative value 0.21478341401810147 outside of range [0.7,1.3] and not equal to golden value: 11877.5
2024-06-21T20:47:50.7156147Z �[32;1m20:47:50�[0m | regression_tests/vtr_reg_nightly_test1_odin/vpr_reg_mcnc...[Pass]
2024-06-21T20:47:50.7157289Z �[32;1m20:47:50�[0m | regression_tests/vtr_reg_nightly_test1_odin/vtr_reg_fpu_soft_logic_arch...[Pass]
2024-06-21T20:47:50.7158497Z �[32;1m20:47:50�[0m | regression_tests/vtr_reg_nightly_test1_odin/vtr_reg_fpu_hard_block_arch...[Pass]
2024-06-21T20:47:50.7159596Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/figure_8...[Pass]
2024-06-21T20:47:50.7160606Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/FIR_filters...[Pass]
2024-06-21T20:47:50.7161696Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/FIR_filters_frac...[Pass]
2024-06-21T20:47:50.7162512Z �[32;1m20:47:50�[0m |
2024-06-21T20:47:50.7163420Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/multless_consts...[Fail]
2024-06-21T20:47:50.7164302Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7165605Z �[32;1m20:47:50�[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_022.v/common min_chan_width relative value 1.4666666666666666 outside of range [0.25,1.3] and not equal to golden value: 30.0
2024-06-21T20:47:50.7166939Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7168337Z �[32;1m20:47:50�[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_022.v/common min_chan_width_routing_area_total relative value 1.4112435202627822 outside of range [0.7,1.3] and not equal to golden value: 526063.0
2024-06-21T20:47:50.7169764Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7171207Z �[32;1m20:47:50�[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_022.v/common min_chan_width_routing_area_per_tile relative value 1.4112421647100186 outside of range [0.7,1.3] and not equal to golden value: 1820.29
2024-06-21T20:47:50.7172665Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7174032Z �[32;1m20:47:50�[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_022.v/common crit_path_routing_area_total relative value 1.4061912035217121 outside of range [0.7,1.3] and not equal to golden value: 666494.0
2024-06-21T20:47:50.7175545Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7176930Z �[32;1m20:47:50�[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_022.v/common crit_path_routing_area_per_tile relative value 1.406190242866001 outside of range [0.7,1.3] and not equal to golden value: 2306.21
2024-06-21T20:47:50.7178342Z �[32;1m20:47:50�[0m |
2024-06-21T20:47:50.7179070Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/multless_consts...[Fail]
2024-06-21T20:47:50.7179892Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7181245Z �[32;1m20:47:50�[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_045.v/common min_chan_width relative value 1.3333333333333333 outside of range [0.25,1.3] and not equal to golden value: 30.0
2024-06-21T20:47:50.7182652Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7184024Z �[32;1m20:47:50�[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_045.v/common crit_path_routing_area_total relative value 1.3088865016039155 outside of range [0.7,1.3] and not equal to golden value: 666494.0
2024-06-21T20:47:50.7185435Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7186818Z �[32;1m20:47:50�[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_045.v/common crit_path_routing_area_per_tile relative value 1.3088834061078565 outside of range [0.7,1.3] and not equal to golden value: 2306.21
2024-06-21T20:47:50.7188231Z �[32;1m20:47:50�[0m |
2024-06-21T20:47:50.7188965Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/multless_consts...[Fail]
2024-06-21T20:47:50.7189788Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7191204Z �[32;1m20:47:50�[0m | fixed_k6_frac_2ripple_N8_22nm.xml/mult_007.v/common min_chan_width_routing_area_total relative value 1.4869164661785093 outside of range [0.7,1.3] and not equal to golden value: 706193.0
2024-06-21T20:47:50.7192700Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7194097Z �[32;1m20:47:50�[0m | fixed_k6_frac_2ripple_N8_22nm.xml/mult_007.v/common min_chan_width_routing_area_per_tile relative value 1.4869085522061893 outside of range [0.7,1.3] and not equal to golden value: 2443.58
2024-06-21T20:47:50.7195532Z �[32;1m20:47:50�[0m |
2024-06-21T20:47:50.7196265Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/multless_consts...[Fail]
2024-06-21T20:47:50.7197088Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7198333Z �[32;1m20:47:50�[0m | fixed_k6_frac_N8_22nm.xml/mult_012.v/common min_chan_width relative value 1.3333333333333333 outside of range [0.25,1.3] and not equal to golden value: 36.0
2024-06-21T20:47:50.7199606Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7200944Z �[32;1m20:47:50�[0m | fixed_k6_frac_N8_22nm.xml/mult_012.v/common min_chan_width_routing_area_total relative value 1.3335469993281848 outside of range [0.7,1.3] and not equal to golden value: 648988.0
2024-06-21T20:47:50.7202360Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7203829Z �[32;1m20:47:50�[0m | fixed_k6_frac_N8_22nm.xml/mult_012.v/common min_chan_width_routing_area_per_tile relative value 1.3335500505426092 outside of range [0.7,1.3] and not equal to golden value: 2245.63
2024-06-21T20:47:50.7205459Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/open_cores...[Pass]
2024-06-21T20:47:50.7206495Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/open_cores_frac...[Pass]
2024-06-21T20:47:50.7207621Z �[32;1m20:47:50�[0m | regression_tests/vtr_reg_nightly_test1_odin/power_extended_arch_list...[Pass]
2024-06-21T20:47:50.7208488Z �[32;1m20:47:50�[0m |
2024-06-21T20:47:50.7209294Z �[32;1m20:47:50�[0m | regression_tests/vtr_reg_nightly_test1_odin/power_extended_circuit_list...[Fail]
2024-06-21T20:47:50.7210181Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7211558Z �[32;1m20:47:50�[0m | k6_N10_I40_Fi6_L4_frac1_ff1_45nm.xml/mkDelayWorker32B.v/common total_power relative value 0.8973242065961418 outside of range [0.9,1.1] and not equal to golden value: 0.1607
2024-06-21T20:47:50.7212942Z �[32;1m20:47:50�[0m |
2024-06-21T20:47:50.7213746Z �[32;1m20:47:50�[0m | Test 'vtr_reg_nightly_test1_odin' had 19 qor test failures
2024-06-21T20:47:50.7214572Z �[32;1m20:47:50�[0m |
2024-06-21T20:47:50.7215303Z �[32;1m20:47:50�[0m | Test 'vtr_reg_nightly_test1_odin' had 0 run failures

@nedsels
Copy link
Contributor Author

nedsels commented Jun 24, 2024

Looks good. I suggest some more comments in various places, as listed in the detailed feedback.

These issues should now be fixed

@nedsels nedsels force-pushed the improve_binary_heap branch from 9de4f45 to d580ab1 Compare June 24, 2024 02:17
@nedsels nedsels force-pushed the improve_binary_heap branch from 8ef8272 to d56143b Compare June 24, 2024 14:57
Copy link
Contributor

@vaughnbetz vaughnbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the code and QoR look good.

Forgot to answer this before, but it would be good to add a regtest for the binary heap to make sure it still works. It can be a single circuit and small.

We should add the nice Doxygen documentation to the VPR Internals as a new tab (Router heap). Can you also make the Doxygen config changes to make that happen?


size_t num_children = std::min((size_t)4, heap_tail_ - child_1);

switch (num_children) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is faster than a simple loop or slower than a loop? We can merge this code, but I suggest running an experiment with this replaced by a loop and see if it is as fast or faster; it would be easier to read if there is no speed gain from this if nest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a loop is 3% slower; is this small enough to be worth changing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's leave as is then. If you can add a comment giving this result that would be good though.

@github-actions github-actions bot added the docs Documentation label Jun 25, 2024
@nedsels
Copy link
Contributor Author

nedsels commented Jun 25, 2024

I added the heap section to the API documentation. However, I think this instruction may be outdated or incorrect; I could not run it successfully (doc/conf.py imports markdown_code_symlinks; pip install markdown_code_symlinks gives an error "ERROR: No matching distribution found for markdown_code_symlinks").

@nedsels nedsels force-pushed the improve_binary_heap branch from 7c98cc9 to d906ea8 Compare June 25, 2024 04:53
@nedsels
Copy link
Contributor Author

nedsels commented Jun 26, 2024

@vaughnbetz All tests now passing

@vaughnbetz vaughnbetz merged commit dce37b9 into verilog-to-routing:master Jun 28, 2024
53 checks passed
@nedsels nedsels deleted the improve_binary_heap branch June 28, 2024 03:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants