Update BinaryHeap to FourAryHeap #2627

nedsels · 2024-06-21T17:07:30Z

A new FourAryHeap has been added, and this has been made the default VPR heap implementation.

Description

A new abstract class KAryHeap has been created. This class is like the previous BinaryHeap, except it doesn't implement is_valid(), get_heap_head(), or sift_down(). Also, a new struct heap_elem has been added, which contains a t_heap pointer and a float cost for a certain node. This is now what the heap_ vector contains, instead of just t_heap pointers. In KAryHeap's methods, instead of dereferencing a t_heap pointer to get its cost for node comparisons, it is stored in heap_elem.

BinaryHeap and FourAryHeap now inherit from KAryHeap, and implement its pure virtual methods (all other methods are not specific to either of these two heap types). FOUR_ARY_HEAP has been added to the e_heap_type enum, and has been made the default heap implementation.

Motivation and Context

Previously, BinaryHeap's heap_ vector contains t_heap pointers for each node in the heap. To get a node's cost, the respective pointer would be dereferenced, which increase runtime since this would occur whenever there was a comparison between two nodes. Now, instead of heap_ being made of these pointers, it is of a struct heap_elem which contains both the pointer and the node's cost. This cost is attained by dereferencing the pointer once in add_to_heap(), eliminating the need for dereferencing in future comparisons.
This heap_elem is 12 bytes (pointer and float), which means 5 can fit on a 64-byte cache line; so, using a 4-ary heap instead of a binary heap is more cache-friendly, and improves runtime even with more comparisons (between 4 nodes vs 2).

For the future: if, instead of using t_heap pointers, a 4-byte ID can be used, heap_elem can be reduced to 8 bytes. Then, an 8-ary heap might be even better, even with many more comparisons. An 8-ary heap is fastest when using t_heap pointers in heap_ (8 can fit on a cache line) and this heap implementation as VPR's heap.

How Has This Been Tested?

Three changes were considered:

Using heap_elem instead of t_heap pointers in the heap_ vector to eliminate unnecessary dereferencing
Using a 4-ary heap instead of a binary heap
Using memory alignment (8 bytes for when change (1.) is off, 16 bytes for when (1.) is on)

The 8 combinations of these changes were tested on the koios_medium benchmarks, and some combinations on titan benchmarks, on desktop. For smaller circuits, the results are inconsistent, but on titan, the both changes (1.) and (2.) result in runtime improvements. Change (3.) seems to have a negligible effect on runtime. Using changes (1.) and (2.) together results in a 4.2% reduction in runtime.

On titan benchmarks (all values are geomeans, averaged across 8 runs):

	Total Routing Runtime (s)	Runtime Improvement	CPD (ns)	Wirelength	Memory Footprint (`max_rss`)
Old VPR Heap	186.8157	0.00%	17.1627	3200614	5501.6439
New VPR Heap	178.8911	4.24%	17.1809	3201187	5503.7495

Full Results

Types of changes

Bug fix (change which fixes an issue)
New feature (change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My change requires a change to the documentation
I have updated the documentation accordingly
I have added tests to cover my changes
All new and existing tests passed

…ich contains node cost to prevent unnecessary dereferecing, and turn on/off alignment for that struct

…to-routing into improve_binary_heap

…herited class of KAryHeap

…to-routing into improve_binary_heap

…rilog-to-routing

…to-routing into improve_binary_heap

…heap implementation

This reverts commit c58ca70.

nedsels · 2024-06-21T19:54:07Z

@vaughnbetz I'm wondering about two things:

Is it worth adding regression tests that test both BinaryHeap and FourAryHeap?
There are separate .h and .cpp files for BinaryHeap and FourAryHeap, but they are fairly small. Should I put these new classes in k_ary_heap.h and .cpp instead?

vaughnbetz

Looks good. I suggest some more comments in various places, as listed in the detailed feedback.

vpr/src/base/ShowSetup.cpp

vpr/src/route/four_ary_heap.h

vpr/src/route/k_ary_heap.h

vaughnbetz · 2024-06-23T19:49:21Z

You'll have to check out the CI results. I look at vtr_nightly_test1_odin, and it is OK. A few small circuits have QoR changes that are beyond the tolerances, but for these small circuits they are not concerning. One has too good a minimum channel width, while the others have ones that are a bit worse but not concerning for small designs. You update the golden results and check them in (see the developer guide) to make those pass. Below are the tests from vtr_nightly_test1_odin as an example.

Amin was also going to widen the error tolerance for some of these tests; I'll check on that PR as rebasing after it is merged may save you some work.

You should also attach your QoR spreadsheets so we can see the results (which you have summarized) on big circuits. We'd want to see a speedup and no overall critical path increase or wirelength increase (or sub-1% so it is within noise).

I'm fine with the current code and the small binary heap file -- more smaller files is fine.

2024-06-21T20:47:50.7138774Z �[32;1m20:47:50�[0m | Calculating QoR results...
2024-06-21T20:47:50.7139378Z �[32;1m20:47:50�[0m |
2024-06-21T20:47:50.7140262Z �[32;1m20:47:50�[0m | regression_tests/vtr_reg_nightly_test1_odin/vpr_reg_mcnc_equiv...[Fail]
2024-06-21T20:47:50.7141338Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7142816Z �[32;1m20:47:50�[0m | k6_N10_40nm.xml/tseng.pre-vpr.blif/common min_chan_width relative value 0.17582417582417584 outside of range [0.25,1.3] and not equal to golden value: 182.0
2024-06-21T20:47:50.7144133Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7145671Z �[32;1m20:47:50�[0m | k6_N10_40nm.xml/tseng.pre-vpr.blif/common min_chan_width_routing_area_total relative value 0.2212341391243292 outside of range [0.7,1.3] and not equal to golden value: 1595120.0
2024-06-21T20:47:50.7147087Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7148618Z �[32;1m20:47:50�[0m | k6_N10_40nm.xml/tseng.pre-vpr.blif/common min_chan_width_routing_area_per_tile relative value 0.22123501890118832 outside of range [0.7,1.3] and not equal to golden value: 9438.56
2024-06-21T20:47:50.7150057Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7151603Z �[32;1m20:47:50�[0m | k6_N10_40nm.xml/tseng.pre-vpr.blif/common crit_path_routing_area_total relative value 0.2147835400787127 outside of range [0.7,1.3] and not equal to golden value: 2007300.0
2024-06-21T20:47:50.7152993Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7154521Z �[32;1m20:47:50�[0m | k6_N10_40nm.xml/tseng.pre-vpr.blif/common crit_path_routing_area_per_tile relative value 0.21478341401810147 outside of range [0.7,1.3] and not equal to golden value: 11877.5
2024-06-21T20:47:50.7156147Z �[32;1m20:47:50�[0m | regression_tests/vtr_reg_nightly_test1_odin/vpr_reg_mcnc...[Pass]
2024-06-21T20:47:50.7157289Z �[32;1m20:47:50�[0m | regression_tests/vtr_reg_nightly_test1_odin/vtr_reg_fpu_soft_logic_arch...[Pass]
2024-06-21T20:47:50.7158497Z �[32;1m20:47:50�[0m | regression_tests/vtr_reg_nightly_test1_odin/vtr_reg_fpu_hard_block_arch...[Pass]
2024-06-21T20:47:50.7159596Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/figure_8...[Pass]
2024-06-21T20:47:50.7160606Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/FIR_filters...[Pass]
2024-06-21T20:47:50.7161696Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/FIR_filters_frac...[Pass]
2024-06-21T20:47:50.7162512Z �[32;1m20:47:50�[0m |
2024-06-21T20:47:50.7163420Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/multless_consts...[Fail]
2024-06-21T20:47:50.7164302Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7165605Z �[32;1m20:47:50�[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_022.v/common min_chan_width relative value 1.4666666666666666 outside of range [0.25,1.3] and not equal to golden value: 30.0
2024-06-21T20:47:50.7166939Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7168337Z �[32;1m20:47:50�[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_022.v/common min_chan_width_routing_area_total relative value 1.4112435202627822 outside of range [0.7,1.3] and not equal to golden value: 526063.0
2024-06-21T20:47:50.7169764Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7171207Z �[32;1m20:47:50�[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_022.v/common min_chan_width_routing_area_per_tile relative value 1.4112421647100186 outside of range [0.7,1.3] and not equal to golden value: 1820.29
2024-06-21T20:47:50.7172665Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7174032Z �[32;1m20:47:50�[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_022.v/common crit_path_routing_area_total relative value 1.4061912035217121 outside of range [0.7,1.3] and not equal to golden value: 666494.0
2024-06-21T20:47:50.7175545Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7176930Z �[32;1m20:47:50�[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_022.v/common crit_path_routing_area_per_tile relative value 1.406190242866001 outside of range [0.7,1.3] and not equal to golden value: 2306.21
2024-06-21T20:47:50.7178342Z �[32;1m20:47:50�[0m |
2024-06-21T20:47:50.7179070Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/multless_consts...[Fail]
2024-06-21T20:47:50.7179892Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7181245Z �[32;1m20:47:50�[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_045.v/common min_chan_width relative value 1.3333333333333333 outside of range [0.25,1.3] and not equal to golden value: 30.0
2024-06-21T20:47:50.7182652Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7184024Z �[32;1m20:47:50�[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_045.v/common crit_path_routing_area_total relative value 1.3088865016039155 outside of range [0.7,1.3] and not equal to golden value: 666494.0
2024-06-21T20:47:50.7185435Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7186818Z �[32;1m20:47:50�[0m | fixed_k6_N8_gate_boost_0.2V_22nm.xml/mult_045.v/common crit_path_routing_area_per_tile relative value 1.3088834061078565 outside of range [0.7,1.3] and not equal to golden value: 2306.21
2024-06-21T20:47:50.7188231Z �[32;1m20:47:50�[0m |
2024-06-21T20:47:50.7188965Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/multless_consts...[Fail]
2024-06-21T20:47:50.7189788Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7191204Z �[32;1m20:47:50�[0m | fixed_k6_frac_2ripple_N8_22nm.xml/mult_007.v/common min_chan_width_routing_area_total relative value 1.4869164661785093 outside of range [0.7,1.3] and not equal to golden value: 706193.0
2024-06-21T20:47:50.7192700Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7194097Z �[32;1m20:47:50�[0m | fixed_k6_frac_2ripple_N8_22nm.xml/mult_007.v/common min_chan_width_routing_area_per_tile relative value 1.4869085522061893 outside of range [0.7,1.3] and not equal to golden value: 2443.58
2024-06-21T20:47:50.7195532Z �[32;1m20:47:50�[0m |
2024-06-21T20:47:50.7196265Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/multless_consts...[Fail]
2024-06-21T20:47:50.7197088Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7198333Z �[32;1m20:47:50�[0m | fixed_k6_frac_N8_22nm.xml/mult_012.v/common min_chan_width relative value 1.3333333333333333 outside of range [0.25,1.3] and not equal to golden value: 36.0
2024-06-21T20:47:50.7199606Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7200944Z �[32;1m20:47:50�[0m | fixed_k6_frac_N8_22nm.xml/mult_012.v/common min_chan_width_routing_area_total relative value 1.3335469993281848 outside of range [0.7,1.3] and not equal to golden value: 648988.0
2024-06-21T20:47:50.7202360Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7203829Z �[32;1m20:47:50�[0m | fixed_k6_frac_N8_22nm.xml/mult_012.v/common min_chan_width_routing_area_per_tile relative value 1.3335500505426092 outside of range [0.7,1.3] and not equal to golden value: 2245.63
2024-06-21T20:47:50.7205459Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/open_cores...[Pass]
2024-06-21T20:47:50.7206495Z �[32;1m20:47:50�[0m | vtr_reg_nightly_test1_odin/arithmetic_tasks/open_cores_frac...[Pass]
2024-06-21T20:47:50.7207621Z �[32;1m20:47:50�[0m | regression_tests/vtr_reg_nightly_test1_odin/power_extended_arch_list...[Pass]
2024-06-21T20:47:50.7208488Z �[32;1m20:47:50�[0m |
2024-06-21T20:47:50.7209294Z �[32;1m20:47:50�[0m | regression_tests/vtr_reg_nightly_test1_odin/power_extended_circuit_list...[Fail]
2024-06-21T20:47:50.7210181Z �[32;1m20:47:50�[0m | [Fail]
2024-06-21T20:47:50.7211558Z �[32;1m20:47:50�[0m | k6_N10_I40_Fi6_L4_frac1_ff1_45nm.xml/mkDelayWorker32B.v/common total_power relative value 0.8973242065961418 outside of range [0.9,1.1] and not equal to golden value: 0.1607
2024-06-21T20:47:50.7212942Z �[32;1m20:47:50�[0m |
2024-06-21T20:47:50.7213746Z �[32;1m20:47:50�[0m | Test 'vtr_reg_nightly_test1_odin' had 19 qor test failures
2024-06-21T20:47:50.7214572Z �[32;1m20:47:50�[0m |
2024-06-21T20:47:50.7215303Z �[32;1m20:47:50�[0m | Test 'vtr_reg_nightly_test1_odin' had 0 run failures

…ing bucket heap)

nedsels · 2024-06-24T01:07:00Z

Looks good. I suggest some more comments in various places, as listed in the detailed feedback.

These issues should now be fixed

vaughnbetz

Thanks, the code and QoR look good.

Forgot to answer this before, but it would be good to add a regtest for the binary heap to make sure it still works. It can be a single circuit and small.

We should add the nice Doxygen documentation to the VPR Internals as a new tab (Router heap). Can you also make the Doxygen config changes to make that happen?

vaughnbetz · 2024-06-24T17:13:06Z

vpr/src/route/four_ary_heap.cpp

+
+    size_t num_children = std::min((size_t)4, heap_tail_ - child_1);
+
+    switch (num_children) {


I wonder if this is faster than a simple loop or slower than a loop? We can merge this code, but I suggest running an experiment with this replaced by a loop and see if it is as fast or faster; it would be easier to read if there is no speed gain from this if nest.

Using a loop is 3% slower; is this small enough to be worth changing?

OK, let's leave as is then. If you can add a comment giving this result that would be good though.

nedsels · 2024-06-25T04:43:05Z

I added the heap section to the API documentation. However, I think this instruction may be outdated or incorrect; I could not run it successfully (doc/conf.py imports markdown_code_symlinks; pip install markdown_code_symlinks gives an error "ERROR: No matching distribution found for markdown_code_symlinks").

nedsels · 2024-06-26T23:22:58Z

@vaughnbetz All tests now passing

Nathan Shreve and others added 16 commits May 23, 2024 18:09

Added option to select D-arity, turn on/off use of a new heap_elem wh…

cfd9afa

…ich contains node cost to prevent unnecessary dereferecing, and turn on/off alignment for that struct

Merge branch 'verilog-to-routing:master' into improve_binary_heap

0db85d9

Wrote scripts to help running tests and profile results

6038ff6

Wrote scripts to help running tests and profile results

3fde949

Merge branch 'improve_binary_heap' of github.com:nedsels/vtr-verilog-…

51f78a8

…to-routing into improve_binary_heap

Updated black version to 21.4b0 from 20.8b1 so that make env works

817c833

Created final FourAryHeap implementation, but kept BinaryHeap as a in…

a21d763

…herited class of KAryHeap

Created final FourAryHeap implementation, but kept BinaryHeap as a in…

ebd8148

…herited class of KAryHeap

Merge branch 'improve_binary_heap' of github.com:nedsels/vtr-verilog-…

a7f33f6

…to-routing into improve_binary_heap

Merge branch 'master' of https://github.com/verilog-to-routing/vtr-ve…

b68ce5f

…rilog-to-routing

Merge branch 'master' into improve_binary_heap

89a5c88

Removed .gitignore changes

5176261

Removed .gitignore changes

da0018a

Merge branch 'improve_binary_heap' of github.com:nedsels/vtr-verilog-…

7d869d2

…to-routing into improve_binary_heap

Fixed bug in FourAryHeap::get_heap_head and make FourAryHeap default …

9a5d3b1

…heap implementation

parent() is now static function

c58ca70

github-actions bot added VPR VPR FPGA Placement & Routing Tool lang-cpp C/C++ code labels Jun 21, 2024

Revert "parent() is now static function"

339fe8d

This reverts commit c58ca70.

nedsels marked this pull request as ready for review June 21, 2024 19:41

nedsels requested a review from vaughnbetz June 21, 2024 19:43

vaughnbetz requested changes Jun 23, 2024

View reviewed changes

vpr/src/base/ShowSetup.cpp Outdated Show resolved Hide resolved

vpr/src/route/four_ary_heap.h Show resolved Hide resolved

vpr/src/route/k_ary_heap.h Show resolved Hide resolved

vpr/src/route/k_ary_heap.h Show resolved Hide resolved

vpr/src/route/k_ary_heap.h Show resolved Hide resolved

Nathan Shreve added 2 commits June 23, 2024 20:05

Eliminated duplicate code in ShowRouterOpts()

bde3cae

Added Doxygen comments to HeapImplementation and its children (exclud…

679618a

…ing bucket heap)

Updated golden values

d580ab1

nedsels force-pushed the improve_binary_heap branch from 9de4f45 to d580ab1 Compare June 24, 2024 02:17

Modified Doxygen comments on heaps

d56143b

nedsels force-pushed the improve_binary_heap branch from 8ef8272 to d56143b Compare June 24, 2024 14:57

Merge branch 'master' into improve_binary_heap

15fb108

vaughnbetz approved these changes Jun 24, 2024

View reviewed changes

Nathan Shreve added 4 commits June 24, 2024 19:04

Updated more golden results

d51f91a

Fixed bug in FourAryHeap::smallest_child()

a2e2970

Added regression test for BinaryHeap

a054a14

Added router heap to VPRINTERNALS API

51cd8d4

github-actions bot added the docs Documentation label Jun 25, 2024

Merge branch 'master' into improve_binary_heap

d906ea8

nedsels force-pushed the improve_binary_heap branch from 7c98cc9 to d906ea8 Compare June 25, 2024 04:53

Added more golden results

fc49189

vaughnbetz merged commit dce37b9 into verilog-to-routing:master Jun 28, 2024
53 checks passed

nedsels deleted the improve_binary_heap branch June 28, 2024 03:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Update BinaryHeap to FourAryHeap #2627

Update BinaryHeap to FourAryHeap #2627

Uh oh!

nedsels commented Jun 21, 2024 •

edited

Loading

Uh oh!

nedsels commented Jun 21, 2024 •

edited

Loading

Uh oh!

vaughnbetz left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vaughnbetz commented Jun 23, 2024

Uh oh!

nedsels commented Jun 24, 2024

Uh oh!

vaughnbetz left a comment •

edited

Loading

Uh oh!

vaughnbetz Jun 24, 2024

Uh oh!

nedsels Jun 24, 2024

Uh oh!

vaughnbetz Jun 24, 2024

Uh oh!

nedsels commented Jun 25, 2024

Uh oh!

nedsels commented Jun 26, 2024

Uh oh!

Uh oh!

Uh oh!


		size_t num_children = std::min((size_t)4, heap_tail_ - child_1);

		switch (num_children) {

Update BinaryHeap to FourAryHeap #2627

Update BinaryHeap to FourAryHeap #2627

Uh oh!

Conversation

nedsels commented Jun 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

How Has This Been Tested?

Types of changes

Checklist:

Uh oh!

nedsels commented Jun 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vaughnbetz left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vaughnbetz commented Jun 23, 2024

Uh oh!

nedsels commented Jun 24, 2024

Uh oh!

vaughnbetz left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vaughnbetz Jun 24, 2024

Choose a reason for hiding this comment

Uh oh!

nedsels Jun 24, 2024

Choose a reason for hiding this comment

Uh oh!

vaughnbetz Jun 24, 2024

Choose a reason for hiding this comment

Uh oh!

nedsels commented Jun 25, 2024

Uh oh!

nedsels commented Jun 26, 2024

Uh oh!

Uh oh!

Uh oh!

nedsels commented Jun 21, 2024 •

edited

Loading

nedsels commented Jun 21, 2024 •

edited

Loading

vaughnbetz left a comment •

edited

Loading