[ClusterLegalizer] Cluster Legalizer API #2709

AlexandreSinger · 2024-09-01T19:05:05Z

The goal of this PR is to clean up the cluster legalization code in the packer (the code which creates the clusters and ensures that they are legal) into a clean API which can be called in an Analytical Placement flow which will be called externally to the Packer. In fact, the packer will never be called; so truly it would "replace" the packer.

Isolated the cluster legalization logic out of the packer's clustering algorithm in such a way that it can be used externally to the clusterer with no issue. Once the ClusterLegalizer API is constructed, it will allocate all of the memory that it needs for legalization; it can then be used to legalize clusters; and when it is destroyed it will remove all state created for clustering.

The cluster legalizer API maintains the legalized clusters internally as opposed to using the ClusterBlockId as an ID to the clusters. This allowed me to separate the ClusteredNetlist from the packing algorithm entirely. This importantly helped separate out all legalization logic from directly modifying the ClusteredNetlist generation code.

I did my best to remove as much global state as I could from the API to make it as self-contained as possible.

This change greatly cleaned up the packer code and removed many of its global variables (including the ClusterLegalizer itself, which is now a local variable!)

AlexandreSinger · 2024-09-01T23:01:23Z

This PR is still WIP since I have some small cleanups and documentation left; however, I wanted to get an idea on the test coverage. We are passing all of the regular tests and the NightlyTests are currently down, so I am just trying the VTR benchmarks for now. Below is the current results on the VTR benchmarks:

	vtr_parse_results_prepack.txt	vtr_parse_results_legalizer.txt
vtr_flow_elapsed_time	1	0.991681512
odin_synth_time
parmys_synth_time	1	1.094871271
abc_depth	1	1
abc_synth_time	1	0.969154666
num_clb	1	1
num_memories	1	1
num_mult	1	1
max_vpr_mem	1	1.220798835
num_pre_packed_blocks	1	1
num_post_packed_blocks	1	1
device_grid_tiles	1	1
pack_time	1	1.00876255
placed_wirelength_est	1	1
place_time	1	0.976409566
placed_CPD_est	1	1
min_chan_width	1	1
routed_wirelength	1	1
min_chan_width_route_time	1	0.971295383
crit_path_routed_wirelength	1	1
critical_path_delay	1	1
geomean_nonvirtual_intradomain_critical_path_delay	1	1
crit_path_route_time	1	0.995582056

The routed wirelength, min channel width, and the CPD are all exactly the same (which means this API upgrade is functionally equivalent) which is very good news. The pack time did not change much, but the max memory appears to have increased by 22%.

@vaughnbetz This increase in max memory is what we discussed in our last meeting. This is likely caused by the old clusterer algorithm freeing all the intermediate clustering data after each cluster is created. The current design keeps all the intermediate clustering data (such as the intra-lb routing data) so that clusters can be modified later (even though it isnt). I think if we use your idea with a "clean cluster data" method, we may be able to bring down this max memory. Do you think that is necessary? It would be a "weird" interface since it implies that the cluster cannot be modified by the legalizer later if it was cleaned.

vaughnbetz · 2024-09-02T16:33:14Z

It would be good to get the memory increase on the largest Titan designs. A 20% or so memory increase isn't catastrophic, but it is noticable so it would be good to put the clustering data structures on a diet if this increase affects the biggest designs.

AlexandreSinger · 2024-09-02T23:42:58Z

@vaughnbetz I implemented what we discussed (a clean method). While looking into the data structures for the router data and (especially) the pb stats, I noticed that they take up a large amount of space. It makes sense to clean them when the user knows they will no longer be modifying a cluster. This problem is mainly caused by the clusterer storing data within the pb type itself, hopefully someday we can separate this out!

With this change, the max memory for the VTR benchmarks is now slightly better than it was before!

	vtr_parse_results_prepack.txt	vtr_parse_results_legalizer.txt
vtr_flow_elapsed_time	1	1.025408546
odin_synth_time
parmys_synth_time	1	1.101305826
abc_depth	1	1
abc_synth_time	1	0.971543723
num_clb	1	1
num_memories	1	1
num_mult	1	1
max_vpr_mem	1	0.992619072
num_pre_packed_blocks	1	1
num_post_packed_blocks	1	1
device_grid_tiles	1	1
pack_time	1	1.008189595
placed_wirelength_est	1	1
place_time	1	1.046060717
placed_CPD_est	1	1
min_chan_width	1	1
routed_wirelength	1	1
min_chan_width_route_time	1	1.025420775
crit_path_routed_wirelength	1	1
critical_path_delay	1	1
geomean_nonvirtual_intradomain_critical_path_delay	1	1
crit_path_route_time	1	1.035469963

Here is the raw data in case you were curious.
legalizer_comparison.xlsx

For the VTR circuits, there appears to be no change in quality, runtime, or memory usage and now the API is significantly easier to work with.

I still have some small changes to make before I remove the WIP status, but I expect it to be done by tomorrow. We should wait for the NightlyTests though to merge this in. I am still on the ropes about what to do about the reclustering API; I will continue to think on it.

vaughnbetz · 2024-09-03T15:09:38Z

Great, thanks @AlexandreSinger

AlexandreSinger · 2024-09-03T19:47:54Z

@vaughnbetz I have finalized the PR and removed the WIP label. The API is now ready for review. I have decided to completely remove the re_clustering API since, while trying to upgrade it to use the new legalizer API, I found that most of the code would need to be either removed or completely re-thought. Since this API contains no CI tests, it would be challenging to upgrade it; especially since it would require me to test it in both a placement context (which the placer is currently not setup to handle performing cluster legalization) and a packing context (which is now trivial since the re_clustering API would literally just be a wrapper to this new legalizer). The code will still exist in the Git history if someone wants to bring the API back; but I do believe that a complete rewrite is in order to bring this feature fully in (but it would be much easier to rewrite than prior). Please let me know what you think about this.

I will run the Titan circuits this afternoon and see how this API performs. I expect the runtime, memory usage, and quality to be very similar.

AlexandreSinger · 2024-09-04T03:49:25Z

Here are the results on the Titan circuits:

	titan_parse_results_master.txt	tan_parse_results_legalizer.txt
vtr_flow_elapsed_time	1	0.981573162
num_LAB	1	1
num_DSP	1	1
num_M9K	1	1
num_M144K	1	1
max_vpr_mem	1	0.993315751
num_pre_packed_blocks	1	1
num_post_packed_blocks	1	1
device_grid_tiles	1	1
pack_time	1	1.001249895
placed_wirelength_est	1	1
place_time	1	0.963296599
placed_CPD_est	1	1
routed_wirelength	1	1
critical_path_delay	1	1
geomean_nonvirtual_intradomain_critical_path_delay	1	1
crit_path_route_time	1	0.992672691

QoR is identical, pack time is practically the same, max_vpr_mem is down slightly.

I also pulled out the pack memory to directly see how the memory of the packer was affected by these code improvements:

	Geomean pack_mem
Master	3798.451049 MiB
New	3402.47893 MiB
% decrease	10.43% decrease

For some of the largest circuits, this amounts to around a gigabyte of memory savings. I think even more can be saved if the data structures were organized a bit better.

Here is the raw data if you are curious:
titan_legalizer_comparison.xlsx

vaughnbetz · 2024-09-04T06:50:26Z

Great, thanks! Will review soon (maybe tonight after the conference).

soheilshahrouz

I added a few comments to suggest minor improvements.

@AlexandreSinger, cluster_util.cpp was too large and with no clean structure. Your changes have significantly improved the readablity.

vpr/src/base/vpr_context.h

vpr/src/base/vpr_constraints_writer.cpp

vpr/src/base/vpr_context.h

vpr/src/pack/pack.cpp

vpr/src/pack/cluster_legalizer.h

vpr/src/pack/cluster_legalizer.cpp

vaughnbetz

Looks great, thanks! I have a bunch of suggestions, but they're all around commenting etc. not the actual code implementation.

vpr/src/base/vpr_api.cpp

vpr/src/base/vpr_context.h

vpr/src/pack/cluster.cpp

vpr/src/pack/cluster_legalizer.h

vpr/src/pack/cluster_legalizer.cpp

vpr/src/pack/cluster_util.h

vpr/src/pack/cluster_util.cpp

Isolated the cluster legalization logic out of the packer's clustering algorithm in such a way that it can be used externally to the clusterer with no issue. Once the ClusterLegalizer API is constructed, it will allocate all of the memory that it needs for legalization; it can then be used to legalize clusters; and when it is destroyed it will remove all state created for clustering. The cluster legalizer API maintains the legalized clusters internally as opposed to using the ClusterBlockId as an ID to the clusters. This allowed me to separate the ClusteredNetlist from the packing algorithm entirely. This importantly helper separate out all legalization logic from directly modifying the ClusteredNetlist generation code. I did my best to remove as much global state as I could from the API to make it as self-contained as possible. This change greatly cleaned up the packer code and removed many of its global variables (including the ClusterLegalizer itself, which is now a local variable!) This change also removed the re_clustering API since it would need to be heavily modified to include this change; and since this feature was not being tested, it would be challenging to upgrade to use this new cluster legalization API.

Found that one of the pb_stats members is required for cluster legalization; but was being calculated outside of the cluster legalizer. Moved this in to allow the cluster legalizer to be used outside of the clusterer. Also found an issue where two clusters of the same type cannot be constructed at the same time. Tried fixing it, but it produces different results (different clusters). Will raise in a separate PR, left as a fixme comment.

Added more documentation. Cleaned up one set which should have been a vector.

AlexandreSinger · 2024-09-16T04:06:19Z

@vaughnbetz Thank you so much for all of your comments. I really appreciate them. I have resolved most of them. I want to discuss with you forward-declaring the static methods; I personally do not like doing that, but its not a hill I will die on.

I am running nightly_test 1, 2, 3, and 4 tonight and will run 5, 6, and 7 tomorrow (they should be a bit quicker). I will let you know the results of these runs. Since this should be functionally equivalent, I do not expect golden results to change.

After this PR is merged, I plan to raise another PR fixing a limitation of this API. I will also raise a few issues based on your comments above which I did not resolve in this PR.

vaughnbetz · 2024-09-16T13:36:37Z

Thanks Alex. I'm not fanatical about declaring static routines at the top of the file (although I do prefer it) so happy to discuss.

github-actions bot added VPR VPR FPGA Placement & Routing Tool lang-cpp C/C++ code labels Sep 1, 2024

AlexandreSinger force-pushed the feature-cluster-legalizer-api branch from ea77b2d to 76564d2 Compare September 1, 2024 20:12

AlexandreSinger force-pushed the feature-cluster-legalizer-api branch from d4c0df2 to 97bf1ba Compare September 3, 2024 19:42

AlexandreSinger changed the title ~~[WIP] [ClusterLegalizer] Cluster Legalizer API~~ [ClusterLegalizer] Cluster Legalizer API Sep 3, 2024

AlexandreSinger requested review from vaughnbetz and soheilshahrouz September 3, 2024 19:43

soheilshahrouz reviewed Sep 4, 2024

View reviewed changes

AlexandreSinger force-pushed the feature-cluster-legalizer-api branch 3 times, most recently from 552ac6d to f33dde9 Compare September 10, 2024 14:06

github-actions bot added the libarchfpga Library for handling FPGA Architecture descriptions label Sep 11, 2024

AlexandreSinger force-pushed the feature-cluster-legalizer-api branch from 06c5ae0 to f33dde9 Compare September 11, 2024 14:13

vaughnbetz approved these changes Sep 14, 2024

View reviewed changes

AlexandreSinger added 3 commits September 15, 2024 23:51

[ClusterLegalizer] Updated Based on PR Comments

ba88f64

Added more documentation. Cleaned up one set which should have been a vector.

AlexandreSinger force-pushed the feature-cluster-legalizer-api branch from f3528a5 to ba88f64 Compare September 16, 2024 03:51

vaughnbetz merged commit 09d498f into verilog-to-routing:master Sep 16, 2024
36 of 52 checks passed

AlexandreSinger deleted the feature-cluster-legalizer-api branch September 16, 2024 19:20

[ClusterLegalizer] Cluster Legalizer API #2709

[ClusterLegalizer] Cluster Legalizer API #2709

Uh oh!

Conversation

AlexandreSinger commented Sep 1, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AlexandreSinger commented Sep 1, 2024

Uh oh!

vaughnbetz commented Sep 2, 2024

Uh oh!

AlexandreSinger commented Sep 2, 2024

Uh oh!

vaughnbetz commented Sep 3, 2024

Uh oh!

AlexandreSinger commented Sep 3, 2024

Uh oh!

AlexandreSinger commented Sep 4, 2024

Uh oh!

vaughnbetz commented Sep 4, 2024

Uh oh!

soheilshahrouz left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vaughnbetz left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AlexandreSinger commented Sep 16, 2024

Uh oh!

vaughnbetz commented Sep 16, 2024

Uh oh!

Uh oh!

Uh oh!

AlexandreSinger commented Sep 1, 2024 •

edited

Loading

soheilshahrouz left a comment •

edited

Loading