Skip to content

[ClusterLegalizer] Cluster Legalizer API #2709

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

AlexandreSinger
Copy link
Contributor

@AlexandreSinger AlexandreSinger commented Sep 1, 2024

The goal of this PR is to clean up the cluster legalization code in the packer (the code which creates the clusters and ensures that they are legal) into a clean API which can be called in an Analytical Placement flow which will be called externally to the Packer. In fact, the packer will never be called; so truly it would "replace" the packer.

Isolated the cluster legalization logic out of the packer's clustering algorithm in such a way that it can be used externally to the clusterer with no issue. Once the ClusterLegalizer API is constructed, it will allocate all of the memory that it needs for legalization; it can then be used to legalize clusters; and when it is destroyed it will remove all state created for clustering.

The cluster legalizer API maintains the legalized clusters internally as opposed to using the ClusterBlockId as an ID to the clusters. This allowed me to separate the ClusteredNetlist from the packing algorithm entirely. This importantly helped separate out all legalization logic from directly modifying the ClusteredNetlist generation code.

I did my best to remove as much global state as I could from the API to make it as self-contained as possible.

This change greatly cleaned up the packer code and removed many of its global variables (including the ClusterLegalizer itself, which is now a local variable!)

@github-actions github-actions bot added VPR VPR FPGA Placement & Routing Tool lang-cpp C/C++ code labels Sep 1, 2024
@AlexandreSinger AlexandreSinger force-pushed the feature-cluster-legalizer-api branch from ea77b2d to 76564d2 Compare September 1, 2024 20:12
@AlexandreSinger
Copy link
Contributor Author

This PR is still WIP since I have some small cleanups and documentation left; however, I wanted to get an idea on the test coverage. We are passing all of the regular tests and the NightlyTests are currently down, so I am just trying the VTR benchmarks for now. Below is the current results on the VTR benchmarks:

  vtr_parse_results_prepack.txt vtr_parse_results_legalizer.txt
vtr_flow_elapsed_time 1 0.991681512
odin_synth_time    
parmys_synth_time 1 1.094871271
abc_depth 1 1
abc_synth_time 1 0.969154666
num_clb 1 1
num_memories 1 1
num_mult 1 1
max_vpr_mem 1 1.220798835
num_pre_packed_blocks 1 1
num_post_packed_blocks 1 1
device_grid_tiles 1 1
pack_time 1 1.00876255
placed_wirelength_est 1 1
place_time 1 0.976409566
placed_CPD_est 1 1
min_chan_width 1 1
routed_wirelength 1 1
min_chan_width_route_time 1 0.971295383
crit_path_routed_wirelength 1 1
critical_path_delay 1 1
geomean_nonvirtual_intradomain_critical_path_delay 1 1
crit_path_route_time 1 0.995582056

The routed wirelength, min channel width, and the CPD are all exactly the same (which means this API upgrade is functionally equivalent) which is very good news. The pack time did not change much, but the max memory appears to have increased by 22%.

@vaughnbetz This increase in max memory is what we discussed in our last meeting. This is likely caused by the old clusterer algorithm freeing all the intermediate clustering data after each cluster is created. The current design keeps all the intermediate clustering data (such as the intra-lb routing data) so that clusters can be modified later (even though it isnt). I think if we use your idea with a "clean cluster data" method, we may be able to bring down this max memory. Do you think that is necessary? It would be a "weird" interface since it implies that the cluster cannot be modified by the legalizer later if it was cleaned.

@vaughnbetz
Copy link
Contributor

It would be good to get the memory increase on the largest Titan designs. A 20% or so memory increase isn't catastrophic, but it is noticable so it would be good to put the clustering data structures on a diet if this increase affects the biggest designs.

@AlexandreSinger
Copy link
Contributor Author

@vaughnbetz I implemented what we discussed (a clean method). While looking into the data structures for the router data and (especially) the pb stats, I noticed that they take up a large amount of space. It makes sense to clean them when the user knows they will no longer be modifying a cluster. This problem is mainly caused by the clusterer storing data within the pb type itself, hopefully someday we can separate this out!

With this change, the max memory for the VTR benchmarks is now slightly better than it was before!

  vtr_parse_results_prepack.txt vtr_parse_results_legalizer.txt
vtr_flow_elapsed_time 1 1.025408546
odin_synth_time    
parmys_synth_time 1 1.101305826
abc_depth 1 1
abc_synth_time 1 0.971543723
num_clb 1 1
num_memories 1 1
num_mult 1 1
max_vpr_mem 1 0.992619072
num_pre_packed_blocks 1 1
num_post_packed_blocks 1 1
device_grid_tiles 1 1
pack_time 1 1.008189595
placed_wirelength_est 1 1
place_time 1 1.046060717
placed_CPD_est 1 1
min_chan_width 1 1
routed_wirelength 1 1
min_chan_width_route_time 1 1.025420775
crit_path_routed_wirelength 1 1
critical_path_delay 1 1
geomean_nonvirtual_intradomain_critical_path_delay 1 1
crit_path_route_time 1 1.035469963

Here is the raw data in case you were curious.
legalizer_comparison.xlsx

For the VTR circuits, there appears to be no change in quality, runtime, or memory usage and now the API is significantly easier to work with.

I still have some small changes to make before I remove the WIP status, but I expect it to be done by tomorrow. We should wait for the NightlyTests though to merge this in. I am still on the ropes about what to do about the reclustering API; I will continue to think on it.

@vaughnbetz
Copy link
Contributor

Great, thanks @AlexandreSinger

@AlexandreSinger AlexandreSinger force-pushed the feature-cluster-legalizer-api branch from d4c0df2 to 97bf1ba Compare September 3, 2024 19:42
@AlexandreSinger AlexandreSinger changed the title [WIP] [ClusterLegalizer] Cluster Legalizer API [ClusterLegalizer] Cluster Legalizer API Sep 3, 2024
@AlexandreSinger
Copy link
Contributor Author

@vaughnbetz I have finalized the PR and removed the WIP label. The API is now ready for review. I have decided to completely remove the re_clustering API since, while trying to upgrade it to use the new legalizer API, I found that most of the code would need to be either removed or completely re-thought. Since this API contains no CI tests, it would be challenging to upgrade it; especially since it would require me to test it in both a placement context (which the placer is currently not setup to handle performing cluster legalization) and a packing context (which is now trivial since the re_clustering API would literally just be a wrapper to this new legalizer). The code will still exist in the Git history if someone wants to bring the API back; but I do believe that a complete rewrite is in order to bring this feature fully in (but it would be much easier to rewrite than prior). Please let me know what you think about this.

I will run the Titan circuits this afternoon and see how this API performs. I expect the runtime, memory usage, and quality to be very similar.

@AlexandreSinger
Copy link
Contributor Author

Here are the results on the Titan circuits:

  titan_parse_results_master.txt tan_parse_results_legalizer.txt
vtr_flow_elapsed_time 1 0.981573162
num_LAB 1 1
num_DSP 1 1
num_M9K 1 1
num_M144K 1 1
max_vpr_mem 1 0.993315751
num_pre_packed_blocks 1 1
num_post_packed_blocks 1 1
device_grid_tiles 1 1
pack_time 1 1.001249895
placed_wirelength_est 1 1
place_time 1 0.963296599
placed_CPD_est 1 1
routed_wirelength 1 1
critical_path_delay 1 1
geomean_nonvirtual_intradomain_critical_path_delay 1 1
crit_path_route_time 1 0.992672691

QoR is identical, pack time is practically the same, max_vpr_mem is down slightly.

I also pulled out the pack memory to directly see how the memory of the packer was affected by these code improvements:

Geomean pack_mem
Master 3798.451049 MiB
New 3402.47893 MiB
% decrease 10.43% decrease

For some of the largest circuits, this amounts to around a gigabyte of memory savings. I think even more can be saved if the data structures were organized a bit better.

Here is the raw data if you are curious:
titan_legalizer_comparison.xlsx

@vaughnbetz
Copy link
Contributor

Great, thanks! Will review soon (maybe tonight after the conference).

Copy link
Contributor

@soheilshahrouz soheilshahrouz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few comments to suggest minor improvements.

@AlexandreSinger, cluster_util.cpp was too large and with no clean structure. Your changes have significantly improved the readablity.

@AlexandreSinger AlexandreSinger force-pushed the feature-cluster-legalizer-api branch 3 times, most recently from 552ac6d to f33dde9 Compare September 10, 2024 14:06
@github-actions github-actions bot added the libarchfpga Library for handling FPGA Architecture descriptions label Sep 11, 2024
@AlexandreSinger AlexandreSinger force-pushed the feature-cluster-legalizer-api branch from 06c5ae0 to f33dde9 Compare September 11, 2024 14:13
Copy link
Contributor

@vaughnbetz vaughnbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks! I have a bunch of suggestions, but they're all around commenting etc. not the actual code implementation.

Isolated the cluster legalization logic out of the packer's clustering
algorithm in such a way that it can be used externally to the clusterer
with no issue. Once the ClusterLegalizer API is constructed, it will
allocate all of the memory that it needs for legalization; it can then
be used to legalize clusters; and when it is destroyed it will remove
all state created for clustering.

The cluster legalizer API maintains the legalized clusters internally as
opposed to using the ClusterBlockId as an ID to the clusters. This
allowed me to separate the ClusteredNetlist from the packing algorithm
entirely. This importantly helper separate out all legalization logic
from directly modifying the ClusteredNetlist generation code.

I did my best to remove as much global state as I could from the API to
make it as self-contained as possible.

This change greatly cleaned up the packer code and removed many of its
global variables (including the ClusterLegalizer itself, which is now a
local variable!)

This change also removed the re_clustering API since it would need to be
heavily modified to include this change; and since this feature was not
being tested, it would be challenging to upgrade to use this new cluster
legalization API.
Found that one of the pb_stats members is required for cluster
legalization; but was being calculated outside of the cluster legalizer.

Moved this in to allow the cluster legalizer to be used outside of the
clusterer.

Also found an issue where two clusters of the same type cannot be
constructed at the same time. Tried fixing it, but it produces different
results (different clusters). Will raise in a separate PR, left as a
fixme comment.
Added more documentation. Cleaned up one set which should have been a
vector.
@AlexandreSinger AlexandreSinger force-pushed the feature-cluster-legalizer-api branch from f3528a5 to ba88f64 Compare September 16, 2024 03:51
@AlexandreSinger
Copy link
Contributor Author

@vaughnbetz Thank you so much for all of your comments. I really appreciate them. I have resolved most of them. I want to discuss with you forward-declaring the static methods; I personally do not like doing that, but its not a hill I will die on.

I am running nightly_test 1, 2, 3, and 4 tonight and will run 5, 6, and 7 tomorrow (they should be a bit quicker). I will let you know the results of these runs. Since this should be functionally equivalent, I do not expect golden results to change.

After this PR is merged, I plan to raise another PR fixing a limitation of this API. I will also raise a few issues based on your comments above which I did not resolve in this PR.

@vaughnbetz
Copy link
Contributor

Thanks Alex. I'm not fanatical about declaring static routines at the top of the file (although I do prefer it) so happy to discuss.

@vaughnbetz vaughnbetz merged commit 09d498f into verilog-to-routing:master Sep 16, 2024
36 of 52 checks passed
@AlexandreSinger AlexandreSinger deleted the feature-cluster-legalizer-api branch September 16, 2024 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lang-cpp C/C++ code libarchfpga Library for handling FPGA Architecture descriptions VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants