-
Notifications
You must be signed in to change notification settings - Fork 414
[ClusterLegalizer] Cluster Legalizer API #2709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ClusterLegalizer] Cluster Legalizer API #2709
Conversation
ea77b2d
to
76564d2
Compare
This PR is still WIP since I have some small cleanups and documentation left; however, I wanted to get an idea on the test coverage. We are passing all of the regular tests and the NightlyTests are currently down, so I am just trying the VTR benchmarks for now. Below is the current results on the VTR benchmarks:
The routed wirelength, min channel width, and the CPD are all exactly the same (which means this API upgrade is functionally equivalent) which is very good news. The pack time did not change much, but the max memory appears to have increased by 22%. @vaughnbetz This increase in max memory is what we discussed in our last meeting. This is likely caused by the old clusterer algorithm freeing all the intermediate clustering data after each cluster is created. The current design keeps all the intermediate clustering data (such as the intra-lb routing data) so that clusters can be modified later (even though it isnt). I think if we use your idea with a "clean cluster data" method, we may be able to bring down this max memory. Do you think that is necessary? It would be a "weird" interface since it implies that the cluster cannot be modified by the legalizer later if it was cleaned. |
It would be good to get the memory increase on the largest Titan designs. A 20% or so memory increase isn't catastrophic, but it is noticable so it would be good to put the clustering data structures on a diet if this increase affects the biggest designs. |
@vaughnbetz I implemented what we discussed (a clean method). While looking into the data structures for the router data and (especially) the pb stats, I noticed that they take up a large amount of space. It makes sense to clean them when the user knows they will no longer be modifying a cluster. This problem is mainly caused by the clusterer storing data within the pb type itself, hopefully someday we can separate this out! With this change, the max memory for the VTR benchmarks is now slightly better than it was before!
Here is the raw data in case you were curious. For the VTR circuits, there appears to be no change in quality, runtime, or memory usage and now the API is significantly easier to work with. I still have some small changes to make before I remove the WIP status, but I expect it to be done by tomorrow. We should wait for the NightlyTests though to merge this in. I am still on the ropes about what to do about the reclustering API; I will continue to think on it. |
Great, thanks @AlexandreSinger |
d4c0df2
to
97bf1ba
Compare
@vaughnbetz I have finalized the PR and removed the WIP label. The API is now ready for review. I have decided to completely remove the re_clustering API since, while trying to upgrade it to use the new legalizer API, I found that most of the code would need to be either removed or completely re-thought. Since this API contains no CI tests, it would be challenging to upgrade it; especially since it would require me to test it in both a placement context (which the placer is currently not setup to handle performing cluster legalization) and a packing context (which is now trivial since the re_clustering API would literally just be a wrapper to this new legalizer). The code will still exist in the Git history if someone wants to bring the API back; but I do believe that a complete rewrite is in order to bring this feature fully in (but it would be much easier to rewrite than prior). Please let me know what you think about this. I will run the Titan circuits this afternoon and see how this API performs. I expect the runtime, memory usage, and quality to be very similar. |
Here are the results on the Titan circuits:
QoR is identical, pack time is practically the same, max_vpr_mem is down slightly. I also pulled out the pack memory to directly see how the memory of the packer was affected by these code improvements:
For some of the largest circuits, this amounts to around a gigabyte of memory savings. I think even more can be saved if the data structures were organized a bit better. Here is the raw data if you are curious: |
Great, thanks! Will review soon (maybe tonight after the conference). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few comments to suggest minor improvements.
@AlexandreSinger, cluster_util.cpp
was too large and with no clean structure. Your changes have significantly improved the readablity.
552ac6d
to
f33dde9
Compare
06c5ae0
to
f33dde9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks! I have a bunch of suggestions, but they're all around commenting etc. not the actual code implementation.
Isolated the cluster legalization logic out of the packer's clustering algorithm in such a way that it can be used externally to the clusterer with no issue. Once the ClusterLegalizer API is constructed, it will allocate all of the memory that it needs for legalization; it can then be used to legalize clusters; and when it is destroyed it will remove all state created for clustering. The cluster legalizer API maintains the legalized clusters internally as opposed to using the ClusterBlockId as an ID to the clusters. This allowed me to separate the ClusteredNetlist from the packing algorithm entirely. This importantly helper separate out all legalization logic from directly modifying the ClusteredNetlist generation code. I did my best to remove as much global state as I could from the API to make it as self-contained as possible. This change greatly cleaned up the packer code and removed many of its global variables (including the ClusterLegalizer itself, which is now a local variable!) This change also removed the re_clustering API since it would need to be heavily modified to include this change; and since this feature was not being tested, it would be challenging to upgrade to use this new cluster legalization API.
Found that one of the pb_stats members is required for cluster legalization; but was being calculated outside of the cluster legalizer. Moved this in to allow the cluster legalizer to be used outside of the clusterer. Also found an issue where two clusters of the same type cannot be constructed at the same time. Tried fixing it, but it produces different results (different clusters). Will raise in a separate PR, left as a fixme comment.
Added more documentation. Cleaned up one set which should have been a vector.
f3528a5
to
ba88f64
Compare
@vaughnbetz Thank you so much for all of your comments. I really appreciate them. I have resolved most of them. I want to discuss with you forward-declaring the static methods; I personally do not like doing that, but its not a hill I will die on. I am running nightly_test 1, 2, 3, and 4 tonight and will run 5, 6, and 7 tomorrow (they should be a bit quicker). I will let you know the results of these runs. Since this should be functionally equivalent, I do not expect golden results to change. After this PR is merged, I plan to raise another PR fixing a limitation of this API. I will also raise a few issues based on your comments above which I did not resolve in this PR. |
Thanks Alex. I'm not fanatical about declaring static routines at the top of the file (although I do prefer it) so happy to discuss. |
The goal of this PR is to clean up the cluster legalization code in the packer (the code which creates the clusters and ensures that they are legal) into a clean API which can be called in an Analytical Placement flow which will be called externally to the Packer. In fact, the packer will never be called; so truly it would "replace" the packer.
Isolated the cluster legalization logic out of the packer's clustering algorithm in such a way that it can be used externally to the clusterer with no issue. Once the ClusterLegalizer API is constructed, it will allocate all of the memory that it needs for legalization; it can then be used to legalize clusters; and when it is destroyed it will remove all state created for clustering.
The cluster legalizer API maintains the legalized clusters internally as opposed to using the ClusterBlockId as an ID to the clusters. This allowed me to separate the ClusteredNetlist from the packing algorithm entirely. This importantly helped separate out all legalization logic from directly modifying the ClusteredNetlist generation code.
I did my best to remove as much global state as I could from the API to make it as self-contained as possible.
This change greatly cleaned up the packer code and removed many of its global variables (including the ClusterLegalizer itself, which is now a local variable!)