[AP][InitialPlacement] Created Isolated AP Flow #2988
Force-pushed f57750e to ce2b5a0.
@amin1377 FYI
Titan results are in:
This change had very little effect on the Titan results; however, more atoms appear to be placed where they wanted to go according to the global placement. Looking through the raw data, I notice a lot of outliers that are inflating many of these numbers. I think this may be related to the mass legalizer not knowing exactly how much can be put into different clusters.
Nice clean code! One comment on fixed IOs / placement constraints embedded.
```cpp
float variance = get_flat_variance(pl_macro, flat_placement_info);
float std_dev = std::sqrt(variance);
// Normalize the standard deviation to be a number between 0 and 1.
float normalized_std_dev = std_dev / (std_dev + 1.0f);
```
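For readers unfamiliar with `get_flat_variance`, the following is a minimal sketch of what such a term could compute, assuming "variance" means the mean squared Euclidean distance of a macro's atoms from their centroid in the global (flat) placement. `AtomPos`, `flat_variance`, and `normalized_std_dev` are hypothetical names for illustration; the real VPR function and data structures are not shown in this excerpt.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for an atom's position in the global placement.
struct AtomPos {
    float x;
    float y;
};

// Assumed definition: mean squared distance of the atoms from their centroid.
float flat_variance(const std::vector<AtomPos>& atoms) {
    assert(!atoms.empty());
    const float n = static_cast<float>(atoms.size());
    float cx = 0.0f, cy = 0.0f;
    for (const AtomPos& a : atoms) {
        cx += a.x;
        cy += a.y;
    }
    cx /= n;
    cy /= n;
    float sum_sq = 0.0f;
    for (const AtomPos& a : atoms) {
        const float dx = a.x - cx;
        const float dy = a.y - cy;
        sum_sq += dx * dx + dy * dy;
    }
    return sum_sq / n;
}

// Same normalization as the snippet above: std_dev in [0, inf) -> [0, 1).
float normalized_std_dev(float variance) {
    const float std_dev = std::sqrt(variance);
    return std_dev / (std_dev + 1.0f);
}
```

A macro whose atoms all sit on one point gets a normalized cost of 0, so under this reading it would be placed before any macro whose atoms disagree about their location.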
This function will make relatively little difference for most std_dev values; it distinguishes small standard deviations reasonably well (e.g. 1 maps to 0.5 and 2 to 0.67), but not larger ones (50 maps to 0.98 and 100 to 0.99). Is that what you want?
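The reviewer's point can be checked numerically with a small sketch of the same `s / (s + 1)` mapping. `normalize_std_dev` and `normalized_gap` are hypothetical helper names, not functions from the PR. Going from a standard deviation of 1 to 2 changes the normalized value by about 0.17, while going from 50 to 100 changes it by only about 0.01, so the term mostly separates macros whose atoms are tightly clustered.

```cpp
#include <cassert>
#include <cmath>

// The normalization from the snippet: maps std_dev in [0, inf) to [0, 1).
// It equals 0.5 at std_dev = 1 and saturates toward 1 as std_dev grows.
float normalize_std_dev(float std_dev) {
    assert(std_dev >= 0.0f);
    return std_dev / (std_dev + 1.0f);
}

// How far apart two standard deviations end up after normalization, i.e. how
// much this cost term can still distinguish between two macros.
float normalized_gap(float a, float b) {
    return std::fabs(normalize_std_dev(a) - normalize_std_dev(b));
}
```

This saturation matches the author's stated intent: anything beyond a standard deviation of roughly 5 (which maps to about 0.83) is treated as "does not know where it wants to go" and ranked close together.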
This was somewhat intentional. I tried a few different normalization functions and found this one to work well. My intuition is that we want macros with variances close to 0 to be placed first; anything beyond around 5 has such a high variance that it implies the atoms within do not know where they want to go.
The goal of this cost term is to ensure that clusters with variance 0 get placed first. Eventually I would like to add some mass information so we try to place more "massive" clusters first (clusters with the most pins / number of RAM bits).
```cpp
// Finally, fix the IO blocks if the user specified the option to do so.
fix_IO_block_types(pl_macro, centroid_loc, pad_loc_type, blk_loc_registry.mutable_block_locs());
```
I'd beef up this comment a bit more, e.g.: if the user asked for a random pad location and this is an IO block, lock down the macro at this location so the placer can't move it.
I also think this code isn't quite right -- locking the IOs is done to test how well you work with randomly locked IOs, or IOs locked to specific locations by board level constraints and expressed with a pad_file that locks them down. This code seems to put the IOs where it wants, and then lock them down (unless the flat placement already put them in the right spot). If the latter, that should be commented. If the former, there should be a TODO to fix this eventually; it is a form of placement constraint, so supporting placement constraints should eventually fix it.
Hi Vaughn, I have updated the comment to beef it up.
The current implementation of the non-AP initial placer does this: whenever it finds a legal site for a macro, it tries to lock it down using this method (the method checks whether the option was passed in). This matches what the fix-pins option expects:
The IO pads are fixed to "arbitrary" locations (when the IOs are not fixed, AP GP will place them basically anywhere it wants). However, I see what you mean since this is not truly "random" since AP is choosing good sites for these blocks. I added a TODO to investigate this.
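The locking behaviour described in this exchange can be sketched as follows, assuming a pad-location option with "free" and "random" settings and a per-block "fixed" flag. `PadLocType`, `BlockLoc`, and `maybe_fix_io_macro` are hypothetical and do not reproduce the real `fix_IO_block_types()` signature. As the thread notes, the site here is chosen by AP rather than at random, so this only mirrors "lock wherever it landed", not truly random IO locking.

```cpp
#include <cassert>
#include <vector>

// Hypothetical types for illustration only; the real VPR types differ.
enum class PadLocType { FREE, RANDOM };

struct BlockLoc {
    int x = 0;
    int y = 0;
    bool is_fixed = false;
};

// After a macro has been given a legal site, and only if the user asked for
// fixed pads, mark every IO block in the macro as fixed so later placement
// stages cannot move it.
void maybe_fix_io_macro(const std::vector<int>& macro_blocks,
                        const std::vector<bool>& is_io_block,
                        PadLocType pad_loc_type,
                        std::vector<BlockLoc>& block_locs) {
    if (pad_loc_type != PadLocType::RANDOM)
        return;  // Option not passed: IOs stay movable.
    for (int blk : macro_blocks) {
        assert(blk >= 0 && blk < static_cast<int>(block_locs.size()));
        if (is_io_block[blk])
            block_locs[blk].is_fixed = true;
    }
}
```

Supporting general placement constraints, as the reviewer suggests, would replace "lock wherever it landed" with "place at the constrained location, then lock".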
Hi @AlexandreSinger, I have a quick question. In this part of the code, and elsewhere in AP, when block ordering is needed, you always sort based on the size of the macro. That makes perfect sense for the ASIC flow. However, for FPGA placement, I wonder if it might be better to sort based on the number of pins. I realize there's likely a high correlation between macro size and pin count, but making the sorting criterion explicitly pin-based might result in better QoR.
Hi @amin1377, I am not sure what you mean by "elsewhere in AP"; as far as I am aware, this is the only place in the AP flow where I sort by the size of the macro. Where else do I do this? Regarding my use of it here, it has to do with finding a legal placement. Suppose we had a macro with 2 clusters in it. This macro will be harder to place than a single cluster since it needs two open, legal sites right next to each other (which get harder and harder to find as the macro size increases). If we place these large macros too late, they may never find a legal site. That was the intuition behind sorting based on macro size; it was not necessarily for finding better-quality solutions. I am currently working on the mass abstraction; I think sorting based on the mass of all blocks within each macro would achieve what you are describing. We could add this to the cost function, but that would have to come later.
Force-pushed ce2b5a0 to dfa1bd3.
The old Initial Placer used in the AP flow was built within the initial placer of the non-AP flow. This forced the AP flow to place blocks one at a time with minimum displacement. This is non-ideal since blocks placed earlier got first pick of locations, which may displace a later cluster that would be a better fit for that location.
Separated out the AP initial placement code. For AP, initial placement is done in passes.
The first pass will try to place each cluster exactly at the tile containing the centroid of its atoms' positions in the global placement. Any clusters that could not be placed are deferred to the next pass.
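A minimal sketch of how the first-pass target tile could be derived, assuming global-placement coordinates are expressed in tile units and that "the tile containing the centroid" means flooring the centroid. `FlatPos`, `TileLoc`, and `cluster_target_tile` are hypothetical names, not the PR's code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct FlatPos { float x; float y; };  // Atom position from global placement.
struct TileLoc { int x; int y; };      // Integer tile coordinates.

// The target tile is the tile containing the centroid of the cluster's atoms.
TileLoc cluster_target_tile(const std::vector<FlatPos>& atom_positions) {
    assert(!atom_positions.empty());
    float sum_x = 0.0f, sum_y = 0.0f;
    for (const FlatPos& p : atom_positions) {
        sum_x += p.x;
        sum_y += p.y;
    }
    const float n = static_cast<float>(atom_positions.size());
    // Floor so that any point inside [x, x+1) x [y, y+1) maps to tile (x, y).
    return {static_cast<int>(std::floor(sum_x / n)),
            static_cast<int>(std::floor(sum_y / n))};
}
```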
The second pass will allow clusters to be placed within 1 tile of their centroid.
All subsequent passes will allow clusters to be placed exponentially farther from their centroid.
The initial placement terminates when all clusters have been placed or once the max displacement spans the entire device.
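The pass structure above can be sketched as a displacement schedule plus a loop that defers unplaced clusters. This assumes "exponentially farther" means doubling and that "the size of the entire device" is the larger of its width and height; both are assumptions, and `displacement_schedule`, `place_in_passes`, and the `try_place` hook are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Max displacement per pass: 0 (exact centroid tile), 1, then doubling,
// capped at the device size. Doubling is assumed, not taken from the PR.
std::vector<int> displacement_schedule(int device_width, int device_height) {
    assert(device_width > 0 && device_height > 0);
    const int device_size = std::max(device_width, device_height);
    std::vector<int> schedule = {0};
    int max_disp = 1;
    while (true) {
        const int capped = std::min(max_disp, device_size);
        schedule.push_back(capped);
        if (capped >= device_size)
            break;  // This pass can reach any tile on the device.
        max_disp *= 2;
    }
    return schedule;
}

// Clusters that cannot be placed within the current max displacement are
// deferred to the next pass. try_place(cluster, max_disp) is a caller-supplied
// hook that returns true if it found a legal site for the cluster.
std::vector<int> place_in_passes(std::vector<int> clusters,
                                 int device_width, int device_height,
                                 const std::function<bool(int, int)>& try_place) {
    for (int max_disp : displacement_schedule(device_width, device_height)) {
        std::vector<int> deferred;
        for (int c : clusters)
            if (!try_place(c, max_disp))
                deferred.push_back(c);
        clusters.swap(deferred);
        if (clusters.empty())
            break;  // Everything placed; stop early.
    }
    return clusters;  // Clusters that could not be placed anywhere.
}
```

Deferring instead of immediately widening the search for each cluster is what lets every cluster compete for its ideal tile before anyone is pushed farther away.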
The clusters are sorted based on the size of the macro that contains them and the variance of the placement of the atoms within the macro. This allows large macro blocks with low variance to be placed first.
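One way to realize this ordering is sketched below, assuming a lexicographic rule: larger macros first (they need several adjacent legal sites, which become scarcer as the device fills), then lower normalized standard deviation first. How the PR actually combines the two terms into a single cost is not stated here, so treat this as an assumption; `MacroInfo` and `sort_macros_for_placement` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical per-macro summary used only for this sketch.
struct MacroInfo {
    int id;
    int num_blocks;  // Number of clusters in the macro.
    float variance;  // Variance of the macro's atoms in the global placement.
};

// Assumed ordering: bigger macros first, then lower normalized std dev first.
void sort_macros_for_placement(std::vector<MacroInfo>& macros) {
    auto norm_std = [](float variance) {
        assert(variance >= 0.0f);
        const float s = std::sqrt(variance);
        return s / (s + 1.0f);
    };
    std::stable_sort(macros.begin(), macros.end(),
                     [&](const MacroInfo& a, const MacroInfo& b) {
                         if (a.num_blocks != b.num_blocks)
                             return a.num_blocks > b.num_blocks;
                         return norm_std(a.variance) < norm_std(b.variance);
                     });
}
```

A pin-count or "mass" term, as discussed in the review, could be added as another key in the comparator.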
Results on the largest VTR circuits (fixed IOs):
This improved the initial placement quality by around 5%, and the number of atom errors (atoms placed in a tile they do not want to be placed in according to the global placement) went down by around 8%. Atom displacement improved by 5%. The max atom displacement got worse by around 5%; I think this increase in max displacement is OK. I will collect Titan results to verify this.