[AP][InitialPlacement] Created Isolated AP Flow #2988

Contributor

@AlexandreSinger AlexandreSinger commented Apr 17, 2025

The old Initial Placer used in the AP flow was constructed within the initial placer of the non-AP flow. This forced the AP flow to try to place blocks one at a time with minimum displacement. This is non-ideal since blocks that were placed earlier got first pick of locations, which could displace a later cluster that may be a better fit for that location.

Separated out the AP initial placement code. For AP, initial placement is done in passes.

The first pass will try to place each cluster exactly at the tile where the centroid of all atoms within the cluster wants to be placed (according to the global placement). Any clusters that could not be placed are reserved for the next pass.

The second pass will allow clusters to be placed within 1 tile of their centroid.

All subsequent passes will allow clusters to be placed exponentially farther from their centroid.

The initial placement terminates when all clusters have been placed or when the max displacement reaches the size of the entire device.
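The pass structure described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Cluster`, `try_place`, and `place_in_passes` are hypothetical names, every tile is assumed to hold exactly one cluster of any type, displacement is assumed to be Manhattan distance, and "the size of the entire device" is assumed to mean width plus height.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical cluster record for illustration; not VPR's data structures.
struct Cluster {
    int centroid_x = 0;  // tile the cluster's atoms want, per global placement
    int centroid_y = 0;
    int placed_x = -1;   // -1 means "not placed yet"
    int placed_y = -1;
};

// Try to place one cluster on the closest free tile within max_disp
// (Manhattan distance, an assumption) of its centroid.
inline bool try_place(Cluster& c, std::vector<std::vector<bool>>& occupied, int max_disp) {
    const int w = static_cast<int>(occupied.size());
    const int h = static_cast<int>(occupied[0].size());
    int best_x = -1, best_y = -1, best_d = max_disp + 1;
    for (int x = std::max(0, c.centroid_x - max_disp); x <= std::min(w - 1, c.centroid_x + max_disp); ++x) {
        for (int y = std::max(0, c.centroid_y - max_disp); y <= std::min(h - 1, c.centroid_y + max_disp); ++y) {
            const int d = std::abs(x - c.centroid_x) + std::abs(y - c.centroid_y);
            if (d < best_d && !occupied[x][y]) {
                best_x = x;
                best_y = y;
                best_d = d;
            }
        }
    }
    if (best_x < 0) return false;
    occupied[best_x][best_y] = true;
    c.placed_x = best_x;
    c.placed_y = best_y;
    return true;
}

// Place clusters in passes with max displacement 0, then 1, then doubling.
// Returns the number of clusters that could not be placed.
inline int place_in_passes(std::vector<Cluster>& clusters, int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<std::vector<bool>> occupied(width, std::vector<bool>(height, false));
    std::vector<Cluster*> pending;
    for (Cluster& c : clusters) pending.push_back(&c);

    const int device_size = width + height;  // assumed bound on displacement
    int max_disp = 0;
    while (!pending.empty()) {
        std::vector<Cluster*> still_unplaced;
        for (Cluster* c : pending) {
            if (!try_place(*c, occupied, max_disp)) still_unplaced.push_back(c);
        }
        pending.swap(still_unplaced);
        if (max_disp >= device_size) break;  // searched the whole device
        max_disp = (max_disp == 0) ? 1 : max_disp * 2;
    }
    return static_cast<int>(pending.size());
}
```

With three clusters that all want the same tile, the first pass places only one of them, and the other two land one tile away in the second pass. No early cluster is pushed far away before later clusters have had a chance at their exact tile.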

The clusters are sorted based on the size of the macro that contains them and the variance of the placement of the atoms within the macro. This allows large macro blocks with low variance to be placed first.
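One way to picture this ordering is the comparator below. It is only a sketch: `MacroInfo` and `sort_for_placement` are hypothetical names, and the strict "size first, variance as tie-breaker" rule is an assumption, since the description does not say how the two criteria are combined in the actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-macro summary for illustration; not VPR's data structures.
struct MacroInfo {
    int id;
    int num_blocks;  // number of clusters in the macro
    float variance;  // variance of the atoms' global-placement positions
};

// Order macros so that larger macros come first and, among macros of equal
// size, lower variance comes first (assumed tie-breaking rule).
inline void sort_for_placement(std::vector<MacroInfo>& macros) {
    for (const MacroInfo& m : macros) {
        assert(m.num_blocks >= 1 && m.variance >= 0.0f);
    }
    std::stable_sort(macros.begin(), macros.end(),
                     [](const MacroInfo& a, const MacroInfo& b) {
                         if (a.num_blocks != b.num_blocks) return a.num_blocks > b.num_blocks;
                         return a.variance < b.variance;
                     });
}
```

For example, given macros with (size, variance) of (1, 0.0), (3, 4.0), (3, 0.5), and (1, 2.0), this ordering places the size-3 macro with variance 0.5 first and the single cluster with variance 2.0 last.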

Results on the largest VTR circuits (fixed IOs):

| Metric | Change |
|---|---|
| Normalized Post FL WL | 0.947 |
| Normalized Post-Route WL | 1.003 |
| Normalized Atom Errors | 0.922 |
| Normalized Atom Displacement | 0.950 |
| Normalized Max Atom Displacement | 1.047 |

This improved the initial placement quality by around 5%. The number of atom errors (where atoms are placed in a tile they do not want to be placed in according to the global placement) went down by around 8%. Atom displacement improved by 5%. The max atom displacement got worse by around 5%. I think this increase in max displacement is ok. I will collect Titan results to verify this.
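For readers unfamiliar with these metrics, the sketch below shows one plausible way to compute them for a single circuit. The names `AtomLoc`, `DisplacementStats`, and `compute_stats` are hypothetical, and measuring displacement as Manhattan distance in tiles is an assumption; the PR only defines an atom error as an atom that ends up in a tile other than the one global placement chose.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical record: the tile global placement wanted for an atom and the
// tile it ended up in after initial placement.
struct AtomLoc {
    int gp_x, gp_y;
    int final_x, final_y;
};

struct DisplacementStats {
    int atom_errors;          // atoms not in their desired tile
    double avg_displacement;  // mean distance from the desired tile
    int max_displacement;     // worst-case distance from the desired tile
};

inline DisplacementStats compute_stats(const std::vector<AtomLoc>& atoms) {
    assert(!atoms.empty());
    DisplacementStats s{0, 0.0, 0};
    long total = 0;
    for (const AtomLoc& a : atoms) {
        // Manhattan distance in tiles (assumed distance measure).
        const int d = std::abs(a.final_x - a.gp_x) + std::abs(a.final_y - a.gp_y);
        if (d != 0) ++s.atom_errors;
        total += d;
        s.max_displacement = std::max(s.max_displacement, d);
    }
    s.avg_displacement = static_cast<double>(total) / static_cast<double>(atoms.size());
    return s;
}
```

The normalized values in the table are ratios against the baseline, so 0.922 for atom errors corresponds to roughly an 8% reduction and 1.047 for max displacement to roughly a 5% increase.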

@github-actions github-actions bot added the VPR (VPR FPGA Placement & Routing Tool) and lang-cpp (C/C++ code) labels Apr 17, 2025
@AlexandreSinger
Contributor Author

The output status messages look like this:
Screenshot from 2025-04-17 18-33-58

@AlexandreSinger AlexandreSinger force-pushed the feature-ap-initial-placer branch from f57750e to ce2b5a0 Compare April 17, 2025 22:35
@AlexandreSinger
Contributor Author

@amin1377 FYI

@AlexandreSinger
Contributor Author

Titan results are in:

| Metric | Improvement over baseline |
|---|---|
| Post FL HPWL | 1.0033 |
| Post Route HPWL | 0.9968 |
| Atom Errors | 0.9470 |
| Average Atom Displacement | 1.0527 |
| Max Atom Displacement | 1.0657 |

This change had very little effect on the Titan results; however, more atoms appear to be placed where they wanted to go according to the global placement. Looking through the raw data, I do notice a lot of outliers which are pulling up many of these numbers. I think this may be related to the mass legalizer not knowing exactly how much can be put into different clusters.

Contributor

@vaughnbetz vaughnbetz left a comment

Nice clean code! One comment on fixed IOs / placement constraints embedded.

float variance = get_flat_variance(pl_macro, flat_placement_info);
float std_dev = std::sqrt(variance);
// Normalize the standard deviation to be a number between 0 and 1.
float normalized_std_dev = std_dev / (std_dev + 1.0f);
Contributor

This function will have relatively little difference for most std_dev values; it has a reasonable difference for small standard deviations (e.g. 1 maps to .5 and 2 to .67), but not much for larger ones (50 maps to .98 and 100 to .99). Is that what you want?

Contributor Author

This was somewhat intentional. I tried a few different normalization functions and found this one to work well. My intuition behind it is that we want variances close to 0 to be placed first; anything beyond around 5 has such a high variance that it implies that the atoms within do not know where they want to go.

The goal of this cost term is to ensure that clusters with variance 0 get placed first. Eventually I would like to add some mass information so we try to place more "massive" clusters first (clusters with the most pins / number of RAM bits).
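The shape of this normalization is easy to check numerically. The helper below wraps the expression from the snippet under review in a standalone function; the name `normalize_std_dev` is hypothetical, and the input is assumed to be a non-negative variance.

```cpp
#include <cassert>
#include <cmath>

// Map a non-negative variance to a value in [0, 1) using
// std_dev / (std_dev + 1), the expression from the reviewed snippet.
inline float normalize_std_dev(float variance) {
    assert(variance >= 0.0f);
    const float std_dev = std::sqrt(variance);
    return std_dev / (std_dev + 1.0f);
}
```

A standard deviation of 0 maps to 0, 1 maps to 0.5, 2 to about 0.667, and 5 to about 0.833, while 50 and 100 map to about 0.980 and 0.990. The term therefore separates well-localized clusters from each other and treats all widely scattered ones as nearly equivalent.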


// Finally, fix the IO blocks if the user specified the option to do
// so.
fix_IO_block_types(pl_macro, centroid_loc, pad_loc_type, blk_loc_registry.mutable_block_locs());
Contributor

I'd beef up this comment a bit more ... if the user asked for a random pad location and this is an IO block, lock down the macro at this location so the placer can't move it.

I also think this code isn't quite right -- locking the IOs is done to test how well you work with randomly locked IOs, or IOs locked to specific locations by board level constraints and expressed with a pad_file that locks them down. This code seems to put the IOs where it wants, and then lock them down (unless the flat placement already put them in the right spot). If the latter, that should be commented. If the former, there should be a TODO to fix this eventually; it is a form of placement constraint, so supporting placement constraints should eventually fix it.

Contributor Author

Hi Vaughn, I have updated the comment to beef it up.

The current implementation of the non-AP initial placer does this: whenever it finds a legal site to place a macro, it tries to lock the macro down using this method (the method checks if the option was passed in). This matches what the fix-pins option expects:
Screenshot from 2025-04-21 13-34-34

The IO pads are fixed to "arbitrary" locations (when the IOs are not fixed, AP GP will place them basically anywhere it wants). However, I see what you mean: this is not truly "random", since AP is choosing good sites for these blocks. I added a TODO to investigate this.

@amin1377
Contributor

Hi @AlexandreSinger,

I have a quick question. In this part of the code, and elsewhere in AP, when block ordering is needed, you always sort based on the size of the macro. That makes perfect sense for the ASIC flow. However, for FPGA placement, I wonder if it might be better to sort based on the number of pins.

I realize there's likely a high correlation between macro size and pin count, but making the sorting criteria explicitly based on pins might result in a better QoR.

@AlexandreSinger
Contributor Author

Hi @amin1377, I am not sure what you mean by "elsewhere in AP"; as far as I am aware, this is the only place in the AP flow where I sort by the size of the macro. Where else do I do this?

Regarding my use of it here: it has to do with finding a legal placement. Suppose we had a macro with 2 clusters in it. This macro will be harder to place than just a single cluster since it would need to find two open, legal sites right next to each other (which gets harder and harder to find as the macro size increases). If we place these large macros too late, they may never find a legal site to be placed into. That was the intuition behind sorting based on macro size; it was not necessarily for finding better quality solutions.
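The legality argument can be illustrated with a small sketch. The names `Offset`, `macro_fits`, and `count_legal_head_sites` are hypothetical, and the model is deliberately simplified: every tile holds one cluster and tile types are ignored, which is not how VPR checks legality.

```cpp
#include <cassert>
#include <vector>

// Hypothetical offset of a macro member relative to the macro's head block.
struct Offset {
    int dx, dy;
};

// A macro fits at (head_x, head_y) only if every member's tile is on the
// device and free.
inline bool macro_fits(const std::vector<std::vector<bool>>& occupied,
                       int head_x, int head_y,
                       const std::vector<Offset>& members) {
    assert(!occupied.empty() && !occupied[0].empty());
    const int w = static_cast<int>(occupied.size());
    const int h = static_cast<int>(occupied[0].size());
    for (const Offset& o : members) {
        const int x = head_x + o.dx;
        const int y = head_y + o.dy;
        if (x < 0 || x >= w || y < 0 || y >= h) return false;
        if (occupied[x][y]) return false;
    }
    return true;
}

// Count how many head positions on the device can legally hold the macro.
inline int count_legal_head_sites(const std::vector<std::vector<bool>>& occupied,
                                  const std::vector<Offset>& members) {
    int count = 0;
    for (int x = 0; x < static_cast<int>(occupied.size()); ++x) {
        for (int y = 0; y < static_cast<int>(occupied[x].size()); ++y) {
            if (macro_fits(occupied, x, y, members)) ++count;
        }
    }
    return count;
}
```

On a 2x2 device where two diagonal tiles are already taken, a single cluster still has two legal sites, but a macro of two vertically adjacent clusters has none. This is why placing large macros late risks leaving them with no legal site.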

I am currently working on the mass abstraction; I think sorting based on the mass of all blocks within each macro would achieve what you are describing. We could add this to the cost function, but that would have to come later.

@AlexandreSinger AlexandreSinger force-pushed the feature-ap-initial-placer branch from ce2b5a0 to dfa1bd3 Compare April 21, 2025 17:44
@AlexandreSinger AlexandreSinger merged commit 3663572 into verilog-to-routing:master Apr 21, 2025
35 checks passed
@AlexandreSinger AlexandreSinger deleted the feature-ap-initial-placer branch April 21, 2025 19:00