Skip to content

[AP] Full Legalizer #2752

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

AlexandreSinger
Copy link
Contributor

The Full Legalizer in the AP flow takes a partial placement (of any legality) and returns a fully legal placement. A legal placement is a set of legal clusterers of atoms and legal places for those clusters to be placed.

To fully test the AP flow, made a temporary Global Placer which will just place the APBlocks at the center of the device. This will be fixed later.

Also includes basic tests to ensure the AP flow does not regress.

@github-actions github-actions bot added VPR VPR FPGA Placement & Routing Tool lang-cpp C/C++ code labels Sep 27, 2024
@AlexandreSinger
Copy link
Contributor Author

AlexandreSinger commented Sep 27, 2024

@vaughnbetz Do you know what this error means? This is showing up in my ch_intrinsics basic test for the AP flow when the highest level of assertions are on.
Screenshot from 2024-09-27 15-44-59

The error is coming from an ASSERT debug message here (after routing when evaluating the slack):

VTR_ASSERT_DEBUG_MSG(verify_max_req_worst_slack(timing_graph, analyzer), "Calculated max required times and worst slacks should match those computed from scratch");

Which is defined here:

bool SetupSlackCrit::verify_max_req_worst_slack(const tatum::TimingGraph& timing_graph, const tatum::SetupTimingAnalyzer& analyzer) {
auto calc_max_req = max_req_;
auto calc_max_req_node = max_req_node_;
auto calc_worst_slack = worst_slack_;
auto calc_worst_slack_node = worst_slack_node_;
recompute_max_req_and_worst_slack(timing_graph, analyzer);
if (calc_max_req != max_req_) {
VPR_ERROR(VPR_ERROR_TIMING,
"Calculated max required times does not match value calculated from scratch");
return false;
}
if (calc_max_req_node != max_req_node_) {
VPR_ERROR(VPR_ERROR_TIMING,
"Calculated max required nodes does not match value calculated from scratch");
return false;
}
if (calc_worst_slack != worst_slack_) {
VPR_ERROR(VPR_ERROR_TIMING,
"Calculated worst slack does not match value calculated from scratch");
return false;
}
if (calc_worst_slack_node != worst_slack_node_) {
VPR_ERROR(VPR_ERROR_TIMING,
"Calculated worst slack nodes does not match value calculated from scratch");
return false;
}
return true;
}

My theory is that the clustering from this test is so bad that its causing weird errors to pop up. I am thinking I can just remove this testcase and add it back once the Global Placer is fully fleshed out.

* @brief Initializes the grid to empty. It also initialized the location for
* all blocks to unplaced.
*
* @param blk_loc_registry Placement block location information. To be filled with the location
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of date comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just moved this comment; but I think it is correct?

pass_requirements_file=pass_requirements_ap.txt

# Script parameters
script_params_common=-track_memory_usage --analytical_place --route --device "unnamed_device" --read_vpr_constraints ../../../../diffeq1_fixed_io.xml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will age better if you make a "constraints_dir= ..." line and refer to it (as we do for the circuits etc.)

vtr::vector<DeviceTileId, std::vector<APBlockId>> blocks_in_tiles(num_device_tiles);
for (APBlockId ap_blk_id : ap_netlist_.blocks()) {
// FIXME: Add these conversions to the PartialPlacement class.
size_t blk_x_loc = std::floor(p_placement.block_x_locs[ap_blk_id]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably want to clip to the device grid to avoid a seg fault.
May want to round to nearest instead of using floor () (left / bottom bias?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also make a good helper function.

mol_list.push_back(mol);
}
// Clustering algorithm: Create clusters one at a time.
while (!mol_list.empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be another helper function.


// PLACING:
// Create a lookup from the AtomBlockId to the APBlockId
vtr::vector<AtomBlockId, APBlockId> atom_to_ap_block(atom_netlist_.blocks().size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make this a helper function

// FIXME: Allocate and load moveable blocks?
// - This may be needed to perform SA. Not needed right now.

// FIXME: Check initial placement legality?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, should call a legality checker right away.

@vaughnbetz
Copy link
Contributor

The error comes from:

"Calculated worst slack nodes does not match value calculated from scratch");

It appears to be complaining that the incremental timing analysis & full timing analysis do not agree on the worst case slack. To use incremental timing analysis you have to mark the edges of the timing graph / connections in the netlist whose delay has changed as "dirty". If you didn't do that and moved something around, you can't call incremental timing analysis (but can call the full recompute from scratch).

@AlexandreSinger
Copy link
Contributor Author

@vaughnbetz The issue I am running into with this slack evaluation is that I have not even touched the timing graph... Whats worst is that this error comes up on the second routing attempt (since it is doing a min channel width sweep). It looks like this error is actually in routing and that my packing & placement just reveals the issue. I think there may be a small issue with how the incremental timing is being performed in the router. The reason this may not have been caught is that this is a debug check and there are only 4 small testcases which have this check turned on (vtr_reg_basic with timing on). I am not too sure how to proceed on this issue beyond turning off this test for now and raising an issue. My guess is something in the router is not marking a change in the routing graph as dirty, but I am not sure where to look!

@vaughnbetz
Copy link
Contributor

OK, that sounds fine. You can turn off the test for now and raise an issue. Please tag Fahri on it ... he's the most likely one to figure it out. Maybe your clustering is so different you show a mistake that usually doesn't show up, as you suggest. Not sure how else you could impact the router. (The timing engine does make callbacks to figure out delays, so if you left out some clustering info that would be a possibility -- maybe it is doing invalid look ups? But it could definitely also be a latent bug that you've exposed).

@AlexandreSinger
Copy link
Contributor Author

@vaughnbetz Sounds good. I will raise an issue on it. I do not know why I did not think of this before, but I generated the .net and .place files from the AP flow and then fed those into VPR independently of the AP flow and was able to replicate the issue. This shows that it has nothing to do with the AP flow and this is just a really hidden bug that is exposed by my really crappy clustering and placement.

I'll include all of these generated files and the commands to replicate this in my issue.

@AlexandreSinger
Copy link
Contributor Author

AlexandreSinger commented Sep 29, 2024

To show how hidden this bug is (and how unlucky I was). This command has the issue:

vpr fixed_k6_frac_N8_22nm.xml ch_intrinsics --circuit_file ch_intrinsics.pre-vpr.blif --net_file ch_intrinsics.net --place_file ch_intrinsics.place --route --device unnamed_device --read_vpr_constraints ../../../../ch_intrinsics_fixed_io.xml --min_route_chan_width_hint 40

And this command does not:

vpr fixed_k6_frac_N8_22nm.xml ch_intrinsics --circuit_file ch_intrinsics.pre-vpr.blif --net_file ch_intrinsics.net --place_file ch_intrinsics.place --route --device unnamed_device --read_vpr_constraints ../../../../ch_intrinsics_fixed_io.xml

Just removing the min channel width hint causes the routing problem to change slightly and remove this issue!

@vaughnbetz
Copy link
Contributor

Weird. I don't suppose there's any chance we forgot to re-timing analyze from scratch after restoring the routing or anything like that (which might be more directly related to the hint)?

@AlexandreSinger AlexandreSinger force-pushed the feature-ap-full-legalizer-upstreaming branch from 500368c to a21ddbb Compare September 29, 2024 15:18
@AlexandreSinger
Copy link
Contributor Author

@vaughnbetz I have resolved most of your comments:

  • I have move count models into the Cluster Legalizer (this just made sense)
  • I have split the full legalizer into two methods (Clustering and Cluster Placement)
  • Added a method to get the containing physical tile loc from the partial placement position of an APBlock
  • Added some code to print the .place file after the AP flow (I just forgot to do that)
  • I turned off that failing testcase. Will raise an issue soon.

I have purposfully left large parts of the full legalizer un-outlined since I plan to return to it very soon to further clean it up and completely restructure it. I just wanted to push something that is working asap. For example, the verification function from the initial placer was seg faulting due to some data structures the placer used not being initialized (but I do not need the structures at all). My goal is the properly unify my cluster placement with the initial placer.

Regarding the "constraint_dir=" comment, I completely agree with you; but I am avoiding modifying the scripting infrastructure while the CI is down. I also want to do this in its own PR.

Regarding which tile to place the given APBlock, I disagree that it should be the nearest. At the Full Legalizer level, the APBlocks will be hovering over the tile they want to be in (even if it is at position 0.9,0.9 its still in the 0th tile). The issue with VTR is that we uniquely identify tiles by their top-left corner which is super confusing. In reality, I imagine each APBlock by being placed at the center of the tile. My approach does break-down when the tiles are not 1x1; however, this is the job of the partial legalizer to resolve (at least in the current design). Eventually I would like to make the cluster placer try to place cluster blocks to the tile whose center is closest to the centroid location of all blocks in the cluster; but this is a bit down the road. Right now, I went for simplicity. The partial placement is assumed to be valid (verify passed, which ensures all blocks are on the device) and all blocks want to be in whatever tile they are hovering above.

@AlexandreSinger
Copy link
Contributor Author

Issue has been raised: issue #2754

@vaughnbetz
Copy link
Contributor

Ok thanks for the suggestion summary. Isn't the anchor point of blocks their bottom left corner though?

@AlexandreSinger
Copy link
Contributor Author

AlexandreSinger commented Sep 29, 2024

Sorry, yes. My mind is in graphics land, where I consider (0,0) to be the top left corner. The anchor-points of all tiles is the floor of all positions in it. So (0.5, 0.5) is the tile identified by (0,0). TBH mathematically it does not matter if the blocks are placed at the center or corner of tiles, since it would just add a constant offset to all blocks which will have no affect on the HPWL calculation, but it just makes more logical sense; since, as you pointed out, it moves an AP block from (0.9, 0.9) all the way to (0,0) when (1,1) is technically smaller of a displacement.

Edit: My mindset is that APBlocks will snap to the center of each tile. So, using my example, (0.9, 0.9) would rather snap to (0.5, 0.5) rather than (1.5, 1.5); I just have to use (0,0) to identify the tile.

@vaughnbetz
Copy link
Contributor

Thanks ... please merge after you add the comment above and figure out CI failures are spurious etc.

The Full Legalizer in the AP flow takes a partial placement (of any
legality) and returns a fully legal placement. A legal placement is a
set of legal clusterers of atoms and legal places for those clusters to
be placed.

To fully test the AP flow, made a temporary Global Placer which will
just place the APBlocks at the center of the device. This will be fixed
later.

Also includes basic tests to ensure the AP flow does not regress.
@AlexandreSinger AlexandreSinger force-pushed the feature-ap-full-legalizer-upstreaming branch from a21ddbb to e7fc394 Compare October 2, 2024 00:03
@AlexandreSinger AlexandreSinger merged commit ea8695a into verilog-to-routing:master Oct 2, 2024
23 of 41 checks passed
@AlexandreSinger AlexandreSinger deleted the feature-ap-full-legalizer-upstreaming branch October 2, 2024 00:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants