Skip to content

Added cleaning of a pb after unsuccessful molecule packing #1656

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 25, 2021

Conversation

mkurc-ant
Copy link
Collaborator

Attempt to fix #1655

@github-actions github-actions bot added the VPR VPR FPGA Placement & Routing Tool label Feb 3, 2021
@mkurc-ant mkurc-ant force-pushed the cluster_fix branch 2 times, most recently from 369f5d1 to 5c3070c Compare February 3, 2021 17:19
Copy link
Collaborator

@litghost litghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a test.

@mkurc-ant
Copy link
Collaborator Author

This needs a test.

@litghost Test added.

@mkurc-ant
Copy link
Collaborator Author

I've run vtr_reg_nightly on the same machine. Both for master (@53780c0b9) and this PR. There was no difference in any of the QoR metrics except runtime (this was expected).

metric baseline modified (this PR)
total_runtime 1.000 1.010
pack_time 1.000 1.013

Copy link
Collaborator

@litghost litghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, waiting for other reviews.

Copy link
Collaborator

@acomodi acomodi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think we need to re-run CI as the Nightly one errored out for some reason (link).

@@ -1179,6 +1179,64 @@ static void alloc_and_load_pb_stats(t_pb* pb, const int feasible_block_array_siz
}
/*****************************************/

/**
* Cleans up a pb after unsuccessful molecule packing
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to comment this routine and how it works as much as you can.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is it cleaning up? When is it safe to call?

@vaughnbetz
Copy link
Contributor

LGTM too. My one request is to comment it as much as you can. I realize that may be limited if you don't have full understanding of exactly how / why these structures are set up, but if you give as much insight as you can into why this routine is being called, what it does and when it is safe to call it that would be good.

@@ -1358,6 +1416,10 @@ static enum e_block_pack_status try_pack_molecule(t_cluster_placement_stats* clu
revert_place_atom_block(molecule->atom_block_ids[i], router_data, atom_molecules);
}
}

/* Placement failed, clean the pb */
Copy link
Contributor

@sfkhalid sfkhalid Feb 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, is the cleanup_pb function supposed to be called when a molecule fails try_pack_molecule for any reason? Or only when it fails during atom placement (try_place_atom_block_rec)? I’m asking because in the issue it says “Add (or fix?) cleanup code that removes child_pbs created during atom placement if it fails for any reason.”

If it is meant to be called when a molecule fails for any reason, it should also be called if the atom fails the detailed routing legality check in try_pack_molecule. (line 1305/1363)

If the cleanup function is only meant to be called if the atom fails during try_place_atom_block_rec , then maybe the call for cleanup_pb should be at the point right after try_place_atom_block_rec is called. So right after the atom placement attempt, you could have a check of the block_pack_status and call the cleanup function if it is needed. Right now, where the cleanup_pb function is called, it could have failed the placement check or the pin feasibility filter check.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. It should be called whenever try_place_atom_block_rec fails. But if you look at the loop that invokes it for each atom you'll see that it terminates immediately on failure:

for (i = 0; i < molecule_size && block_pack_status == BLK_PASSED; i++) {

Later in the flow there are calls to some atom placement reversal routines (

if (block_pack_status != BLK_PASSED) {
) executed on failure so I figured out that the pb cleanup should be invoked from there. It doesn't matter what is the reason of the atom placement failure. If no pbs were allocated the function simply won't do anything. BTW: it does not touch pbs with already placed atoms.

Diving into the low-level details: The problem I observed with the assertion failure in try_place_atom_block_rec was that for some reason the function was hitting already allocated pbs with their modes set causing a conflict with the mode required by the currently packed atom. Digging deeper I've discovered that the function is called for a pb_graph_node corresponding to that atom. Then it recurses up the hierarchy allocating pbs and setting their modes until the root CLB is reached. But these pbs are never freed between subsequent molecule packing. So if packing succeeds everything is fine but when it fails there are leftover pbs set to specific modes. This may cause mode mismatch if the mode required for the next molecule is different than the last one checked.

I typed "Add (or fix?)..." in the issue as I was surprised that the problem haven't manifested itself until now. I'm still not sure whether I'm adding the actually missing data structure cleanup or if the cleanup is already there and requires fixing and I've missed it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, that makes sense. In that case LGTM.

@mkurc-ant
Copy link
Collaborator Author

I added more comments to the cleanup function that I've added.

@vaughnbetz
Copy link
Contributor

Looks good to me. Looks like it needs to be updated to match the base branch and then can be merged.
Also looks like vtr_nightly didn't run again. flagging that to @litghost

@mkurc-ant
Copy link
Collaborator Author

@vaughnbetz Updated.

@vaughnbetz
Copy link
Contributor

vtr_reg_nightly got stuck again, but based on the testing and QoR data I'm confident this is OK and we should merge it. @litghost: seems like this is another case of a stuck nightly reg.

@vaughnbetz vaughnbetz merged commit 8dcd60c into verilog-to-routing:master Feb 25, 2021
@mkurc-ant
Copy link
Collaborator Author

@vaughnbetz thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Assertion 'parent_pb->mode == pb_graph_node->pb_type->parent_mode->index' failure
6 participants