vtr_reg_nightly run failures #1711

Closed
sfkhalid opened this issue Apr 19, 2021 · 16 comments

@sfkhalid
Contributor

vtr_reg_nightly currently has six run failures.

Expected Behaviour

All of the runs should pass.

Current Behaviour

Six runs are failing.

The failing runs all use similar command-line arguments, including --fix_clusters and --sdc_file.

The failures can be seen in the following log file:
log_with_nightly_failures.log

Possible Solution

  1. The failures could be occurring because the files passed to the vpr command line options are not in the specified directories. For example, the command specifies the file /vtr_flow/benchmarks/symbiflow/sdc/picosoc_basys3_full_100.sdc, but I do not see this file in that directory when I check the repo on GitHub.

  2. The failure could also be coming from the move generators moving blocks that are marked as fixed. Only one of the move generators (feasible region move generator) has a check to see whether the block about to be moved is fixed. The check can be seen here:

    if (place_ctx.block_locs[b_from].is_fixed) {
        return e_create_move::ABORT; //Block is fixed, cannot move
    }

Even if the failures are not coming from the second issue, this check should be added to all the move generators.
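
One way to avoid repeating the check in each move generator is to route every move through a shared guard. The following is only a minimal sketch under simplified assumptions: `BlockLoc`, `MoveOutcome`, `can_move_from`, `apply_move` and `propose_move` are hypothetical stand-ins invented for illustration, not VPR's actual `place_ctx` or `e_create_move` definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for VPR's placement data (not the real types).
struct BlockLoc {
    int x = 0;
    int y = 0;
    bool is_fixed = false; // true when a constraint locks the block in place
};

enum class MoveOutcome { VALID, ABORT };

// Shared guard: a move may only start from a block that exists and is not fixed.
inline bool can_move_from(const std::vector<BlockLoc>& block_locs, std::size_t b_from) {
    return b_from < block_locs.size() && !block_locs[b_from].is_fixed;
}

// Applies the move. The assert documents the precondition that callers
// must have passed the guard first.
inline void apply_move(std::vector<BlockLoc>& block_locs, std::size_t b_from,
                       int new_x, int new_y) {
    assert(can_move_from(block_locs, b_from));
    block_locs[b_from].x = new_x;
    block_locs[b_from].y = new_y;
}

// Example move generator body: check the guard before doing any work.
inline MoveOutcome propose_move(std::vector<BlockLoc>& block_locs, std::size_t b_from,
                                int new_x, int new_y) {
    if (!can_move_from(block_locs, b_from)) {
        return MoveOutcome::ABORT; // Block is fixed (or invalid), cannot move
    }
    apply_move(block_locs, b_from, new_x, new_y);
    return MoveOutcome::VALID;
}
```

With a guard like this, a generator that forgets the check would trip the assert in debug builds instead of silently relocating a fixed block.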

Steps to Reproduce

Run vtr_reg_nightly CI

@sfkhalid
Contributor Author

@acomodi Are the sdc files used in the commands mentioned in point 1 coming in from the symbiflow repo? I don't see the sdc files in the directory /vtr_flow/benchmarks/symbiflow/sdc/ on GitHub, and I was thinking that may be the reason for the vtr_reg_nightly failures.

@sfkhalid
Contributor Author

@MohamedElgammal Could you add in the necessary checks in the move generators as mentioned in point 2? This may be the source of the vtr_reg_nightly failures, but also the .is_fixed check should be added to each move generator anyway.

@vaughnbetz
Contributor

@kgugala can you take a look? As you noted in the meeting, there is an error embedded in a warning (which is itself a bug I think), and it flags some issue with the arch.xml.

@MohamedElgammal
Contributor

@sfkhalid The checks are done in the pick_from_block() function, which is used to select the moving block for all moves except the feasible_region and critical_uniform moves. That's why we added the check to both of them explicitly.
The to_block is also checked in the is_legal_swap_to_location() function.
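
As a rough illustration of that split (source block filtered at pick time, destination checked separately), here is a simplified sketch. `Block`, `pick_movable_block` and `destination_is_legal` are hypothetical names made up for this example; they are not the real pick_from_block() or is_legal_swap_to_location() implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <random>
#include <vector>

// Hypothetical, simplified block record (not VPR's data structure).
struct Block {
    int x;
    int y;
    bool is_fixed;
};

// Source-side check: pick uniformly among blocks that are not fixed.
// Returns std::nullopt when every block is fixed, so the caller can abort.
inline std::optional<std::size_t> pick_movable_block(const std::vector<Block>& blocks,
                                                     std::mt19937& rng) {
    std::vector<std::size_t> movable;
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        if (!blocks[i].is_fixed) movable.push_back(i);
    }
    if (movable.empty()) return std::nullopt;

    std::uniform_int_distribution<std::size_t> dist(0, movable.size() - 1);
    std::size_t chosen = movable[dist(rng)];
    assert(!blocks[chosen].is_fixed); // postcondition: never hand out a fixed block
    return chosen;
}

// Destination-side check: a swap is illegal if a fixed block sits at the target.
inline bool destination_is_legal(const std::vector<Block>& blocks, int to_x, int to_y) {
    for (const Block& b : blocks) {
        if (b.x == to_x && b.y == to_y && b.is_fixed) return false;
    }
    return true; // empty location, or occupied by a movable block
}
```

Any move generator that selects its source block some other way bypasses the first check, which is why such generators need their own explicit .is_fixed test.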

@acomodi
Collaborator

acomodi commented Apr 23, 2021

> Are the sdc files used in the commands mentioned in point 1 coming in from the symbiflow repo?

@sfkhalid the SDC, along with the synthesized circuit, comes from artifacts generated by the symbiflow repo. See the download_symbiflow script for more info.

I am looking into the issue on the symbiflow side at the current VTR head, and there is indeed a problem. I am seeing a segfault during initial placement. I'll investigate further to find the cause.

@vaughnbetz
Contributor

@acomodi @sfkhalid : Sarah was recently modifying that file, so Sarah, if you're able to run this locally it is probably worth you taking a look too. And if you don't know how to run it locally, hopefully @acomodi can point you at documentation (and if there isn't any, we should put it in the developer guide somewhere).

@sfkhalid
Contributor Author

> @acomodi @sfkhalid : Sarah was recently modifying that file, so Sarah, if you're able to run this locally it is probably worth you taking a look too. And if you don't know how to run it locally, hopefully @acomodi can point you at documentation (and if there isn't any, we should put it in the developer guide somewhere).

Sure, I'll take a look into it.

@sfkhalid
Contributor Author

> > Are the sdc files used in the commands mentioned in point 1 coming in from the symbiflow repo?
>
> @sfkhalid the SDC, along with the synthesized circuit, comes from artifacts generated by the symbiflow repo. See the download_symbiflow script for more info.
>
> I am looking into the issue on the symbiflow side at the current VTR head, and there is indeed a problem. I am seeing a segfault during initial placement. I'll investigate further to find the cause.

@acomodi When is the seg fault taking place? Is it when you run the commands that are failing in the above log file?

@acomodi
Collaborator

acomodi commented Apr 26, 2021

> @acomodi When is the seg fault taking place? Is it when you run the commands that are failing in the above log file?

I have tried to use one of the latest failing VTR versions in symbiflow-arch-defs. I am currently trying to run the symbiflow VTR nightly tests to see if I get the same segfault, or at least reproduce the issues seen in CI.

@acomodi
Collaborator

acomodi commented Apr 26, 2021

@sfkhalid By further debugging the issue, I found out the following:

  1. The VTR nightly issue is not related to the fix in place: fix case in which PR region does not have constraints #1713. What we can see from the failing logs is the following:
Error 1: arch.timing.xml:-1 <pb_type> 'IPAD_GTP' timing-annotation/<model> mismatch on port 'I' of model 'IPAD_GTP_VPR', input port 'I' has combinational connections to port 'O'; specified in model, but no combinational delays found on pb_type

The problem here is that the IPAD_GTP_VPR model does not have any pin with a combinational connection, therefore this should rather be a warning, and I am not sure why this is happening. Furthermore, I could not reproduce this error locally, with the same symbiflow xc7a50t architecture version used in the failing nightly CI.

  2. The initial placement issue happens any time VPR is invoked with separate commands. In SymbiFlow, VPR is called at each stage: packing, placement and routing are three separate calls. I think that the initial placement issue happens because the partition region initialization might happen only when the packer is invoked, which is not the case when only the placement or routing step is invoked. To reproduce the issue, you may run a random strong regression test and then invoke vpr once again with the --place argument (provided that a .net file was produced by the first run).

@acomodi
Collaborator

acomodi commented Apr 26, 2021

@sfkhalid I have opened this PR that adds a strong_place regression test that runs against a pre-computed .net file. This test can be used to trigger the segfault

@vaughnbetz
Contributor

Thanks Alessandro. That hypothesis makes sense -- Sarah, I don't recall seeing any separate pack, then place tests. In addition to fixing the bug we should fix that testing hole with some more separate pack then place reg tests as this is pushed.

@sfkhalid
Contributor Author

Thanks Alessandro. What you are saying makes sense: if the pack stage wasn't run, it could cause those issues in the placement stage.

Vaughn, yes, in terms of fixing the bug I think the cluster constraints data structure should be initialized independently of the pack stage. I will add tests to check this as I go along.
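
A minimal sketch of that idea, assuming a much simplified flow: the constraint data is loaded through an idempotent helper that every stage calls, so a place-only invocation no longer depends on the packer having run. All names here (`FloorplanContext`, `StageFlags`, `ensure_constraints_loaded`, `run_pack`, `run_place`, `run_flow`) are hypothetical and do not correspond to actual VPR code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified model of the flow; none of these names are VPR's.
struct FloorplanContext {
    bool constraints_loaded = false;
    std::map<std::string, int> cluster_region; // cluster name -> region id
};

struct StageFlags {
    bool do_pack;
    bool do_place;
};

// Idempotent initialization: safe to call from any stage, does work only once.
inline void ensure_constraints_loaded(FloorplanContext& ctx) {
    if (ctx.constraints_loaded) return;
    ctx.cluster_region["clb_0"] = 0; // placeholder for reading real constraints
    ctx.constraints_loaded = true;
}

inline void run_pack(FloorplanContext& ctx) {
    ensure_constraints_loaded(ctx);
    // ... packing would happen here ...
}

// Returns the number of constrained clusters seen by the placer.
inline int run_place(FloorplanContext& ctx) {
    ensure_constraints_loaded(ctx); // does not rely on run_pack having run
    assert(ctx.constraints_loaded);
    return static_cast<int>(ctx.cluster_region.size());
}

// Mirrors invoking the tool with only some stages enabled.
inline int run_flow(FloorplanContext& ctx, StageFlags stages) {
    if (stages.do_pack) run_pack(ctx);
    return stages.do_place ? run_place(ctx) : 0;
}
```

Calling `run_flow` with `do_pack` set to false and `do_place` set to true exercises the place-only path that the separate pack-then-place regression tests would cover.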

@vaughnbetz
Contributor

@acomodi believes this is fixed now; it was an issue with the CI download from symbiflow. @sfkhalid please update this if the new CI passes (it can then be closed).

@sfkhalid
Contributor Author

sfkhalid commented May 9, 2021

The CI passed all the tests, including nightly, on #1704, so this issue can now be closed.


github-actions bot commented May 9, 2025

This issue has been inactive for a year and has been marked as stale. It will be closed in 15 days if it continues to be stale. If you believe this is still an issue, please add a comment.
