
vtr_reg_strong test failures #1459


Closed
Bill-hbrhbr opened this issue Jul 31, 2020 · 20 comments

Comments

@Bill-hbrhbr
Contributor

Expected Behaviour

vtr_reg_strong should pass on the master branch

Current Behaviour

There are six failed tests.
target_pin_util: EArch.xml/styr.blif/common_--target_ext_pin_util_-0.1
target_pin_util: EArch.xml/styr.blif/common_--target_ext_pin_util_1.1
target_pin_util: EArch.xml/styr.blif/common_--target_ext_pin_util_io:0.1,0.1_clb:0.7_0.8,1.0_1.0
target_pin_util: EArch.xml/styr.blif/common_--target_ext_pin_util_io:0.1,0.1_clb:0.7_0.8,1.0_clb:1.0
clock_pll: k6_frac_N10_mem32K_40nm_clk_pll_invalid.xml/multiclock_buf.blif/common
pack_disable: k6_frac_N10_40nm_disable_packing.xml/mult_5x6.blif/common
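Each failure line above seems to follow the pattern `task: architecture/circuit/script_params`. For triage it can help to split such lines into fields, and the hypothetical helper below does that. The pattern and the field names are inferred from the six lines in this report, not taken from the VTR scripts.

```python
from typing import NamedTuple


class FailedTest(NamedTuple):
    """One failure line split into fields (names are illustrative, not VTR's)."""
    task: str
    arch: str
    circuit: str
    params: str


def parse_failure(line: str) -> FailedTest:
    # The task name ends at the first ':'; later colons belong to the params.
    task, _, rest = line.partition(":")
    # maxsplit=2 keeps any further '/' inside the params field.
    arch, circuit, params = rest.strip().split("/", 2)
    return FailedTest(task.strip(), arch, circuit, params)


for line in [
    "clock_pll: k6_frac_N10_mem32K_40nm_clk_pll_invalid.xml/multiclock_buf.blif/common",
    "pack_disable: k6_frac_N10_40nm_disable_packing.xml/mult_5x6.blif/common",
]:
    print(parse_failure(line))
```

Grouping the parsed results by `task` makes it easy to see, for example, that four of the six failures come from the target_pin_util task.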

Possible Solution

Maybe one of the recent commits broke the reg tests.

Steps to Reproduce

  1. Checkout the master branch
  2. ./run_reg_test.pl vtr_reg_strong

Log file:
vtr_reg_strong.txt

@litghost
Collaborator

litghost commented Aug 4, 2020

Looks like there is a failure in the strong sanitized run:

graphics_commands:        k6_N10_mem32K_40nm.xml/stereovision3.v/common         		Error: Executable vpr failed
	full command:  /usr/bin/env time -v /tmpfs/src/github/vtr-verilog-to-routing/vpr/vpr k6_N10_mem32K_40nm.xml stereovision3 --circuit_file stereovision3.pre-vpr.blif --route_chan_width 100 --graphics_commands set_draw_block_outlines 0; set_draw_block_text 0; set_draw_block_internals 2; set_draw_net_max_fanout 128; save_graphics place.png; set_nets 1; save_graphics nets1.png; set_nets 2; save_graphics nets2.png; set_nets 0; set_cpd 1; save_graphics cpd1.png; --max_router_iterations 150
	returncode  :  23
	log file    :  vpr.out
failed: Executable vpr failed (took 8.07 seconds)

@dpbaines
Contributor

dpbaines commented Aug 5, 2020

It looks like someone just needs to enclose the graphics command portion:

set_draw_block_outlines 0; set_draw_block_text 0; set_draw_block_internals 2; set_draw_net_max_fanout 128; save_graphics place.png; set_nets 1; save_graphics nets1.png; set_nets 2; save_graphics nets2.png; set_nets 0; set_cpd 1; save_graphics cpd1.png;

in quotation marks. This should work, though I don't have permissions for Kokoro:

/usr/bin/env time -v /tmpfs/src/github/vtr-verilog-to-routing/vpr/vpr k6_N10_mem32K_40nm.xml stereovision3 --circuit_file stereovision3.pre-vpr.blif --route_chan_width 100 --graphics_commands "set_draw_block_outlines 0; set_draw_block_text 0; set_draw_block_internals 2; set_draw_net_max_fanout 128; save_graphics place.png; set_nets 1; save_graphics nets1.png; set_nets 2; save_graphics nets2.png; set_nets 0; set_cpd 1; save_graphics cpd1.png;" --max_router_iterations 150
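To illustrate why quoting would matter if this command line were handed to a POSIX shell: an unquoted `;` ends the current command, so everything after the first graphics command would run as separate commands. The sketch below uses Python's standard `shlex` module with `punctuation_chars=True`, which emits `;` as its own token, to approximate how a shell groups the words. It is an illustration with a shortened command line, not a description of how run_vtr_flow actually launches VPR.

```python
import shlex


def shell_commands(line):
    """Approximate how a POSIX shell splits a line into ';'-separated commands."""
    lexer = shlex.shlex(line, posix=True, punctuation_chars=True)
    commands, current = [], []
    for token in lexer:
        if token == ";":
            # An unquoted ';' terminates the current command.
            if current:
                commands.append(current)
            current = []
        else:
            current.append(token)
    if current:
        commands.append(current)
    return commands


unquoted = ("vpr k6_N10_mem32K_40nm.xml stereovision3 --graphics_commands "
            "set_nets 1; save_graphics nets1.png; --max_router_iterations 150")
quoted = ("vpr k6_N10_mem32K_40nm.xml stereovision3 --graphics_commands "
          "\"set_nets 1; save_graphics nets1.png;\" --max_router_iterations 150")

for cmd in shell_commands(unquoted):
    print(cmd)
print(shell_commands(quoted))
```

With the unquoted line, `shell_commands` returns three separate commands, the last one being just `--max_router_iterations 150`. With the quoted line it returns a single command whose fifth token is the whole graphics string.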

@litghost
Collaborator

litghost commented Aug 5, 2020

The test is fully controlled by "run_reg_test.pl", etc. Given that the non-sanitized version doesn't fail, I don't believe your analysis of the failure is correct. I suspect the command printed on failure simply doesn't show the quotes, while the actual invocation has quotes in the right places.

Bottom line: I don't believe Kokoro is related to the failure itself.
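The point about the printed command can be sketched in a few lines. If a flow script passes VPR's arguments to `subprocess` as a list, no shell is involved and the graphics string reaches VPR as one argument. If the "full command" line in the log is then produced with a plain `" ".join(...)` (an assumption here, not checked against vtr_flow.py), the quotes never appear in the printout even though the invocation was correct. Python's `shlex.join` (3.8+) would print a copy-pasteable form instead.

```python
import shlex

graphics = "set_nets 1; save_graphics nets1.png;"
argv = ["vpr", "k6_N10_mem32K_40nm.xml", "stereovision3",
        "--graphics_commands", graphics,
        "--max_router_iterations", "150"]

# Passing argv as a list (e.g. subprocess.run(argv)) needs no quoting:
# the graphics string is delivered to vpr as exactly one argument.

naive_log = " ".join(argv)      # hides where the graphics argument begins and ends
quoted_log = shlex.join(argv)   # single-quotes arguments containing spaces or ';'

print(naive_log)
print(quoted_log)
```

Only the `shlex.join` form splits back into the original argument list, so it is the safer choice for a log line that people may copy and re-run.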

@sfkhalid
Contributor

sfkhalid commented Aug 5, 2020

After running the tests on the master branch, it now seems that every test in vtr_reg_basic and vtr_reg_strong fails. Has anyone else had this happen?

@litghost
Collaborator

litghost commented Aug 5, 2020

I don't see that on the CI side of things:
https://travis-ci.com/github/verilog-to-routing/vtr-verilog-to-routing/jobs/368413853
https://travis-ci.com/github/verilog-to-routing/vtr-verilog-to-routing/jobs/368413851

Those are both green. Do you mean with or without sanitization?

@sfkhalid
Contributor

sfkhalid commented Aug 5, 2020

Not on the CI side: I checked out master on my local machine, ran the tests, and every single test failed for vtr_reg_basic and vtr_reg_strong. That didn't happen yesterday when I ran those tests on master; only the six tests that Bill mentioned above failed.

@litghost
Collaborator

litghost commented Aug 5, 2020

In that case I believe the problem is on your machine. I just rebuilt at master and re-ran vtr_reg_basic, and it passed. Since that matches CI, it is reasonable evidence that master is not that broken.

At this point, I recommend cleaning your build directory and rebuilding. Also, what git hash is your local repository at? The default branch is currently at d4ea405, which is where I ran the vtr_reg_basic test that passed.

@sfkhalid
Contributor

sfkhalid commented Aug 5, 2020

OK, I'll try that. Actually, the tests are all failing on the CI side as well, on a branch I just pushed new commits to. I'll follow your suggestion and investigate what's happening.

@dpbaines
Contributor

dpbaines commented Aug 5, 2020

As for this issue, I agree, but it appears the config for graphics_commands on Kokoro isn't in line with what's in the config file here: https://github.com/verilog-to-routing/vtr-verilog-to-routing/blob/master/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_graphics_commands/config/config.txt

So it appears to be configured correctly on the GitHub side, but the CI side appears to be running a slightly different command:
/usr/bin/env time -v /tmpfs/src/github/vtr-verilog-to-routing/vpr/vpr k6_N10_mem32K_40nm.xml stereovision3 --circuit_file stereovision3.pre-vpr.blif --route_chan_width 100 --graphics_commands set_draw_block_outlines 0; set_draw_block_text 0; set_draw_block_internals 2; set_draw_net_max_fanout 128; save_graphics place.png; set_nets 1; save_graphics nets1.png; set_nets 2; save_graphics nets2.png; set_nets 0; set_cpd 1; save_graphics cpd1.png; --max_router_iterations 150

Notice the addition of the --max_router_iterations argument. I could be completely wrong, though!

@litghost
Collaborator

litghost commented Aug 5, 2020

The max_router_iterations argument comes from run_vtr_flow:

vpr.add_argument(
    "-crit_path_router_iterations",
    type=int,
    default=150,
    help="Tells VPR the amount of iterations allowed to obtain the critical path.",
)
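One plausible reading is that run_vtr_flow forwards the -crit_path_router_iterations value (default 150) to VPR as --max_router_iterations, which would explain the extra argument on the CI command line even though the task config does not mention it. The sketch below shows that kind of forwarding with `argparse`; the `build_vpr_args` helper and the exact mapping are assumptions for illustration, not code copied from run_vtr_flow.

```python
import argparse


def make_parser():
    parser = argparse.ArgumentParser(prog="run_vtr_flow_sketch")  # hypothetical name
    vpr = parser.add_argument_group("vpr")
    vpr.add_argument(
        "-crit_path_router_iterations",
        type=int,
        default=150,
        help="Tells VPR the amount of iterations allowed to obtain the critical path.",
    )
    return parser


def build_vpr_args(args):
    """Hypothetical: translate the flow option into VPR's option name."""
    return ["--max_router_iterations", str(args.crit_path_router_iterations)]


parser = make_parser()
print(build_vpr_args(parser.parse_args([])))
print(build_vpr_args(parser.parse_args(["-crit_path_router_iterations", "50"])))
```

Because the default is applied whenever the flag is omitted, every VPR invocation built this way would carry `--max_router_iterations 150` unless a task overrides it.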

@vaughnbetz
Contributor

I believe all 6 failures flagged by Bill are cases where we expect VPR to fail with an error (they test error checking). So the problem may be due to the vtr_flow rewrite: are expected VPR errors being incorrectly flagged as CI failures?
@shadtorrie @jgoeders
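The distinction raised here can be sketched as a small classification step: a test that exercises VPR's error checking should count as passing when VPR exits with an error, and as failing when VPR unexpectedly succeeds. The `expect_fail` flag, the `TestResult` names, and the function below are hypothetical and only illustrate the logic; they are not taken from vtr_flow.py or run_reg_test.pl. The `OK*` string is borrowed from the CI output described later in this thread.

```python
from enum import Enum


class TestResult(Enum):
    PASS = "OK"
    EXPECTED_FAIL = "OK*"  # illustrative label for "failed as expected"
    FAIL = "failed"


def classify(returncode: int, expect_fail: bool) -> TestResult:
    """Map a tool's exit status to a test result, honouring expected failures."""
    if expect_fail:
        # An error-checking test passes only if the tool actually reported an error.
        return TestResult.EXPECTED_FAIL if returncode != 0 else TestResult.FAIL
    return TestResult.PASS if returncode == 0 else TestResult.FAIL


print(classify(1, expect_fail=True))
print(classify(23, expect_fail=False))
```

If the local and CI code paths disagree on `expect_fail` handling, the same six tests would show up as failures in one environment and as `OK*` in the other.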

@sfkhalid
Contributor

sfkhalid commented Aug 6, 2020

On CI, from what I've seen, the only test that fails in the strong suite is graphics_commands.

@sfkhalid
Contributor

sfkhalid commented Aug 6, 2020

graphics_commands: k6_N10_mem32K_40nm.xml/stereovision3.v/common Error: Executable vpr failed
full command: /usr/bin/env time -v /tmpfs/src/github/vtr-verilog-to-routing/vpr/vpr k6_N10_mem32K_40nm.xml stereovision3 --circuit_file stereovision3.pre-vpr.blif --route_chan_width 100 --graphics_commands set_draw_block_outlines 0; set_draw_block_text 0; set_draw_block_internals 2; set_draw_net_max_fanout 128; save_graphics place.png; set_nets 1; save_graphics nets1.png; set_nets 2; save_graphics nets2.png; set_nets 0; set_cpd 1; save_graphics cpd1.png; --max_router_iterations 150
returncode : 23
log file : vpr.out
failed: Executable vpr failed (took 8.78 seconds)

This is the exact message that appears in the log file when VtR Strong Sanitized fails on CI. When I run vtr_reg_strong locally this test passes, but it fails on CI. Does anyone know how to get vpr.out from the CI system to check what's happening with the test?

@sfkhalid
Contributor

sfkhalid commented Aug 6, 2020

There seem to be two separate issues regarding the strong tests:

  1. graphics_commands fails on CI with the sanitized build, but not locally.
  2. The six tests Bill mentioned fail locally but pass on CI with an OK* message. Vaughn speculates above that this is because the vtr_flow.py rewrite does not handle expected errors locally the same way it does on CI.

@shadtorrie
Contributor

shadtorrie commented Aug 6, 2020

I cannot reproduce either issue locally. I noticed that the old vtr_flow essentially printed vpr.out as the error message. I could submit a pull request to do the same in the Python version, so that we could see what is going on in vpr.out.

@litghost
Collaborator

litghost commented Aug 6, 2020

There used to be a -show_failures flag, used during CI, that caused the output of a failing test to be written:

} elsif ( $token eq "-show_failures" ) {
$show_failures = 1;

Did this feature get dropped during the port?

@shadtorrie
Contributor

The -show_failures flag still shows more output than running without it, but I did miss the functionality to output the whole log file as the failure message.
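Restoring that behaviour could look roughly like the sketch below: when a step fails and failure output is requested, the error message carries the end of the tool's log file (e.g. vpr.out) instead of only the command and return code. The function name, the `show_failures` parameter, and the 50-line default tail are assumptions for illustration, not the actual vtr_flow.py change. The header lines imitate the failure format quoted earlier in this thread.

```python
import shlex
from collections import deque
from pathlib import Path


def failure_message(executable, cmd, returncode, log_file,
                    show_failures=False, tail_lines=50):
    """Build an error report; optionally append the last lines of the tool's log."""
    parts = [
        f"Error: Executable {executable} failed",
        f"\tfull command:  {shlex.join(cmd)}",
        f"\treturncode  :  {returncode}",
        f"\tlog file    :  {log_file}",
    ]
    log_path = Path(log_file)
    if show_failures and log_path.exists():
        # deque(maxlen=N) keeps only the last N lines while streaming the file.
        with log_path.open(errors="replace") as f:
            tail = deque(f, maxlen=tail_lines)
        parts.append(f"\tlast {len(tail)} lines of {log_file}:")
        parts.extend("\t" + line.rstrip("\n") for line in tail)
    return "\n".join(parts)
```

Keeping the tail length bounded avoids flooding the CI console when a log is very large, while still showing the final error VPR printed before exiting.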

@vaughnbetz
Contributor

Discussed in the vtr meeting.
Issue 1: Sarah didn't compile with sanitizers locally. She has the action item to compile with sanitizers, run the tests, and send out the error log (likely to Mashad).
Issue 2: There is a difference in how errors are propagated up depending on the show_failures flag in vtr_flow.py. Shad is fixing that; it is believed to be the reason why this fails locally but not in CI.
Issue 3: Shad is also adding in the ability to get the full vpr.out back from CI easily on a failure.

@litghost
Collaborator

litghost commented Aug 6, 2020

Log is located here: https://storage.googleapis.com/vtr-verilog-to-routing/artifacts/prod/foss-fpga-tools/verilog-to-routing/upstream/continuous/strong_sanitized/104/20200806-134600/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_graphics_commands/latest/k6_N10_mem32K_40nm.xml/stereovision3.v/common/vpr.out

Found by following "details" on failed master CI on "Strong Sanitized" (e.g. https://source.cloud.google.com/results/invocations/ae089943-496e-4b9d-9037-7773df8503bb/details), following GCS, and then navigating to vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_graphics_commands/latest/k6_N10_mem32K_40nm.xml/stereovision3.v/common
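Judging only from the example URL above, the artifact path appears to be the CI bucket prefix, then the job name, build number, and timestamp, and then the same directory layout the regression task uses locally. A small hypothetical helper like the one below could rebuild such links for other tests; the parameter names are invented, and the layout should be double-checked against GCS before relying on it.

```python
BASE = ("https://storage.googleapis.com/vtr-verilog-to-routing/artifacts/prod/"
        "foss-fpga-tools/verilog-to-routing/upstream/continuous")


def artifact_url(job, build, timestamp, suite, task, arch, circuit,
                 params="common", filename="vpr.out"):
    """Hypothetical: assemble a CI artifact URL following the observed layout."""
    return "/".join([
        BASE, job, str(build), timestamp,
        "vtr_flow/tasks/regression_tests", suite, task, "latest",
        arch, circuit, params, filename,
    ])


print(artifact_url("strong_sanitized", 104, "20200806-134600",
                   "vtr_reg_strong", "strong_graphics_commands",
                   "k6_N10_mem32K_40nm.xml", "stereovision3.v"))
```

Called with the values from the failing run, the helper reproduces the vpr.out link given in this comment.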

@shadtorrie
Contributor

shadtorrie commented Aug 7, 2020

See the above-mentioned pull request for the fix. @Bill-hbrhbr, this should fix this issue. @vaughnbetz
