vtr_reg_strong test failures #1459

Bill-hbrhbr · 2020-07-31T21:55:24Z

Expected Behaviour

vtr_reg_strong should pass on the master branch

Current Behaviour

There are six failed tests.
target_pin_util: EArch.xml/styr.blif/common_--target_ext_pin_util_-0.1
target_pin_util: EArch.xml/styr.blif/common_--target_ext_pin_util_1.1
target_pin_util: EArch.xml/styr.blif/common_--target_ext_pin_util_io:0.1,0.1_clb:0.7_0.8,1.0_1.0
target_pin_util: EArch.xml/styr.blif/common_--target_ext_pin_util_io:0.1,0.1_clb:0.7_0.8,1.0_clb:1.0
clock_pll: k6_frac_N10_mem32K_40nm_clk_pll_invalid.xml/multiclock_buf.blif/common
pack_disable: k6_frac_N10_40nm_disable_packing.xml/mult_5x6.blif/common

Possible Solution

Maybe one of the recent commits broke the reg tests.

Steps to Reproduce

Checkout the master branch
./run_reg_test.pl vtr_reg_strong

logfile:
vtr_reg_strong.txt

litghost · 2020-08-04T18:22:10Z

Looks like there is a strong sanitized failure:

graphics_commands:        k6_N10_mem32K_40nm.xml/stereovision3.v/common         		Error: Executable vpr failed
	full command:  /usr/bin/env time -v /tmpfs/src/github/vtr-verilog-to-routing/vpr/vpr k6_N10_mem32K_40nm.xml stereovision3 --circuit_file stereovision3.pre-vpr.blif --route_chan_width 100 --graphics_commands set_draw_block_outlines 0; set_draw_block_text 0; set_draw_block_internals 2; set_draw_net_max_fanout 128; save_graphics place.png; set_nets 1; save_graphics nets1.png; set_nets 2; save_graphics nets2.png; set_nets 0; set_cpd 1; save_graphics cpd1.png; --max_router_iterations 150
	returncode  :  23
	log file    :  vpr.out
failed: Executable vpr failed (took 8.07 seconds)

dpbaines · 2020-08-05T03:09:30Z

Someone just needs to enclose the graphics command portion:

set_draw_block_outlines 0; set_draw_block_text 0; set_draw_block_internals 2; set_draw_net_max_fanout 128; save_graphics place.png; set_nets 1; save_graphics nets1.png; set_nets 2; save_graphics nets2.png; set_nets 0; set_cpd 1; save_graphics cpd1.png;

in quotation marks, it looks like. The command below should work; I don't have permissions for Kokoro, though.

/usr/bin/env time -v /tmpfs/src/github/vtr-verilog-to-routing/vpr/vpr k6_N10_mem32K_40nm.xml stereovision3 --circuit_file stereovision3.pre-vpr.blif --route_chan_width 100 --graphics_commands "set_draw_block_outlines 0; set_draw_block_text 0; set_draw_block_internals 2; set_draw_net_max_fanout 128; save_graphics place.png; set_nets 1; save_graphics nets1.png; set_nets 2; save_graphics nets2.png; set_nets 0; set_cpd 1; save_graphics cpd1.png;" --max_router_iterations 150

litghost · 2020-08-05T17:03:45Z

> Someone just needs to enclose the graphics command portion:
>
> set_draw_block_outlines 0; set_draw_block_text 0; set_draw_block_internals 2; set_draw_net_max_fanout 128; save_graphics place.png; set_nets 1; save_graphics nets1.png; set_nets 2; save_graphics nets2.png; set_nets 0; set_cpd 1; save_graphics cpd1.png;
>
> in quotation marks it looks like. Here should work, I don't have permissions for kokoro though.
>
> /usr/bin/env time -v /tmpfs/src/github/vtr-verilog-to-routing/vpr/vpr k6_N10_mem32K_40nm.xml stereovision3 --circuit_file stereovision3.pre-vpr.blif --route_chan_width 100 --graphics_commands "set_draw_block_outlines 0; set_draw_block_text 0; set_draw_block_internals 2; set_draw_net_max_fanout 128; save_graphics place.png; set_nets 1; save_graphics nets1.png; set_nets 2; save_graphics nets2.png; set_nets 0; set_cpd 1; save_graphics cpd1.png;" --max_router_iterations 150

The test is fully controlled by "run_reg_test.pl", etc. Given that the non-sanitized version doesn't fail, I don't believe your analysis of the failure is correct. I assume that the command printed upon failure is not rendered correctly, but the actual invocation has quotes in the right places.

Bottom line is that I don't believe that kokoro is related to the failure itself.
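To illustrate the distinction drawn above between the command as printed in the failure message and the command as actually invoked, here is a minimal Python sketch. It assumes, and the thread does not confirm this, that the flow script holds the arguments as a list and joins them with plain spaces only when printing the error. The shortened argument values are hypothetical.

```python
import shlex

# Hypothetical, shortened argument list; the real one is in the failure log above.
graphics = "set_draw_block_outlines 0; save_graphics place.png;"
argv = [
    "vpr", "arch.xml", "circuit",
    "--graphics_commands", graphics,
    "--max_router_iterations", "150",
]

# Joining with spaces drops the argument boundaries, so the printed
# command looks unquoted even though argv itself is intact.
naive = " ".join(argv)

# shlex.join (Python 3.8+) quotes each argument so that the printed
# command can be pasted back into a POSIX shell.
quoted = shlex.join(argv)

print(naive)
print(quoted)
```

If the list itself is passed to the process (for example via `subprocess.run(argv)`), the graphics commands arrive as a single argument regardless of how the log line looks.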

sfkhalid · 2020-08-05T17:25:48Z

After running the tests on the master branch, it now seems like vtr_reg_basic and vtr_reg_strong are failing all tests. Has anyone else had this happen?

litghost · 2020-08-05T17:28:34Z

> After running the tests on the master branch, it now seems like vtr_reg_basic and vtr_reg_strong are failing all tests. Has anyone else had this happen?

I don't see that on the CI side of things:
https://travis-ci.com/github/verilog-to-routing/vtr-verilog-to-routing/jobs/368413853
https://travis-ci.com/github/verilog-to-routing/vtr-verilog-to-routing/jobs/368413851

Those are both green? Do you mean with or without sanitization?

sfkhalid · 2020-08-05T17:39:24Z

> > After running the tests on the master branch, it now seems like vtr_reg_basic and vtr_reg_strong are failing all tests. Has anyone else had this happen?
>
> I don't see that on the CI side of things:
> https://travis-ci.com/github/verilog-to-routing/vtr-verilog-to-routing/jobs/368413853
> https://travis-ci.com/github/verilog-to-routing/vtr-verilog-to-routing/jobs/368413851
>
> Those are both green? Do you mean with or without sanitization?

Not on the CI side. I checked out master on my local machine and ran the tests, and every single test failed for vtr_reg_basic and vtr_reg_strong. That didn't happen yesterday when I ran those tests on master; only the six tests that Bill mentioned above failed.

litghost · 2020-08-05T17:45:11Z

I believe the problem in that case is on your machine. I just rebuilt at master and re-ran vtr_reg_basic, and it passed. Given that this matches CI, that is reasonable evidence that master is not that broken.

At this point, I would recommend you clean your build directory and rebuild. Also what git hash is your local repository at? The default repo branch is currently at d4ea405 and this is where I quickly ran the vtr_reg_basic test that did pass.

sfkhalid · 2020-08-05T17:51:00Z

Ok, I'll try that. Actually, on the CI side the tests are all failing as well, on a branch I just pushed new commits to. I'll try what you said and try to investigate what's happening.

dpbaines · 2020-08-05T19:56:36Z

As for this issue, I agree, but it appears the config for graphics_commands on Kokoro isn't in line with what's in the config file here: https://github.com/verilog-to-routing/vtr-verilog-to-routing/blob/master/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_graphics_commands/config/config.txt

So it appears to be configured correctly on the Github side, but the CI side appears to be running a slightly different command:
/usr/bin/env time -v /tmpfs/src/github/vtr-verilog-to-routing/vpr/vpr k6_N10_mem32K_40nm.xml stereovision3 --circuit_file stereovision3.pre-vpr.blif --route_chan_width 100 --graphics_commands set_draw_block_outlines 0; set_draw_block_text 0; set_draw_block_internals 2; set_draw_net_max_fanout 128; save_graphics place.png; set_nets 1; save_graphics nets1.png; set_nets 2; save_graphics nets2.png; set_nets 0; set_cpd 1; save_graphics cpd1.png; --max_router_iterations 150

Notice the addition of max_router_iterations argument. I fully could be wrong though!

litghost · 2020-08-05T20:06:41Z

The max_router_iterations comes from run_vtr_flow:

vtr-verilog-to-routing/vtr_flow/scripts/run_vtr_flow.py

Lines 333 to 338 in 3807432

    
    vpr.add_argument(
        "-crit_path_router_iterations",
        type=int,
        default=150,
        help="Tells VPR the amount of iterations allowed to obtain the critical path.",
    )
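As a rough illustration of how a default like this can end up on the VPR command line even when the task config never mentions it, here is a self-contained sketch. The `add_argument` call mirrors the snippet above; the `build_vpr_args` helper and the mapping onto `--max_router_iterations` are assumptions made for illustration, not the actual code in run_vtr_flow.py.

```python
import argparse

# Stand-in parser; the real script defines many more options.
parser = argparse.ArgumentParser(prog="run_vtr_flow_sketch")
parser.add_argument(
    "-crit_path_router_iterations",
    type=int,
    default=150,
    help="Tells VPR the amount of iterations allowed to obtain the critical path.",
)


def build_vpr_args(argv):
    """Hypothetical helper: turn parsed flow options into VPR arguments."""
    args = parser.parse_args(argv)
    # Assumed mapping: the flow option is forwarded as --max_router_iterations.
    return ["--max_router_iterations", str(args.crit_path_router_iterations)]


# With no override, the default of 150 is forwarded.
print(build_vpr_args([]))
# An explicit value replaces the default.
print(build_vpr_args(["-crit_path_router_iterations", "200"]))
```

Under these assumptions the extra argument in the logged command is expected and does not indicate a different CI configuration.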

vaughnbetz · 2020-08-06T04:15:59Z

I believe all 6 failures flagged by Bill are cases where we expect VPR to fail with an error (they test error checking). So the error may be due to the vtr_flow rewrite — are expected VPR errors being incorrectly flagged as CI failures?
@shadtorrie @jgoeders
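The hypothesis above can be made concrete with a small sketch of how a flow script might reconcile a tool's return code with a test that is expected to fail. The function name, the status strings, and the reading of "OK*" as "failed as expected" are assumptions for illustration; the thread only establishes that these six tests exercise error checking.

```python
def classify_run(returncode, expect_fail):
    """Hypothetical status check for one regression-test run.

    returncode:  exit status of the tool (0 means success).
    expect_fail: True when the test is designed to make the tool error out.
    """
    failed = returncode != 0
    if failed and expect_fail:
        # The tool rejected bad input, which is what the test wants.
        return "OK*"
    if failed:
        return "failed"
    if expect_fail:
        # The tool accepted input it should have rejected.
        return "failed: expected an error"
    return "OK"


print(classify_run(0, expect_fail=False))
print(classify_run(1, expect_fail=True))
print(classify_run(1, expect_fail=False))
```

If the expectation is dropped somewhere between the task config and this check, a run that errors out on purpose gets reported as a plain failure, which would match the symptom described.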

sfkhalid · 2020-08-06T14:17:27Z

> I believe all 6 failures flagged by Bill are cases where we expect VPR to fail with an error (they test error checking). So the error may be due to the vtr_flow rewrite: are expected VPR errors being incorrectly flagged as CI failures?
> @shadtorrie @jgoeders

For the CI, the only test that fails in the strong tests is the graphics_commands test from what I've seen.

sfkhalid · 2020-08-06T15:46:17Z

graphics_commands: k6_N10_mem32K_40nm.xml/stereovision3.v/common Error: Executable vpr failed
full command: /usr/bin/env time -v /tmpfs/src/github/vtr-verilog-to-routing/vpr/vpr k6_N10_mem32K_40nm.xml stereovision3 --circuit_file stereovision3.pre-vpr.blif --route_chan_width 100 --graphics_commands set_draw_block_outlines 0; set_draw_block_text 0; set_draw_block_internals 2; set_draw_net_max_fanout 128; save_graphics place.png; set_nets 1; save_graphics nets1.png; set_nets 2; save_graphics nets2.png; set_nets 0; set_cpd 1; save_graphics cpd1.png; --max_router_iterations 150
returncode : 23
log file : vpr.out
failed: Executable vpr failed (took 8.78 seconds)

This is the exact message that appears in the log file when VTR Strong Sanitized fails in CI. When I run vtr_reg_strong locally, this test passes, but it fails in CI. Does anyone know how to get vpr.out from the CI system to check what's happening with the test?

sfkhalid · 2020-08-06T15:49:11Z

There seem to be two separate issues with regards to the strong test:

graphics_commands fails the CI on sanitized build but not locally
the six tests Bill mentioned fail locally but pass the CI with an OK* message. Vaughn speculates above that this is due to the vtr_flow.py rewrite not passing expected errors the same way as CI.

shadtorrie · 2020-08-06T16:44:58Z

I cannot reproduce either issue locally. I noticed that the old vtr_flow printed essentially the contents of vpr.out as the error message. I could submit a pull request to do the same in the Python version; that way we could see what is going on in vpr.out.

litghost · 2020-08-06T16:55:09Z

> I cannot reproduce either issue locally. I noticed that the old vtr_flow printed essentially the contents of vpr.out as the error message. I could submit a pull request to do the same in the Python version; that way we could see what is going on in vpr.out.

There used to be a flag -show_failures that was run during CI, and would result in a failing test output being written:

vtr-verilog-to-routing/run_reg_test.pl

Lines 85 to 86 in d4ea405

    
    } elsif ( $token eq "-show_failures" ) {
        $show_failures = 1;

Did this feature get dropped during the port?

shadtorrie · 2020-08-06T16:59:05Z

The show failures flag still shows more than without it but I did miss the functionality to output the whole output file as the failure.
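For reference, the missing behaviour described here (printing the whole output file when a run fails and the flag is set) could look roughly like the following. This is a sketch under assumptions: `failure_report` and its parameters are hypothetical names, not functions from the VTR scripts.

```python
from pathlib import Path


def failure_report(log_path, show_failures):
    """Hypothetical helper: build the text printed for a failed run."""
    summary = f"failed: see log file {log_path}"
    if not show_failures:
        # Without the flag, only point at the log.
        return summary
    path = Path(log_path)
    if not path.is_file():
        return summary + " (log file not found)"
    # With the flag, append the full log so it shows up in the CI output.
    return summary + "\n" + path.read_text(errors="replace")


print(failure_report("vpr.out", show_failures=False))
```

Appending the full log is convenient in CI, where the working directory is usually not available after the job ends.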

vaughnbetz · 2020-08-06T22:11:29Z

Discussed in the vtr meeting.
Issue 1: Sarah didn't compile with sanitizers locally. She has the action to compile, run and send out the error log (likely will go to Mashad).
Issue 2: There is a difference in how errors are propagated up depending on the show_failures flag in vtr_flow.py. Shad is fixing that; it is believed to be the reason why this fails locally but not in CI.
Issue 3: Shad is also adding in the ability to get the full vpr.out back from CI easily on a failure.

litghost · 2020-08-06T22:22:07Z

Log is located here: https://storage.googleapis.com/vtr-verilog-to-routing/artifacts/prod/foss-fpga-tools/verilog-to-routing/upstream/continuous/strong_sanitized/104/20200806-134600/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_graphics_commands/latest/k6_N10_mem32K_40nm.xml/stereovision3.v/common/vpr.out

Found by following "details" on failed master CI on "Strong Sanitized" (e.g. https://source.cloud.google.com/results/invocations/ae089943-496e-4b9d-9037-7773df8503bb/details), following GCS, and then navigating to vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_graphics_commands/latest/k6_N10_mem32K_40nm.xml/stereovision3.v/common

shadtorrie · 2020-08-07T14:40:41Z

See the pull request that mentions this issue (Vtr flow failure fix #1472) for the fix. @Bill-hbrhbr This should fix this issue. @vaughnbetz

dpbaines mentioned this issue Aug 6, 2020

Adding contrast to flyline delay text so it's readable now #1451

Merged

7 tasks

vaughnbetz mentioned this issue Aug 6, 2020

Memory leaks in vtr_reg_strong graphics test #1470

Closed

shadtorrie mentioned this issue Aug 7, 2020

Vtr flow failure fix #1472

Merged

7 tasks

Bill-hbrhbr closed this as completed Aug 29, 2020
