Updating docs to mention vtr_reg_nightly parallelism strategy #1776

Merged
185 changes: 185 additions & 0 deletions README.developers.md
@@ -385,6 +385,191 @@ Another reason jobs may not start is if there is a large backlog of jobs
running, there may be no runners left to start. In this case, someone with
Kokoro management rights may need to terminate stale jobs, or wait for job
timeouts.
### Parallelism Strategy for vtr_reg_nightly:
#### Current Sub-suites:

* The nightly regression suite is broken up into multiple sub-suites to minimize wall-clock time when run by CI on the Kokoro machines.
* The lower bound for the run-time of the nightly regression tests is the longest vtr_flow run across all suites (currently this flow is in vtr_reg_nightly_test2/vtr_reg_qor).
* To minimize wall-clock time, the tasks with the three longest flow runs are put in separate directories, and other tasks are added alongside them while keeping the run-time of each sub-suite under ~5 hours using the -j8 option on the Kokoro machines.
* The longest tasks are put at the bottom of task_list.txt so that they start first (the file is read backwards in `run_reg_test.py`).
* Tasks that do not have long flow runs are best added under vtr_reg_nightly_test1, as this suite has the smallest run-time of all suites (~2 hours using -j8).
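
Since the list is consumed from the bottom up, it can help to preview the start order before committing a reordered task_list.txt. The following is a minimal sketch, not part of the VTR scripts: `preview_start_order` is a hypothetical helper, and it assumes that blank lines and lines beginning with `#` are not tasks.

```shell
#!/bin/bash
# Hypothetical helper (not part of VTR): print the tasks of a task list in the
# order they would start if the list is consumed from the last line to the first.
# Assumption: blank lines and lines starting with '#' are not tasks.
preview_start_order() {
    local task_list="$1"
    # Drop blank and comment lines, then emit the remaining lines in reverse.
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$task_list" |
        awk '{ lines[NR] = $0 } END { for (i = NR; i > 0; i--) print lines[i] }'
}
```

Calling `preview_start_order path/to/task_list.txt` should list the longest tasks first if they were placed at the bottom of the file as described above.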

#### Adding Sub-suites:

* If tasks with long flows that exceed ~3 hours are to be added, it is best to separate them from the other suites and put them in a separate test at the bottom of the task list.
* Adding additional suites to vtr_reg_nightly consists of three steps:
- A config file (.cfg) has to be added to the config list for the Kokoro machines, located at `$VTR_ROOT/.github/kokoro/presubmit`. The new config should be identical to the other config files for nightly tests, the only difference being the value of VTR_TEST (i.e. the value should be changed to the directory name of the new suite, say vtr_reg_nightly_testX).
- `$VTR_ROOT/.github/kokoro/steps/vtr-test.sh` needs to be updated to recognize the new suite and zip up the output files (we don't want the machine to run out of disk space). E.g. if the suite to be added is `vtr_reg_nightly_testX`, the following line should be added to the script in its appropriate place:
```
find vtr_flow/tasks/regression_tests/vtr_reg_nightly_testX/ -type f -print0 | xargs -0 -P $(nproc) gzip
```

- The .cfg file added in the first step sets up the configs on our side of the repo. The new configs also need to be submitted on Google's side for the Kokoro machines to run the new CI tests. The best person to contact for this setup is Tim Ansell (@mithro on GitHub).
* Below are generalized examples of the `vtr_reg_nightly_testX.cfg` and new `vtr-test.sh`:

*vtr_reg_nightly_testX.cfg:*
```
# Format: //devtools/kokoro/config/proto/build.proto

build_file: "vtr-verilog-to-routing/.github/kokoro/run-vtr.sh"

# 72 hours
timeout_mins: 4320

action {
  define_artifacts {
    # File types
    regex: "**/*.out"
    regex: "**/vpr_stdout.log"
    regex: "**/parse_results.txt"
    regex: "**/qor_results.txt"
    regex: "**/pack.log"
    regex: "**/place.log"
    regex: "**/route.log"
    regex: "**/*_qor.csv"
    regex: "**/*.out.gz"
    regex: "**/vpr_stdout.log.gz"
    regex: "**/parse_results.txt.gz"
    regex: "**/qor_results.txt.gz"
    regex: "**/pack.log.gz"
    regex: "**/place.log.gz"
    regex: "**/route.log.gz"
    regex: "**/*_qor.csv.gz"
    strip_prefix: "github/vtr-verilog-to-routing/"
  }
}

env_vars {
  key: "KOKORO_TYPE"
  value: "presubmit"
}

env_vars {
  key: "KOKORO_DIR"
  value: "vtr-verilog-to-routing"
}

env_vars {
  key: "VTR_DIR"
  value: "vtr-verilog-to-routing"
}

# Use default build configuration
env_vars {
  key: "VTR_CMAKE_PARAMS"
  value: ""
}

# THIS VARIABLE WILL BE CHANGED FOR NEW SUITES.
env_vars {
  key: "VTR_TEST"
  value: "vtr_reg_nightly_testX"
}

# Options for run_reg_test.py
# -show_failures: show tool failures in main log output
env_vars {
  key: "VTR_TEST_OPTIONS"
  value: "-show_failures"
}

env_vars {
  key: "NUM_CORES"
  value: "8"
}
```
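
Because the new config differs from an existing nightly config only in the VTR_TEST value, it can be derived mechanically rather than edited by hand. The sketch below is one possible way to do that and is not part of the repository: `make_suite_cfg` is a hypothetical helper, and it assumes the `value:` line follows the `key: "VTR_TEST"` line as in the example above.

```shell
#!/bin/bash
# Hypothetical helper (not part of VTR): derive a new suite config from an
# existing one by rewriting only the value that follows the VTR_TEST key.
# Assumption: the value line comes after the key line, as in the example config.
make_suite_cfg() {
    local src_cfg="$1" new_suite="$2"
    awk -v suite="$new_suite" '
        # The closing quote keeps this from matching "VTR_TEST_OPTIONS".
        /key: "VTR_TEST"/ { after_key = 1; print; next }
        after_key && /value:/ {
            sub(/"[^"]*"/, "\"" suite "\"")
            after_key = 0
        }
        { print }
    ' "$src_cfg"
}
```

For example, `make_suite_cfg existing_nightly.cfg vtr_reg_nightly_testX > vtr_reg_nightly_testX.cfg` would write the new config, where `existing_nightly.cfg` stands in for whichever nightly config is used as the template.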
*vtr-test.sh:*
```
#!/bin/bash

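# Note: the dollar sign is escaped so the variable name is printed literally
# ("$$" would expand to the shell's process ID).
if [ -z ${VTR_TEST+x} ]; then
    echo "Missing \$VTR_TEST value"
    exit 1
fi

if [ -z ${VTR_TEST_OPTIONS+x} ]; then
    echo "Missing \$VTR_TEST_OPTIONS value"
    exit 1
fi

if [ -z "$NUM_CORES" ]; then
    echo "Missing \$NUM_CORES value"
    exit 1
fi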

echo $PWD
pwd
pwd -L
pwd -P
cd $(realpath $(pwd))
echo $PWD
pwd
pwd -L
pwd -P

(
    while :
    do
        date
        uptime
        free -h
        sleep 300
    done
) &
MONITOR=$!

echo "========================================"
echo "VPR Build Info"
echo "========================================"
./vpr/vpr --version

echo "========================================"
echo "Running Tests"
echo "========================================"
export VPR_NUM_WORKERS=1

set +e
./run_reg_test.py $VTR_TEST $VTR_TEST_OPTIONS -j$NUM_CORES
TEST_RESULT=$?
set -e
kill $MONITOR

echo "========================================"
echo "Cleaning benchmarks files"
echo "========================================"
# Removing Symbiflow archs and benchmarks
find vtr_flow/arch/symbiflow/ -type f -not -name 'README.*' -delete
find vtr_flow/benchmarks/symbiflow/ -type f -not -name 'README.*' -delete

# Removing ISPD benchmarks
find vtr_flow/benchmarks/ispd_blif/ -type f -not -name 'README.*' -delete

# Removing Titan benchmarks
find vtr_flow/benchmarks/titan_blif/ -type f -not -name 'README.*' -delete

# Removing ISPD, Titan and Symbiflow tarballs
find . -type f -regex ".*\.tar\.\(gz\|xz\)" -delete

#ADD THE NEW COMMAND FOR ZIPPING OUTPUTS HERE
#Gzip output files from vtr_reg_nightly tests to lower working directory disk space
find vtr_flow/tasks/regression_tests/vtr_reg_nightly_test1/ -type f -print0 | xargs -0 -P $(nproc) gzip
find vtr_flow/tasks/regression_tests/vtr_reg_nightly_test2/ -type f -print0 | xargs -0 -P $(nproc) gzip
find vtr_flow/tasks/regression_tests/vtr_reg_nightly_test3/ -type f -print0 | xargs -0 -P $(nproc) gzip
find vtr_flow/tasks/regression_tests/vtr_reg_nightly_testX/ -type f -print0 | xargs -0 -P $(nproc) gzip



# Make sure working directory doesn't exceed disk space limit!
echo "Working directory size: $(du -sh)"
if [[ $(du -s | cut -d $'\t' -f 1) -gt $(expr 1024 \* 1024 \* 90) ]]; then
    echo "Working directory too large!"
    exit 1
fi

exit $TEST_RESULT
```
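
The per-suite gzip lines in the script all follow the same pattern, so the step can be expressed as a small function. This is only a sketch of that pattern under stated assumptions, not the script used by the repository: `gzip_suite_outputs` is a hypothetical helper, and unlike the original lines it skips files that already end in `.gz` and tolerates a missing directory.

```shell
#!/bin/bash
# Hypothetical helper (not part of VTR): gzip every regular file under one
# suite's task directory to reduce working-directory disk usage.
gzip_suite_outputs() {
    local suite_dir="$1"
    if [ ! -d "$suite_dir" ]; then
        echo "Skipping missing directory: $suite_dir" >&2
        return 0
    fi
    local jobs
    # Fall back to a single job if nproc is unavailable.
    jobs="$(nproc 2>/dev/null || echo 1)"
    # -r: do not run gzip at all when find produces no files.
    find "$suite_dir" -type f ! -name '*.gz' -print0 | xargs -0 -r -P "$jobs" gzip
}
```

With such a helper, adding a suite would amount to one more call, e.g. `gzip_suite_outputs vtr_flow/tasks/regression_tests/vtr_reg_nightly_testX/`.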

# Debugging Failed Tests
