Script to run kani on top 100 crates (rust-lang#1327)

Yoshiki Takashima · web-flow · commit cb1a6068a152 · 2022-07-11T18:08:59.000-04:00
* Skeleton of top 100 tests done. No specific checks yet.

Still need to implement core calls.

* Added todo comments.

* Implemented clone and kani call. Todo error printing.

* Ubuntu issue also fixed. Implemented concurrent kani exec.

Todo: finish analysis.

* Fixed printing that was broken.

* Made printing stdout optional, since it often repeats stderr.

* Added documentation.

* Removed the manual indexes. Use awk to generate uuid from counter

* Added license.

* Fixed bad string comparison.

* updated mode.

* String comparison bug again.

* Mitigated xargs -d not being available on OSX.

* Changed to all spaces.

* Moved list of target repos to a separate file.

* Fixed error code counting

* Implemented printing for error counts.

* Changed script to use STDIN.

* Renamed script

* Updated documentation on many run script.

* Moved to scripts to avoid formatting.

* Moved list back to tests, exclude with other method.

* Exclude remote target list.

* Renamed top 100 list.

* Added list for top 1k.

* Changed name of download directory to something more descriptive.

* Implemented path arg. Moved recursion to exported function.

* Fixed docs and messages.

* Fixed zero target check.

* Moved script to subdir.

* Added warning.

* Added filename logging to error message.

* Added directory logging for error.

* Updated docs on how to pick list.

* Deleted file with 1k targets.

* Linked with the Table of Contents.

* Slight word change, force CI to run.
diff --git a/docs/src/SUMMARY.md b/docs/src/SUMMARY.md
@@ -25,6 +25,7 @@
   - [Testing](./testing.md)
     - [Regression testing](./regression-testing.md)
     - [Book runner](./bookrunner.md)
+    - [(Experimental) Testing with a Large Number of Repositories](./repo-crawl.md)
 
 - [Limitations](./limitations.md)
   - [Undefined behaviour](./undefined-behaviour.md)
diff --git a/docs/src/repo-crawl.md b/docs/src/repo-crawl.md
@@ -0,0 +1,94 @@
+# (Experimental) Testing with a Large Number of Repositories
+
+This section explains how to run Kani on a large number of crates
+downloaded from git forges. You may want to do this to test Kani's
+ability to handle Rust features found in projects out in the wild.
+
+The first half explains how to use data from crates.io to pick
+targets. The second half explains how to use a script to run Kani on a
+list of selected repositories.
+
+## Picking Repositories
+
+When picking repositories, you may want to select them by metrics such
+as popularity, or by the presence of certain features. This section
+explains how to select the top repositories by download count.
+
+We will use the `db-dump` method of getting data from crates.io, as it
+puts no load on the crates.io website and gives us SQL access. To
+start, make sure the following programs are installed on your computer:
+- docker
+- docker-compose
+
+1. Start PostgreSQL. Save the following YAML as
+   `docker-compose.yaml`. You may need to change `version: '3.3'`
+   depending on your docker-compose version.
+
+   ```yaml
+   version: '3.3'
+   services:
+     db:
+       image: postgres:latest
+       restart: always
+       environment:
+         - POSTGRES_USER=postgres
+         - POSTGRES_PASSWORD=postgres
+       volumes:
+         - crates-data:/var/lib/postgresql/data
+       logging:
+         driver: "json-file"
+         options:
+           max-size: "50m"
+   volumes:
+     crates-data:
+       driver: local
+   ```
+
+   Then, run the following to start the container.
+
+   ```bash
+   docker-compose up -d
+   ```
+
+   Once it is running, run `docker ps` to find the container's name. We
+   will refer to this name as `$CONTAINER_NAME` from now on.
+
+2. Download the data from crates.io. First, run the following
+   command to get a shell in the container:
+   `docker exec -it --user postgres $CONTAINER_NAME bash`. Then run the
+   following to download the dump and import it into the database.
+   Note that this may take a while.
+
+   ```bash
+   wget https://static.crates.io/db-dump.tar.gz
+   tar -xf db-dump.tar.gz
+   psql postgres -f */schema.sql
+   psql postgres -f */import.sql
+   ```
+
+3. Extract the data. In the same container shell, start `psql postgres`
+   and run the following to export the 1,000 most-downloaded crates that
+   have an `http` repository URL. Because `\copy` is a psql
+   meta-command, the whole command must be on a single line. You can use
+   other SQL queries if you want different selection criteria.
+
+   ```sql
+   \copy (SELECT name, repository, downloads FROM crates WHERE repository LIKE 'http%' ORDER BY downloads DESC LIMIT 1000) to 'top-1k.csv' csv header;
+   ```
+
+4. Clean the data. The query above can return the same repository
+   several times (for example, when several crates live in one
+   repository), as well as URLs that point to a path below the
+   repository root. You can clean these out:
+   - Extract the URLs from the CSV, dropping the header: `cat top-1k.csv | awk -F ',' '{ print $2 }' | grep '^http'`
+   - Remove paths below the repository root: `sed 's/\/tree\/master.*$//g'`
+   - Once processed, deduplicate with `sort | uniq`
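+
+The cleaning steps above can be combined into a single pipeline. The
+following is a minimal sketch, assuming the CSV has the
+`name,repository,downloads` columns produced by the query above and
+that no URL contains a comma. The function name `clean_repo_list` is
+hypothetical, and trimming `.git` suffixes and trailing slashes, as well
+as generalizing `tree/master` to any `tree/<branch>` path, are extra
+assumptions beyond the steps listed above.
+
+```bash
+#!/bin/bash
+# Hypothetical helper: converts the crates.io CSV export into a
+# deduplicated, line-delimited list of repository URLs.
+# Assumes columns name,repository,downloads and no commas inside URLs.
+clean_repo_list() {
+    local csv="$1"
+    awk -F ',' 'NR > 1 { print $2 }' "$csv" |  # skip the header, keep the URL column
+        grep '^http' |                         # keep only http(s) URLs
+        sed -e 's#/tree/.*$##' \
+            -e 's#\.git$##' \
+            -e 's#/$##' |                      # assumption: trim branch paths, .git, trailing slash
+        sort -u                                # one line per repository
+}
+```
+
+For example, `clean_repo_list top-1k.csv > url-list.txt` produces a
+file that can be passed to the script described in the next section.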
+
+## Running the List of Repositories
+In this step, we use the script
+[kani-run-on-repos.sh](../../scripts/exps/kani-run-on-repos.sh) to
+clone each repository in the list and run Kani on it.
+
+Make sure Kani is built and ready to run. If it is not, build it from
+the Kani repository root with `cargo build --workspace`.
+
+From the repository root, run the script with
+`./scripts/exps/kani-run-on-repos.sh $URL_LIST_FILE`, where
+`$URL_LIST_FILE` points to a line-delimited list of URLs you want to
+run Kani on. To find repositories that produced warnings or errors,
+grep the output for "STDERR Warnings" and "Error exit in", respectively.
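+
+The script prints a `repository: <URL>` line for each cloned
+repository, followed by its indented results. As a minimal sketch,
+assuming you saved that output to a log file (for example by piping it
+through `tee`), the hypothetical helper `summarize_crawl_log` below
+lists which repositories hit each of the two markers mentioned above:
+
+```bash
+#!/bin/bash
+# Hypothetical helper: summarizes a saved kani-run-on-repos.sh log.
+# Relies on the "repository: <URL>" header line and the "Error exit in" /
+# "STDERR Warnings" markers that the script prints for each repository.
+summarize_crawl_log() {
+    awk '
+        /^repository: /   { repo = $2 }              # remember the current repository
+        /Error exit in/   { print "ERROR   " repo }  # cargo kani exited non-zero
+        /STDERR Warnings/ { print "WARNING " repo }  # stderr matched TARGET_ERROR_REGEX
+    ' "$1"
+}
+```
+
+For example, run `./scripts/exps/kani-run-on-repos.sh url-list.txt | tee crawl.log`,
+then `summarize_crawl_log crawl.log`.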
diff --git a/docs/src/testing.md b/docs/src/testing.md
@@ -13,6 +13,8 @@ two very good reasons to do it:
     characteristics which are quantitative and countable. Metrics are
     particularly valuable for project management purposes.
 
-We recommend reading our section on [Regression Testing](./regression-testing.md)
-if you're interested in Kani development. At present, we obtain metrics based
-on the [book runner](./bookrunner.md).
+We recommend reading our section on [Regression
+Testing](./regression-testing.md) if you're interested in Kani
+development. At present, we obtain metrics based on the [book
+runner](./bookrunner.md). To run Kani on a large number of remotely
+hosted crates, please see [Repository Crawl](./repo-crawl.md).
diff --git a/scripts/ci/copyright-exclude b/scripts/ci/copyright-exclude
@@ -15,4 +15,5 @@ gitignore
 gitmodules
 ignore
 scripts/ci/copyright-exclude
+tests/remote-target-lists/.*
 tools/make-kani-release/license-notes.txt
diff --git a/scripts/exps/kani-run-on-repos.sh b/scripts/exps/kani-run-on-repos.sh
@@ -0,0 +1,140 @@
+#!/bin/bash
+# Copyright Kani Contributors
+# SPDX-License-Identifier: Apache-2.0 OR MIT
+
+
+DOCUMENTATION=\
+'kani-run-on-repos.sh -- script to clone and compile many remote git repositories with Kani.
+
+WARNING: Because this script clones repositories at the HEAD, the
+results may not be stable when the target code changes.
+
+USAGE:
+./scripts/exps/kani-run-on-repos.sh path/to/url-list
+
+Clones each repository listed in url-list and runs Kani on it. Prints
+out the errors and warnings when done. xargs is required for this
+script to work.
+
+url-list: A list of URLs to run Kani on. One per line.
+
+ENV:
+- PRINT_STDOUT=1 forces this script to search for warnings in
+  STDOUT in addition to STDERR
+
+EDITING:
+- To adjust the git clone or kani args, modify the function
+  `clone_and_run_kani`.
+- To adjust the errors this script searches for, edit the function
+  `print_errors_for_each_repo_result`
+
+Copyright Kani Contributors
+SPDX-License-Identifier: Apache-2.0 OR MIT'
+
+export SELF_SCRIPT=$0
+export SELF_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
+NPROC=$(nproc 2> /dev/null || sysctl -n hw.ncpu 2> /dev/null || echo 4)  # Linux or Mac or hard-coded default of 4
+export WORK_DIRECTORY_PREFIX="$SELF_DIR/../../target/remote-repos"
+
+
+export STDOUT_SUFFIX='stdout.cargo-kani'
+export STDERR_SUFFIX='stderr.cargo-kani'
+export EXIT_CODE_SUFFIX='exit-code.cargo-kani'
+# Worker function that clones a target repo and runs Kani on it.
+# This function is called in parallel via xargs by the top-level
+# logic below, and should not be run directly.
+function clone_and_run_kani {
+    WORK_NUMBER_ID=$(echo $1 | awk -F ',' '{ print $1}')
+    REPOSITORY_URL=$(echo $1 | awk -F ',' '{ print $2}')
+    REPO_DIRECTORY="$WORK_DIRECTORY_PREFIX/$WORK_NUMBER_ID"
+    echo "work# $WORK_NUMBER_ID -- $REPOSITORY_URL"
+
+    # clone or update repository
+    (git clone $REPOSITORY_URL $REPO_DIRECTORY 2> /dev/null || git -C $REPO_DIRECTORY pull)
+
+    # run cargo kani compile on repo. save results to file.
+    PATH=$PATH:$(readlink -f $SELF_DIR/..)
+    (cd $REPO_DIRECTORY; nice -n15 cargo kani --only-codegen) \
+         1> $REPO_DIRECTORY/$STDOUT_SUFFIX \
+         2> $REPO_DIRECTORY/$STDERR_SUFFIX
+    echo $? > $REPO_DIRECTORY/$EXIT_CODE_SUFFIX
+}
+export -f clone_and_run_kani
+
+OVERALL_EXIT_CODE='0'
+TARGET_ERROR_REGEX='warning:\sFound\sthe\sfollowing\sunsupported\sconstructs:\|WARN'
+# printing function that greps the error logs and exit code.
+function print_errors_for_each_repo_result {
+    DIRECTORY=$1
+    IS_FAIL='0'
+
+    error_code="$(cat $DIRECTORY/$EXIT_CODE_SUFFIX)"
+    if [ "$error_code" != "0" ]; then
+        echo -e "Error exit in $DIRECTORY: code $error_code\n"
+        IS_FAIL='1'
+    fi
+
+    STDERR_GREP=$(grep -A3 -n $TARGET_ERROR_REGEX $DIRECTORY/$STDERR_SUFFIX 2> /dev/null && echo 'STDERR has warnings')
+    if [[ "$STDERR_GREP" =~ [a-zA-Z0-9] ]]; then
+        echo -e "STDERR Warnings (Plus 3 lines after) $DIRECTORY/$STDERR_SUFFIX -----\n$STDERR_GREP"
+        IS_FAIL='1'
+    fi
+
+    STDOUT_GREP=$(grep -A3 -n $TARGET_ERROR_REGEX $DIRECTORY/$STDOUT_SUFFIX 2> /dev/null && echo 'STDOUT has warnings')
+    if [[ "$STDOUT_GREP" =~ [a-zA-Z0-9] ]] && [ "$PRINT_STDOUT" = '1' ]; then
+        echo -e "STDOUT Warnings (Plus 3 lines after) $DIRECTORY/$STDOUT_SUFFIX -----\n$STDOUT_GREP"
+        IS_FAIL='1'
+    fi
+
+    if [ "$IS_FAIL" -eq "0" ]; then
+        echo 'Ok'
+    fi
+}
+
+if ! command -v xargs > /dev/null 2>&1; then
+    echo 'Need to have xargs installed. On Debian/Ubuntu, install it with `apt-get install -y findutils`'
+    exit -1
+elif [[ "$*" == *"--help"* ]]; then
+    echo -e "$DOCUMENTATION"
+elif [ "$#" -eq "1" ]; then
+    # top level logic that runs clone_and_run_kani in parallel with xargs.
+    echo "Reading URLs from $1...";
+    LIST_OF_CRATE_GIT_URLS=$(cat $1)
+    if [[ -z "$(echo $LIST_OF_CRATE_GIT_URLS | sed 's/\s//g')"  ]]; then
+        echo 'No targets found.'
+        exit -1
+    fi
+
+    mkdir -p $WORK_DIRECTORY_PREFIX
+    echo -e "$LIST_OF_CRATE_GIT_URLS" | \
+        awk -F '\n' 'BEGIN{ a=0 }{ print a++ "," $1  }' | \
+        xargs -I {} -P $NPROC bash -c 'clone_and_run_kani "$1"' _ {}
+
+    # serially print out the ones that failed.
+    num_failed="0"
+    num_with_warning='0'
+    for directory in $(ls $WORK_DIRECTORY_PREFIX); do
+        REPOSITORY=$(git -C $WORK_DIRECTORY_PREFIX/$directory remote -v | awk '{ print $2 }' | head -1)
+        echo "repository: $REPOSITORY"
+
+        ERROR_OUTPUTS=$(print_errors_for_each_repo_result $WORK_DIRECTORY_PREFIX/$directory)
+        if [[ "$ERROR_OUTPUTS" =~ 'STDERR Warnings' ]]; then
+            OVERALL_EXIT_CODE='1'
+            num_with_warning=$(($num_with_warning + 1))
+        fi
+        if [[ "$ERROR_OUTPUTS" =~ 'Error exit in' ]]; then
+            num_failed=$(($num_failed + 1))
+        fi
+
+        echo -e "$ERROR_OUTPUTS" | sed 's/^/    /'
+    done
+
+    echo -e '\n--- OVERALL STATS ---'
+    echo "$num_failed crates failed to compile"
+    echo "$num_with_warning crates had warning(s)"
+else
+    echo -e 'Needs exactly 1 argument path/to/url-list.\n'
+    echo -e "$DOCUMENTATION"
+fi
+
+exit $OVERALL_EXIT_CODE
diff --git a/tests/remote-target-lists/top-100-crates-2022-6-27.txt b/tests/remote-target-lists/top-100-crates-2022-6-27.txt
@@ -0,0 +1,83 @@
+https://github.com/Amanieu/parking_lot
+https://github.com/Amanieu/thread_local-rs
+https://github.com/BurntSushi/aho-corasick
+https://github.com/BurntSushi/byteorder
+https://github.com/BurntSushi/memchr
+https://github.com/BurntSushi/termcolor
+https://github.com/Frommi/miniz_oxide
+https://github.com/Gilnaa/memoffset
+https://github.com/Kimundi/rustc-version-rs
+https://github.com/RustCrypto/traits
+https://github.com/RustCrypto/utils
+https://github.com/SergioBenitez/version_check
+https://github.com/SimonSapin/rust-std-candidates
+https://github.com/alexcrichton/cc-rs
+https://github.com/alexcrichton/cfg-if
+https://github.com/alexcrichton/toml-rs
+https://github.com/bitflags/bitflags
+https://github.com/bluss/arrayvec
+https://github.com/bluss/either
+https://github.com/bluss/indexmap
+https://github.com/bluss/scopeguard
+https://github.com/chronotope/chrono
+https://github.com/clap-rs/clap
+https://github.com/contain-rs/vec-map
+https://github.com/crossbeam-rs/crossbeam
+https://github.com/cryptocorrosion/cryptocorrosion
+https://github.com/cuviper/autocfg
+https://github.com/dguo/strsim-rs
+https://github.com/dtolnay/anyhow
+https://github.com/dtolnay/itoa
+https://github.com/dtolnay/proc-macro-hack
+https://github.com/dtolnay/proc-macro2
+https://github.com/dtolnay/quote
+https://github.com/dtolnay/ryu
+https://github.com/dtolnay/semver
+https://github.com/dtolnay/syn
+https://github.com/dtolnay/thiserror
+https://github.com/env-logger-rs/env_logger
+https://github.com/fizyk20/generic-array.git
+https://github.com/hyperium/h2
+https://github.com/hyperium/http
+https://github.com/hyperium/hyper
+https://github.com/marshallpierce/rust-base64
+https://github.com/matklad/once_cell
+https://github.com/mgeisler/textwrap
+https://github.com/ogham/rust-ansi-term
+https://github.com/paholg/typenum
+https://github.com/retep998/winapi-rs
+https://github.com/rust-itertools/itertools
+https://github.com/rust-lang-nursery/lazy-static.rs
+https://github.com/rust-lang/backtrace-rs
+https://github.com/rust-lang/futures-rs
+https://github.com/rust-lang/hashbrown
+https://github.com/rust-lang/libc
+https://github.com/rust-lang/log
+https://github.com/rust-lang/pkg-config-rs
+https://github.com/rust-lang/regex
+https://github.com/rust-lang/socket2
+https://github.com/rust-num/num-integer
+https://github.com/rust-num/num-traits
+https://github.com/rust-random/getrandom
+https://github.com/rust-random/rand
+https://github.com/seanmonstar/httparse
+https://github.com/seanmonstar/num_cpus
+https://github.com/serde-rs/json
+https://github.com/serde-rs/serde
+https://github.com/servo/rust-fnv
+https://github.com/servo/rust-smallvec
+https://github.com/servo/rust-url
+https://github.com/servo/unicode-bidi
+https://github.com/softprops/atty
+https://github.com/steveklabnik/semver-parser
+https://github.com/taiki-e/pin-project-lite
+https://github.com/time-rs/time
+https://github.com/tokio-rs/bytes
+https://github.com/tokio-rs/mio
+https://github.com/tokio-rs/slab
+https://github.com/tokio-rs/tokio
+https://github.com/unicode-rs/unicode-normalization
+https://github.com/unicode-rs/unicode-segmentation
+https://github.com/unicode-rs/unicode-width
+https://github.com/unicode-rs/unicode-xid
+https://github.com/withoutboats/heck