Skip to content

Replace test.pl by Python-based implementation, possibly within pytest #1924

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
tautschnig opened this issue Mar 12, 2018 · 15 comments
Closed

Comments

@tautschnig
Copy link
Collaborator

See #1907 for discussion.

@owen-mc-diffblue
Copy link
Contributor

owen-mc-diffblue commented Mar 20, 2018

I've looked into pytest-xdist, which is a plugin for pytest to enable parallel running of tests. It seems to work. We have quite a small set of regression tests so it isn't a great comparison. Also, one of the tests takes much longer than the others (10:00 out of 11:39). Parallelising (with automatically chosen number of cores) reduces the time from 11:39 to 10:27. If I take out that long test then it reduces the time from 1:20 to 0:30. (All times reported are the "real" time obtained from time.)

@tautschnig
Copy link
Collaborator Author

@owen-jones-diffblue Thanks a lot for following up! Which test is it that vastly dominates the run-time? Maybe we've been working hard on solving the wrong problem...

@owen-mc-diffblue
Copy link
Contributor

@tautschnig This was on a private repo, where the tests have already been written using pytest. And yes, we have plans for making that test a lot faster.

@karkhaz
Copy link
Collaborator

karkhaz commented Feb 18, 2019

Soon come ;-)

usage: test.py [-h] -c CMD [-C] [-T] [-F] [-K] [-I TAG] [-X TAG] [-i REGEX]
               [-j N] [-n] [-p] [-s S] [-f]
               [DIRS [DIRS ...]]

Run regression tests according to a test.desc spec

optional arguments:
  -h, --help            show this help message and exit
  -i REGEX, --ignore REGEX
                        options in test.desc matching REGEX are ignored
  -j N, --jobs N        run N parallel jobs. Default: $TESTPL_JOBS if set,
                        else 1
  -n, --dry-run         print the tests that would be run, don't actually run
                        them
  -p, --print-logs      print the logs of each failed test (if any)
  -s S, --suffix S      append S to all output and log files. Allows
                        concurrent testing of the same desc file with
                        different commands or options, as runs with different
                        suffixes will operate independently and keep
                        independent logs
  -f, --forward         forward the test name to CMD

required arguments:
  -c CMD, --command CMD
                        run tests on CMD

test selection:
  DIRS                  run only tests in DIRS
  -C, --core            run essential tests (default if no tests are selected)
  -T, --thorough        run expensive tests
  -F, --future          run checks for future features
  -K, --known-bug       run tests associated with known bugs
  -I TAG, --include TAG
                        run only tests that have the given secondary tag (can
                        be passed multiple times)
  -X TAG, --exclude TAG
                        exclude tests that have the given secondary tag (can
                        be passed multiple times)

test.py expects a test.desc file in each subdirectory. The file test.desc
follows the format specified below. Any line starting with // will be ignored.

<level> [<tag1> [<tag2>...]]
<main source>
<options>
<activate-multi-line-match>
<required patterns>
--
<disallowed patterns>
--
<comment text>

where
  <level>                         is one of CORE, THOROUGH, FUTURE or KNOWNBUG
  <tag1>, <tag2> ... <tagn>       are zero or more space-separated secondary tags, for use with the -I and -X parameters
  <main source>                   is a file with extension .c/.i/.gb/.cpp/.ii/.xml/.class/.jar
  <options>                       additional options to be passed to CMD
  <activate-multi-line-match>     The fourth line can optionally be activate-multi-line-match, if this is the
                                  case then each of the rules will be matched over multiple lines (so can contain)
                                  the new line symbol (\n) inside them.
  <required patterns>             one or more lines of regualar expressions that must occur in the output
  <disallowed patterns>           one or more lines of expressions that must not occur in output
  <comment text>                  free form text

@owen-mc-diffblue
Copy link
Contributor

@karkhaz Have you rewritten test.pl using python? Or are you using a framework, like pytest? (I had to briefly check that it wasn't the 1st of April.)

@karkhaz
Copy link
Collaborator

karkhaz commented Feb 19, 2019

@owen-jones-diffblue It's not finished, but I'm rewriting it from scratch in python, and trying to ensure that it doesn't use any packages from outside the standard library. I actually thought about waiting until April to post the PR, just to be contrary; doing it twice just might make it a tradition...

@chrisr-diffblue
Copy link
Contributor

One thing to consider (if this is possible to fix in the new test runner) is that some of the current regression tests are not thread safe (e.g. if you try to distribute running the tests over multiple cpu's), particularly where one regression directory contains multiple '.desc' files. One such example is regression/goto-cc-cbmc/mixed-c-library-goto-main/*.desc - if the two .desc files in that directory happen to get run in parallel, then it's a race as to which test generates a .gb file that gets analysed. Fixing that would be awesome :)

@owen-mc-diffblue
Copy link
Contributor

@karkhaz On April 1st you can announce that rewriting test.pl was too hard, so you're rewriting cbmc in perl instead 😄.

@karkhaz
Copy link
Collaborator

karkhaz commented Feb 19, 2019

@chrisr-diffblue are multiple .desc files the only situation where this happens? Would it suffice for me to parallelise all directories, but to always serialize .desc files within one directory?

@chrisr-diffblue
Copy link
Contributor

I think thats the main threading issue I'm aware of, so yes, that would probably be one approach - though that does still mean that if you want the test runner to leave behind the intermediate files (e.g. the .gb files for debugging purposes, you have similar confusing behaviour. Probably the most robust approach would be to do something like generate a new temporary working directory for each .desc, and then invoke the appropriate chain,sh or tool from within that working directory (fixing up file paths and include paths as appropriate....). e.g. for a file foo.desc, the runner would create a foo.run directory, which after running the test would contain any intermediate files, plus the foo.out. Cleaning up after a test run would then just be an rm -r of foo.run.

@chrisr-diffblue
Copy link
Contributor

Oh, and "Default: $TESTPL_JOBS if set" - are we keeping the "PL" for sentimental reasons, or should that be "TESTPY_JOBS" now :-)

@karkhaz
Copy link
Collaborator

karkhaz commented Feb 19, 2019

Thank you, a fresh directory for each .desc file sounds sensible---I'll do that. My job is a lot easier due to the plethora of regression tests, which serve to test regressions between test.pl and test.py as well as regressions of cbmc :)

$TESTPL_JOBS: this was intentional. My rationale for this and other features is that I'd prefer not to break people's scripts. So at least for this first implementation, I'm going to strive to keep the command-line interface and other external settings identical. On the other hand, if there are long-standing annoyances with test.pl, this could certainly be an opportunity to finally fix them cleanly, so please feel free to point me to any issues that I might be able to resolve. The parallel .desc file issue is a nice example.

@chrisr-diffblue
Copy link
Contributor

As far as TESTPL_JOBS is concerned, I was mostly just being tongue-in-cheek - keeping as its current name seems entirely sensible. Its great to see someone finally tackling this and helping out everyone in the processes!

@hannes-steffenhagen-diffblue
Copy link
Contributor

Just a stray comment: One mild annoyance with the way test.pl right now is the colouring, which is unconditionally applied, which is fine if you're viewing the test log on travis or on your terminal, but a nuisance if you're writing to a file or viewing the log on AWS CodeBuild. I'd like to change the behaviour to

  1. By default, only colour on terminal output (and even then only terminals that support colour codes if we have a sensible way of detecting that without additional dependencies)
  2. Have an environment variable (and possibly a flag) to override that. TESTPL_COLOUR=on|off|auto, something along those lines, when you actually want to write the colour codes despite not writing to terminal?

@TGWDB
Copy link
Contributor

TGWDB commented Jun 14, 2021

Closing since I guess this is no longer happening. Please reopen if you believe this is erroneous, or have the mythical python implementation working.

@TGWDB TGWDB closed this as completed Jun 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants