Add support for an xfails file, and allow the files to be specified with a flag #157
Conversation
Also include the file name in the reason.
This requires data-apis/array-api-tests#157 to work.
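For context, here is a minimal sketch of how file-based skip/xfail flags could be wired into a pytest `conftest.py`, with the file name included in the marker reason. This is illustrative only and is not the actual implementation in this PR; the flag names, helper, and matching rule are assumptions.

```python
# Hypothetical sketch only -- not the code in this PR.
from pathlib import Path

import pytest


def pytest_addoption(parser):
    # Assumed flag names mirroring the ones discussed in this PR.
    parser.addoption("--skips-file", default=None, help="file of test ids to skip")
    parser.addoption("--xfails-file", default=None, help="file of test ids to xfail")


def _read_ids(path):
    # One test id per line; blank lines and "#" comments are ignored.
    lines = Path(path).read_text().splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]


def pytest_collection_modifyitems(config, items):
    for flag, marker in (("--skips-file", pytest.mark.skip),
                         ("--xfails-file", pytest.mark.xfail)):
        path = config.getoption(flag)
        if not path:
            continue
        ids = _read_ids(path)
        for item in items:
            # Substring matching so a bare "...::test_abs" entry also covers
            # parametrized children such as "...::test_abs[__abs__]".
            if any(test_id in item.nodeid for test_id in ids):
                item.add_marker(marker(reason=f"listed in {path}"))
```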
I still disagree we should support xfails. Please address all the points made in #90 (comment).
I need this for the array-api-compat library. I want to be alerted on CI when an xfailing test starts passing, so that I know that some behavior is working again. I'd also ideally like to do the same for the base libraries, so that I know when some piece of compat code can be removed.

I'm really not worried about flakiness. If it becomes a problem, we can figure out how to fix it when we get to it. Flakiness is an inherent problem when using Hypothesis, but I don't think it's a good excuse to throw good testing practices out the window. There are several ways it can be addressed, e.g., increasing max examples, storing the examples database persistently on CI, modifying the test itself in some way, or even just making that single test a skip (as opposed to using skips for all tests).

I'm still concerned about our use of skips in this repo. Right now the NumPy CI on this repo is failing anyway, but ideally I'd like to run both numpy and numpy.array_api, either on this repo or elsewhere, with xfails. Firstly, that will alert us to changes in the library support, which is important for numpy since we are effectively in charge of that support. And secondly, the CI on this repo currently only runs against numpy.array_api with skips. That means that those skipped tests are never run on CI. If they were XFAIL they would at least run at all (XFAIL with an expected failure message might be more robust here, but OTOH that might actually be too flaky to deal with).
Also, I should note that I want to do the same for cupy, which I can only run locally, occasionally, on my PC (unless we can figure out a CI solution there, but I really doubt that will happen). Basically, whenever I'm thinking about it I can pull it up and run the tests, or someone else with access to a CUDA GPU can. I really don't care about flakiness for that. I can afford to just run with more examples or just run the test again, given how rarely I would do it.
Also, as I noted, XPASS only fails if you tell it to with `-o xfail_strict=True`.
To reiterate again, my argument here is: yes, you can deal with xfails, but it's a massive PITA if you don't know Hypothesis, and even if you do it's still a massive PITA, and even if you maintain Hypothesis it's still a massive PITA. Yes, you can just never use `xfail_strict`, but you still need to familiarise yourself with why you shouldn't, which makes end-user usage of the test suite annoying. It is also generally semantically wrong to xfail a test which isn't expected to always fail, hence why in the Hypothesis test suite any such test is never xfailed but just skipped.

So if you want to use this test suite as a primary test suite to rely upon, as opposed to some third-party thing you run, I can see the utility of xfails. Can we not promote this in the README though? In fact, I want to discourage the use of xfails, so not even document it, and leave a "please don't use this feature" comment in the relevant code. I absolutely do not want to deal with flakiness issues downstream users will have down the road, heh.
So I've toned back the xfail stuff. You're right that there is a lot of flakiness (especially with pytorch, which only seems to fail a lot of special-cases tests for very specific values; probably due to the way it does scalar type promotion). I do still think that actually running the test is better than skipping, but XPASS isn't very usable.

In my local testing with cupy, I've come to the conclusion that XPASS can work if you sufficiently populate the examples database (e.g., by running the tests with 10000 max examples). I'm hoping to figure out how to persist the examples database on CI. I know that the examples database does get invalidated sometimes when Hypothesis or the test suite updates, but I expect that level of flakiness will be much less annoying.
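For what it's worth, one way the "populate and persist the examples database" idea could look is sketched below, using Hypothesis's settings profiles and on-disk example database. The profile name, example count, and directory are assumptions, not something this repo does today.

```python
# Sketch only: a CI settings profile with a larger example budget and an
# explicit on-disk database that a CI cache step could persist between runs.
from hypothesis import settings
from hypothesis.database import DirectoryBasedExampleDatabase

settings.register_profile(
    "ci-populate",
    max_examples=10_000,  # one-off heavy run to seed the database
    database=DirectoryBasedExampleDatabase(".hypothesis/examples"),
    print_blob=True,
)
# Select with: pytest --hypothesis-profile=ci-populate ...
# Then cache the .hypothesis/examples directory on CI (e.g. with actions/cache)
# so previously found failing examples are replayed on later runs.
```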
Assuming the test failures go away after merging master, any objections to merging this? |
Pushed a commit to fix wildcard ids not matching the respective "child" ids, e.g.

$ cat skips.txt
array_api_tests/test_operators_and_elementwise_functions.py::test_abs
$ pytest array_api_tests/test_operators_and_elementwise_functions.py::test_abs -v
array_api_tests/test_operators_and_elementwise_functions.py::test_abs[abs] SKIPPED (skips file (/home/honno/gdrive/GitHub/array-...) [ 50%]
array_api_tests/test_operators_and_elementwise_functions.py::test_abs[__abs__] PASSED
Think this is nearly there, mostly my personal preference now.
README.md
export PYTHONPATH="${GITHUB_WORKSPACE}/array-api-compat"
cd ${GITHUB_WORKSPACE}/array-api-tests
pytest -v -rxXfE --ci --xfails-file ${GITHUB_WORKSPACE}/your-array-library/array-api-tests-xfails.txt array_api_tests/
Hmm, this doesn't look quite right to me, but I might be missing something. The PYTHONPATH/GITHUB_WORKSPACE stuff looks janky for a generic example, and we shouldn't be specifying `array-api-compat` here, right?
Also, I much prefer highlighting skips rather than xfails.
Suggested change (replacing the three lines above):

pytest -v -rxXfE --ci --skips-file path/to/skips.txt array_api_tests/
The "compat" thing is definitely a typo, but in general, this is a recipe for storing the skips in the array library repo and testing against the test suite. The point of GITHUB_WORKSPACE is to show how to do that (most people don't clone multiple repos in a single GitHub action, so it's not obvious how to do it). Again, I'd prefer to have a mostly working example so that people can just copy-paste it, rather than omitting stuff that they'll just have to figure out themselves. I always prefer it when people do this in other contexts which is why I've done it here.
Here is an example GitHub Actions workflow file, where the xfails are stored
in `array-api-tests.xfails.txt` in the base of the `your-array-library` repo.

If you want, you can use `-o xfail_strict=True`, which causes XPASS tests (XFAIL
tests that actually pass) to fail the test suite. However, be aware that
XFAILs can be flaky (see below), so this may not be a good idea unless you
use some other mitigation of such flakiness.

If you don't want this behavior, you can remove it, or use `--skips-file`
instead of `--xfails-file`.
Much prefer highlighting `--skips-file` first and foremost, rather than `--xfails-file`.
We're just not going to get agreement on this. XFAILs are better than skips. Period. It's better to run the test so you can still see if it actually failed or not. The only exception is cases where the test hangs or crashes the test suite.
For some reason the real nodeid has an extra "array-api-tests/" in front, which breaks matching if we do a `startswith` match. Using `in` would also allow simpler matching with just the test name, if desired.
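A tiny illustration of that trade-off; the node id and file entry below are made-up examples shaped like the ids discussed above.

```python
# Example values only, shaped like the ids discussed above.
nodeid = ("array-api-tests/array_api_tests/"
          "test_operators_and_elementwise_functions.py::test_abs[__abs__]")
entry = "array_api_tests/test_operators_and_elementwise_functions.py::test_abs"

print(nodeid.startswith(entry))  # False: the extra "array-api-tests/" prefix breaks it
print(entry in nodeid)           # True: substring matching tolerates the prefix
print("test_abs" in nodeid)      # True: also allows matching on just the test name
```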
Need this for testing the compat library against 2022.12, so going to merge.
I think the existing file for NumPy on CI needs to be updated, but I haven't looked into this. We really ought to prefer XFAIL to skips everywhere so we know when a test starts to pass. Note that XPASS doesn't actually cause a failure by default unless `-o xfail_strict=True` is used.
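As a self-contained illustration of that default behaviour, here is a toy test that is not part of this suite:

```python
# test_xpass_demo.py -- toy example, unrelated to the array-api test suite.
import pytest


@pytest.mark.xfail(reason="behaviour assumed to be broken")
def test_now_works():
    assert 1 + 1 == 2  # passes, so pytest reports it as XPASS


# pytest test_xpass_demo.py                      -> "1 xpassed", exit code 0
# pytest -o xfail_strict=True test_xpass_demo.py -> the XPASS is reported as a
#                                                   failure and the run fails
```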