Add support for an xfails file, and allow the files to be specified with a flag #157
Conversation
Also include the file name in the reason.
This requires data-apis/array-api-tests#157 to work.
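For context, here is a minimal sketch of how file-based skip/xfail flags could be wired into a pytest `conftest.py`, with the file name included in the marker reason. This is illustrative only and is not the actual implementation in this PR; the flag names, helper, and matching rule are assumptions.

```python
# Hypothetical sketch only -- not the code in this PR.
from pathlib import Path

import pytest


def pytest_addoption(parser):
    # Assumed flag names mirroring the ones discussed in this PR.
    parser.addoption("--skips-file", default=None, help="file of test ids to skip")
    parser.addoption("--xfails-file", default=None, help="file of test ids to xfail")


def _read_ids(path):
    # One test id per line; blank lines and "#" comments are ignored.
    lines = Path(path).read_text().splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]


def pytest_collection_modifyitems(config, items):
    for flag, marker in (("--skips-file", pytest.mark.skip),
                         ("--xfails-file", pytest.mark.xfail)):
        path = config.getoption(flag)
        if not path:
            continue
        ids = _read_ids(path)
        for item in items:
            # Substring matching so a bare "...::test_abs" entry also covers
            # parametrized children such as "...::test_abs[__abs__]".
            if any(test_id in item.nodeid for test_id in ids):
                item.add_marker(marker(reason=f"listed in {path}"))
```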
I still disagree we should support xfails. Please address all the points made in #90 (comment).
I need this for the array-api-compat library. I want to be alerted on CI when an xfailing test starts passing, so that I know that some behavior is working again. I'd also ideally like to do the same for the base libraries, so that I know when some piece of compat code can be removed.

I'm really not worried about flakiness. If it becomes a problem, we can figure out how to fix it when we get to it. Flakiness is an inherent problem when using Hypothesis, but I don't think it's a good excuse to throw good testing practices out the window. There are several ways it can be addressed, e.g., increasing max examples, storing the examples database persistently on CI, modifying the test itself in some way, or even just making that single test a skip (as opposed to using skips for all tests).

I'm still concerned about our use of skips in this repo. Right now the NumPy CI on this repo is failing anyway, but ideally I'd like to run both numpy and numpy.array_api, either on this repo or elsewhere, with xfails. Firstly, that will alert us to changes in the library support, which is important for numpy since we are effectively in charge of that support. And secondly, the CI on this repo currently only runs against numpy.array_api with skips. That means that those skipped tests are never run on CI. If they were XFAIL they would at least run at all (XFAIL with an expected failure message might be more robust here, but OTOH that might actually be too flaky to deal with).
Also, I should note that I want to do the same for cupy, which I can only run locally, occasionally, on my PC (unless we can figure out a CI solution there, but I really doubt that will happen). Basically, whenever I'm thinking about it I can pull it up and run the tests, or someone else with access to a CUDA GPU can. I really don't care about flakiness for that. I can afford to just run with more examples or just run the test again, given how rarely I would do it.
Also, as I noted, XPASS only fails if you tell it to with `-o xfail_strict=True`.
To reiterate again, my argument here is: yes, you can deal with xfails, but it's a massive PITA if you don't know Hypothesis, and even if you do it's still a massive PITA, and even if you maintain Hypothesis it's still a massive PITA. Yes, you can just never use `xfail_strict`, but you still need to familiarise yourself with why you shouldn't, which makes end-user usage of the test suite annoying. It is also generally semantically wrong to xfail a test which isn't expected to always fail, hence why in the Hypothesis test suite any such test is never xfailed but just skipped.

So if you want to use this test suite as a primary test suite to rely upon, as opposed to some third-party thing you run, I can see the utility of xfails. Can we not promote this in the README though? In fact, I want to discourage the use of xfails, so not even document it, and leave a "please don't use this feature" comment in the relevant code. I absolutely do not want to deal with flakiness issues downstream users will have down the road, heh.
So I've toned back the xfail stuff. You're right that there is a lot of flakiness (especially with pytorch, which only seems to fail a lot of special-cases tests for very specific values; probably due to the way it does scalar type promotion). I do still think that actually running the test is better than skipping, but XPASS isn't very usable.

In my local testing with cupy, I've come to the conclusion that XPASS can work if you sufficiently populate the examples database (e.g., by running the tests with 10000 max examples). I'm hoping to figure out how to persist the examples database on CI. I know that the examples database does get invalidated sometimes when Hypothesis or the test suite updates, but I expect that level of flakiness will be much less annoying.
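For what it's worth, one way the "populate and persist the examples database" idea could look is sketched below, using Hypothesis's settings profiles and on-disk example database. The profile name, example count, and directory are assumptions, not something this repo does today.

```python
# Sketch only: a CI settings profile with a larger example budget and an
# explicit on-disk database that a CI cache step could persist between runs.
from hypothesis import settings
from hypothesis.database import DirectoryBasedExampleDatabase

settings.register_profile(
    "ci-populate",
    max_examples=10_000,  # one-off heavy run to seed the database
    database=DirectoryBasedExampleDatabase(".hypothesis/examples"),
    print_blob=True,
)
# Select with: pytest --hypothesis-profile=ci-populate ...
# Then cache the .hypothesis/examples directory on CI (e.g. with actions/cache)
# so previously found failing examples are replayed on later runs.
```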
Assuming the test failures go away after merging master, any objections to merging this? |
Pushed a commit to fix wildcard ids not matching the respective "child" ids, e.g.

$ cat skips.txt
array_api_tests/test_operators_and_elementwise_functions.py::test_abs
$ pytest array_api_tests/test_operators_and_elementwise_functions.py::test_abs -v
array_api_tests/test_operators_and_elementwise_functions.py::test_abs[abs] SKIPPED (skips file (/home/honno/gdrive/GitHub/array-...) [ 50%]
array_api_tests/test_operators_and_elementwise_functions.py::test_abs[__abs__] PASSED
Think this is nearly there, mostly my personal preference now.
README.md
export PYTHONPATH="${GITHUB_WORKSPACE}/array-api-compat"
cd ${GITHUB_WORKSPACE}/array-api-tests
pytest -v -rxXfE --ci --xfails-file ${GITHUB_WORKSPACE}/your-array-library/array-api-tests-xfails.txt array_api_tests/
Hmm, this doesn't look quite right to me, but I might be missing something. The PYTHONPATH/GITHUB_WORKSPACE stuff looks janky for a generic example, and we shouldn't be specifying `array-api-compat` here, right?
Also, I much prefer highlighting skips rather than xfails.
Suggested change (replacing the three lines above):

pytest -v -rxXfE --ci --skips-file path/to/skips.txt array_api_tests/
The "compat" thing is definitely a typo, but in general, this is a recipe for storing the skips in the array library repo and testing against the test suite. The point of GITHUB_WORKSPACE is to show how to do that (most people don't clone multiple repos in a single GitHub action, so it's not obvious how to do it). Again, I'd prefer to have a mostly working example so that people can just copy-paste it, rather than omitting stuff that they'll just have to figure out themselves. I always prefer it when people do this in other contexts which is why I've done it here.
Here is an example GitHub Actions workflow file, where the xfails are stored
in `array-api-tests.xfails.txt` in the base of the `your-array-library` repo.

If you want, you can use `-o xfail_strict=True`, which causes XPASS tests (XFAIL
tests that actually pass) to fail the test suite. However, be aware that
XFAILs can be flaky (see below), so this may not be a good idea unless you
use some other mitigation of such flakiness.

If you don't want this behavior, you can remove it, or use `--skips-file`
instead of `--xfails-file`.
Much prefer highlighting `--skips-file` first and foremost, rather than `--xfails-file`.
We're just not going to get agreement on this. XFAILs are better than skips. Period. It's better to run the test so you can still see if it actually failed or not. The only exception is cases where the test hangs or crashes the test suite.
For some reason the real nodeid has an extra "array-api-tests/" in front, which breaks matching if we do a `startswith` match. Using `in` would also allow simpler matching with just the test name, if desired.
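A tiny illustration of that trade-off; the node id and file entry below are made-up examples shaped like the ids discussed above.

```python
# Example values only, shaped like the ids discussed above.
nodeid = ("array-api-tests/array_api_tests/"
          "test_operators_and_elementwise_functions.py::test_abs[__abs__]")
entry = "array_api_tests/test_operators_and_elementwise_functions.py::test_abs"

print(nodeid.startswith(entry))  # False: the extra "array-api-tests/" prefix breaks it
print(entry in nodeid)           # True: substring matching tolerates the prefix
print("test_abs" in nodeid)      # True: also allows matching on just the test name
```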
Need this for testing the compat library against 2022.12, so going to merge.
I think the existing file for NumPy on CI needs to be updated, but I haven't looked into this. We really ought to prefer XFAIL to skips everywhere so we know when a test starts to pass. Note that XPASS doesn't actually cause a failure by default unless `-o xfail_strict=True` is used.
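As a self-contained illustration of that default behaviour, here is a toy test that is not part of this suite:

```python
# test_xpass_demo.py -- toy example, unrelated to the array-api test suite.
import pytest


@pytest.mark.xfail(reason="behaviour assumed to be broken")
def test_now_works():
    assert 1 + 1 == 2  # passes, so pytest reports it as XPASS


# pytest test_xpass_demo.py                      -> "1 xpassed", exit code 0
# pytest -o xfail_strict=True test_xpass_demo.py -> the XPASS is reported as a
#                                                   failure and the run fails
```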