Skip to content

runTest doesn't report test failure if a non-cancellable coroutine happens to be hanging #3066

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
joffrey-bion opened this issue Dec 7, 2021 · 2 comments
Assignees

Comments

@joffrey-bion
Copy link
Contributor

joffrey-bion commented Dec 7, 2021

I'm using kotlinx-coroutines-test 1.6.0-RC.

If some non-cancellable coroutine code is hanging, exceptions or assertion errors from the test are never reported despite being reached:

@Test
fun testFailureWithPendingCoroutine() = runTest(dispatchTimeoutMs = 5000) {
    launch {
        withContext(NonCancellable) {
            awaitCancellation() // hanging non-cancellable coroutine
        }
    }
    yield()
    fail("some test failure") // this line is reached and throws, but it's never reported
}

I could see with debugging that the fail() is indeed reached and called. It does fail, but the assertion error is just swallowed somewhere in runTest and never makes it to the output. I just get the error message about the hanging coroutine:

The test coroutine was not completed after waiting for 5000 ms
kotlinx.coroutines.test.UncompletedCoroutinesError: The test coroutine was not completed after waiting for 5000 ms
	at app//kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt$runTestCoroutine$3$3.invokeSuspend(TestBuilders.kt:231)
	(Coroutine boundary)
	at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt.runTestCoroutine(TestBuilders.kt:257)
	at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt$runTest$1$1.invokeSuspend(TestBuilders.kt:167)
	at kotlinx.coroutines.test.TestBuildersJvmKt$createTestResult$1.invokeSuspend(TestBuildersJvm.kt:13)
Caused by: kotlinx.coroutines.test.UncompletedCoroutinesError: The test coroutine was not completed after waiting for 5000 ms
	at app//kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt$runTestCoroutine$3$3.invokeSuspend(TestBuilders.kt:231)
	at app//kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        ...

Moreover, the test spends the whole dispatchTimeoutMs before failing (this is why I reduced it to 5 seconds in the repro). It is somewhat understandable because of the non-cancellability of the coroutine, but it's quite annoying too. Not sure we could do anything about that though, at least the reporting of test failure in addition to the timeout would be great.

@joffrey-bion joffrey-bion changed the title runTest doesn't report test failure a non-cancellable coroutine happens to be hanging runTest doesn't report test failure if a non-cancellable coroutine happens to be hanging Dec 7, 2021
@dkhalanskyjb
Copy link
Collaborator

dkhalanskyjb commented Dec 7, 2021

Thanks for reporting this!

Some thoughts:

  • I think we should spend the dispatchTimeoutMs before failing. This timeout failure is a very special condition that sidesteps structured concurrency, which we want to be as rare an occurrence as possible.
  • It would be incorrect to just throw the exception with which the coroutine completes instead of UncompletedCoroutinesError. The exception the coroutine completes with is likely something normal and somewhat expected, whereas shutting down the test in violation of structured concurrency can lead to very strange scenarios, and UncompletedCoroutinesError is a vital piece of context for understanding these violations. Not to mention that we sweep under the rug the fact that the test took a long time to run.
    @Test
    fun testFailureWithPendingCoroutine() = runTest(dispatchTimeoutMs = 5000) {
        launch {
            withContext(NonCancellable) {
                try {
                    // we entered the `try` block...
                    awaitCancellation() // hanging non-cancellable coroutine
                } finally {
                    // ... but we never enter `finally`! Wow, this is surprising!
                }
            }
        }
        yield()
        fail("some test failure") // this line is reached and throws, but it's never reported
    }
  • Another consideration is that the results of testing the code should somehow map to the expected results of just running the code in production—otherwise, what's the point? So, I think this works as intended and correctly identifies the issue: in non-test code, the coroutine that is being tested would hang.

Now, we could achieve more helpful behavior in this case by, for example, adding the exception the coroutine finished with as a suppressed one. Not really straightforward though, as, after all, we want to prevent accessing the completion exception before the coroutine is completed, but it could be done.

@joffrey-bion
Copy link
Contributor Author

joffrey-bion commented Dec 7, 2021

I think we should spend the dispatchTimeoutMs before failing. This timeout failure is a very special condition that sidesteps structured concurrency, which we want to be as rare an occurrence as possible.

I was about to answer something, but when thinking about it a bit more I agree with you here (at least partly). My initial arguments were:

If there are test failures anyway, there is probably something wrong with the code or the test.
Maybe that's also the reason for hanging coroutines, so why should we wait for a full minute 
before telling the developer what happened? If there is a problem with structured concurrency
in addition to the other problem in the code, what's wrong with reporting the first issue quickly 
and when that one is fixed? The hanging coroutine issue will still fail the test if it's still there.

Why would that be any different from failing on the first assert? Once you fix the first assert that 
fails, the test will go through the other asserts. With `runTest`, I just see it as an extra (last) assert 
on whether some coroutines are still hanging after 1 minute.

But I'll answer my own questions here. The hanging coroutine might be a problem that's only visible in case of failure, so if we just report the failure, and the dev fixes it, we might miss the fact that structured concurrency is broken in the code or the test.

That being said, maybe we could have different behaviour in case of AssertionError (report immediately) and other throwables (wait the full timeout to see if the code handles coroutines properly in case of exception). This is because AssertionError should never be something failing in the production code, just something from the test itself. But this might unnecessarily complicate things, and maybe it's not right either.

It would be incorrect to just throw the exception with which the coroutine completes instead of UncompletedCoroutinesError

Yes, in case we wait for the full timeout, both exceptions should be visible. We definitely don't want to hide the UncompletedCoroutinesError.

Another consideration is that the results of testing the code should somehow map to the expected results of just running the code in production—otherwise, what's the point? So, I think this works as intended and correctly identifies the issue: in non-test code, the coroutine that is being tested would hang.

Maybe my use case is just a bit peculiar here. To give a bit more context, I'm testing the timeout behaviour of a 2-step connection operation (websocket first, then STOMP). If the connect() is cancelled during the STOMP connect, the already open websocket should be closed.

Now my close() method is suspending, and since we're already cancelled at that point I have to call it in a non-cancellable context.

My test was advancing the time artificially until the timeout and trying to assert that the connection coroutine was complete, which was not the case due to a bug in the code (the production code was indeed hanging). My test failure made it clear that the coroutine was hanging after timeout with a precise error message, but that message never surfaced in the test result, and I just got the vague exception from the test framework.

Both would have reported the hanging behaviour, but since I explicitly checked and asserted for this in the test, I expected my error to be visible in that case.

Now, we could achieve more helpful behavior in this case by, for example, adding the exception the coroutine finished with as a suppressed one.

Yes, this is exactly what I'm looking for here. I think it is important to have all failures visible because sometimes it's also critical information (on top of the hanging coroutine one).

dkhalanskyjb added a commit that referenced this issue Dec 8, 2021
woainikk pushed a commit that referenced this issue Dec 14, 2021
* The error is more verbose: it lists the active children.
* If the test coroutine is already finishing with an error, that error
  is added to the timeout notification as suppressed.
* The timeout is considered more important, so now, if there were
  any uncaught exceptions, they don't take precedence over it but
  are suppressed by it.

Fixes #3066
Fixes #3069
yorickhenning pushed a commit to yorickhenning/kotlinx.coroutines that referenced this issue Jan 28, 2022
* The error is more verbose: it lists the active children.
* If the test coroutine is already finishing with an error, that error
  is added to the timeout notification as suppressed.
* The timeout is considered more important, so now, if there were
  any uncaught exceptions, they don't take precedence over it but
  are suppressed by it.

Fixes Kotlin#3066
Fixes Kotlin#3069
pablobaxter pushed a commit to pablobaxter/kotlinx.coroutines that referenced this issue Sep 14, 2022
* The error is more verbose: it lists the active children.
* If the test coroutine is already finishing with an error, that error
  is added to the timeout notification as suppressed.
* The timeout is considered more important, so now, if there were
  any uncaught exceptions, they don't take precedence over it but
  are suppressed by it.

Fixes Kotlin#3066
Fixes Kotlin#3069
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants