`runTest` doesn't report test failure if a non-cancellable coroutine happens to be hanging #3066

joffrey-bion · 2021-12-07T00:57:16Z

I'm using kotlinx-coroutines-test 1.6.0-RC.

If some non-cancellable coroutine code is hanging, exceptions or assertion errors from the test are never reported despite being reached:

@Test
fun testFailureWithPendingCoroutine() = runTest(dispatchTimeoutMs = 5000) {
    launch {
        withContext(NonCancellable) {
            awaitCancellation() // hanging non-cancellable coroutine
        }
    }
    yield()
    fail("some test failure") // this line is reached and throws, but it's never reported
}

I could see with debugging that the fail() is indeed reached and called. It does fail, but the assertion error is just swallowed somewhere in runTest and never makes it to the output. I just get the error message about the hanging coroutine:

The test coroutine was not completed after waiting for 5000 ms
kotlinx.coroutines.test.UncompletedCoroutinesError: The test coroutine was not completed after waiting for 5000 ms
	at app//kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt$runTestCoroutine$3$3.invokeSuspend(TestBuilders.kt:231)
	(Coroutine boundary)
	at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt.runTestCoroutine(TestBuilders.kt:257)
	at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt$runTest$1$1.invokeSuspend(TestBuilders.kt:167)
	at kotlinx.coroutines.test.TestBuildersJvmKt$createTestResult$1.invokeSuspend(TestBuildersJvm.kt:13)
Caused by: kotlinx.coroutines.test.UncompletedCoroutinesError: The test coroutine was not completed after waiting for 5000 ms
	at app//kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt$runTestCoroutine$3$3.invokeSuspend(TestBuilders.kt:231)
	at app//kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        ...

Moreover, the test spends the whole dispatchTimeoutMs before failing (this is why I reduced it to 5 seconds in the repro). It is somewhat understandable because of the non-cancellability of the coroutine, but it's quite annoying too. Not sure we could do anything about that though, at least the reporting of test failure in addition to the timeout would be great.

The text was updated successfully, but these errors were encountered:

dkhalanskyjb · 2021-12-07T09:34:11Z

Thanks for reporting this!

Some thoughts:

I think we should spend the dispatchTimeoutMs before failing. This timeout failure is a very special condition that sidesteps structured concurrency, which we want to be as rare an occurrence as possible.
It would be incorrect to just throw the exception with which the coroutine completes instead of UncompletedCoroutinesError. The exception the coroutine completes with is likely something normal and somewhat expected, whereas shutting down the test in violation of structured concurrency can lead to very strange scenarios, and UncompletedCoroutinesError is a vital piece of context for understanding these violations. Not to mention that we sweep under the rug the fact that the test took a long time to run.

    @Test
    fun testFailureWithPendingCoroutine() = runTest(dispatchTimeoutMs = 5000) {
        launch {
            withContext(NonCancellable) {
                try {
                    // we entered the `try` block...
                    awaitCancellation() // hanging non-cancellable coroutine
                } finally {
                    // ... but we never enter `finally`! Wow, this is surprising!
                }
            }
        }
        yield()
        fail("some test failure") // this line is reached and throws, but it's never reported
    }

Another consideration is that the results of testing the code should somehow map to the expected results of just running the code in production—otherwise, what's the point? So, I think this works as intended and correctly identifies the issue: in non-test code, the coroutine that is being tested would hang.

Now, we could achieve more helpful behavior in this case by, for example, adding the exception the coroutine finished with as a suppressed one. Not really straightforward though, as, after all, we want to prevent accessing the completion exception before the coroutine is completed, but it could be done.

joffrey-bion · 2021-12-07T10:37:32Z

I think we should spend the dispatchTimeoutMs before failing. This timeout failure is a very special condition that sidesteps structured concurrency, which we want to be as rare an occurrence as possible.

I was about to answer something, but when thinking about it a bit more I agree with you here (at least partly). My initial arguments were:

If there are test failures anyway, there is probably something wrong with the code or the test.
Maybe that's also the reason for hanging coroutines, so why should we wait for a full minute 
before telling the developer what happened? If there is a problem with structured concurrency
in addition to the other problem in the code, what's wrong with reporting the first issue quickly 
and when that one is fixed? The hanging coroutine issue will still fail the test if it's still there.

Why would that be any different from failing on the first assert? Once you fix the first assert that 
fails, the test will go through the other asserts. With `runTest`, I just see it as an extra (last) assert 
on whether some coroutines are still hanging after 1 minute.

But I'll answer my own questions here. The hanging coroutine might be a problem that's only visible in case of failure, so if we just report the failure, and the dev fixes it, we might miss the fact that structured concurrency is broken in the code or the test.

That being said, maybe we could have different behaviour in case of AssertionError (report immediately) and other throwables (wait the full timeout to see if the code handles coroutines properly in case of exception). This is because AssertionError should never be something failing in the production code, just something from the test itself. But this might unnecessarily complicate things, and maybe it's not right either.

It would be incorrect to just throw the exception with which the coroutine completes instead of UncompletedCoroutinesError

Yes, in case we wait for the full timeout, both exceptions should be visible. We definitely don't want to hide the UncompletedCoroutinesError.

Another consideration is that the results of testing the code should somehow map to the expected results of just running the code in production—otherwise, what's the point? So, I think this works as intended and correctly identifies the issue: in non-test code, the coroutine that is being tested would hang.

Maybe my use case is just a bit peculiar here. To give a bit more context, I'm testing the timeout behaviour of a 2-step connection operation (websocket first, then STOMP). If the connect() is cancelled during the STOMP connect, the already open websocket should be closed.

Now my close() method is suspending, and since we're already cancelled at that point I have to call it in a non-cancellable context.

My test was advancing the time artificially until the timeout and trying to assert that the connection coroutine was complete, which was not the case due to a bug in the code (the production code was indeed hanging). My test failure made it clear that the coroutine was hanging after timeout with a precise error message, but that message never surfaced in the test result, and I just got the vague exception from the test framework.

Both would have reported the hanging behaviour, but since I explicitly checked and asserted for this in the test, I expected my error to be visible in that case.

Now, we could achieve more helpful behavior in this case by, for example, adding the exception the coroutine finished with as a suppressed one.

Yes, this is exactly what I'm looking for here. I think it is important to have all failures visible because sometimes it's also critical information (on top of the hanging coroutine one).

Fixes #3066 Fixes #3069

* The error is more verbose: it lists the active children. * If the test coroutine is already finishing with an error, that error is added to the timeout notification as suppressed. * The timeout is considered more important, so now, if there were any uncaught exceptions, they don't take precedence over it but are suppressed by it. Fixes #3066 Fixes #3069

* The error is more verbose: it lists the active children. * If the test coroutine is already finishing with an error, that error is added to the timeout notification as suppressed. * The timeout is considered more important, so now, if there were any uncaught exceptions, they don't take precedence over it but are suppressed by it. Fixes Kotlin#3066 Fixes Kotlin#3069

joffrey-bion changed the title ~~runTest doesn't report test failure a non-cancellable coroutine happens to be hanging~~ runTest doesn't report test failure if a non-cancellable coroutine happens to be hanging Dec 7, 2021

qwwdfsad added test bug labels Dec 7, 2021

qwwdfsad assigned dkhalanskyjb Dec 7, 2021

dkhalanskyjb added a commit that referenced this issue Dec 8, 2021

Make UncompletedTestCoroutines more verbose

a4d5061

Fixes #3066 Fixes #3069

dkhalanskyjb mentioned this issue Dec 8, 2021

Make UncompletedTestCoroutines more verbose #3071

Merged

dkhalanskyjb closed this as completed in 17d23e6 Dec 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`runTest` doesn't report test failure if a non-cancellable coroutine happens to be hanging #3066

`runTest` doesn't report test failure if a non-cancellable coroutine happens to be hanging #3066

joffrey-bion commented Dec 7, 2021 •

edited

Loading

dkhalanskyjb commented Dec 7, 2021 •

edited

Loading

joffrey-bion commented Dec 7, 2021 •

edited

Loading

runTest doesn't report test failure if a non-cancellable coroutine happens to be hanging #3066

runTest doesn't report test failure if a non-cancellable coroutine happens to be hanging #3066

Comments

joffrey-bion commented Dec 7, 2021 • edited Loading

dkhalanskyjb commented Dec 7, 2021 • edited Loading

joffrey-bion commented Dec 7, 2021 • edited Loading

`runTest` doesn't report test failure if a non-cancellable coroutine happens to be hanging #3066

`runTest` doesn't report test failure if a non-cancellable coroutine happens to be hanging #3066

joffrey-bion commented Dec 7, 2021 •

edited

Loading

dkhalanskyjb commented Dec 7, 2021 •

edited

Loading

joffrey-bion commented Dec 7, 2021 •

edited

Loading