Discourage manual execution control in the test framework #3919

Comments
Leave a thumbs-up under this comment if |
For example, I am writing a web socket library, and the received frames are represented as flows. Depending on implementation details, it could be that if the caller doesn't collect the flow at the right time, some frames could be dropped (it used to be the case in the past). I want to test that behaviour, and determine that internal coroutines don't drop the frame in the case where no collector is there, even if given infinite time. Here is some pseudocode:

```kotlin
@Test
fun subscribe_doesntLoseMessagesIfFlowIsNotCollectedImmediately() = runTest {
    // Passes the current context so that internal coroutines use the test scheduler.
    val wsSession = createSomeWsSession(...)
    // Some calls that lead to the generation of web socket frames somewhere in the pipe.
    wsSession.triggerSomeWebSocketFrame()
    // No flow collection started here.
    advanceUntilIdle() // makes sure internal coroutines cannot drop the frame just due to races
    ensureWeCanStillGetTheFrameNowByCollectingLate()
}
```

What would be the suggestion in this case? |
Thank you for the use case! I get that you want to wait for the moment your system reaches a quiescence period, to ensure that it doesn't enter an incorrect state even when no one interacts with it for too long; during quiescence, not all tasks are finished, as some are suspended. There are still some technical details that are unclear, though. Could you elaborate on them? Maybe the questions below are a good checklist for gathering …
Also, a question regarding your system specifically: I see a comment about races; wouldn't you be better off writing a multithreaded stress-test? Roughly, it could look like this:

```kotlin
@Test
fun subscribe_doesntLoseMessagesIfFlowIsNotCollectedImmediately() = runTest {
    repeat(100_000) {
        coroutineScope {
            val wsSession = createSomeWsSession(...)
            launch(Dispatchers.Default) {
                ensureWeCanStillGetTheFrameNowByCollectingLate()
            }
            wsSession.triggerSomeWebSocketFrame()
        }
    }
}
```

There are some synchronization tricks you could do to make sure the interesting thread interleavings are more likely to happen, but let's keep it simple for now. This way, you would actually race some code against some other code. With deterministic single-threaded testing, you only look at a single interleaving. |
Thanks a lot for your answer.

Not really. The point of this specific test is to force the system to fail in a specific case (frames are emitted while no collectors are there to get them for a long time, and then new collectors should still get them). I want the test to be guaranteed to fail if someone changes the implementation to, say, a shared flow that would just drop frames while no one listens. But I can't just check whether the frame is here immediately, because there could be suspension points in between that are such that the frame is still here for a bit and then disappears. Therefore, I can't rely on a specific number of suspension points (and thus some number of …).

I hope this clarifies my case. |
I'm just wondering, what would be the main difference between |
The main difference is that |
Thanks for the explanation. Then yeah, … The problem with other alternatives comes mostly down to time-sensitive code and nested coroutines. Let's say that we have code that sets state X immediately and then state Y after one second.

Then, ideally, we should be able to write it like this:

```kotlin
launchCoroutineThatTriggersXAndY()
assertThatWeAreInStateX()
delay(1.seconds)
assertThatWeAreInStateY()
```

However, since X and Y are triggered in the background coroutine, nothing will happen. We must tell the test dispatcher to execute other coroutines before we can assert. In some cases, a simple `yield()` is enough:

```kotlin
launchCoroutineThatTriggersXAndY()
yield()
assertThatWeAreInStateX()
delay(1.seconds)
assertThatWeAreInStateY()
```

However, if a single `yield()` doesn't suffice, `runCurrent()` can be used:

```kotlin
launchCoroutineThatTriggersXAndY()
runCurrent()
assertThatWeAreInStateX()
delay(1.seconds)
assertThatWeAreInStateY()
```

To us |
As far as I understand its implementation, …

```kotlin
class ViewModel(someFlow: Flow<String>, coroutineContext: CoroutineContext) {
    private val coroutineScope: CoroutineScope = coroutineScopeWithLifecycle(coroutineContext)
    val someState = someFlow.stateIn(coroutineScope, SharingStarted.WhileSubscribed(), null)

    fun onButtonClick() {
        coroutineScope.launch {
            doStuff()
        }
    }
}
```

When testing |
Reviewing our codebases, almost all uses of |
Thanks for this detailed write-up, this was very helpful in getting my tests working. I tried … My use case is I have a … Personally, I think the API might be a little bit clearer if all tests could use |
Update: the …

```kotlin
// Poll until the data is expired.
suspend fun doRefreshAfterExpires(refreshAt: Instant) {
    while (true) {
        delay(expireCheckIntervalMs)
        if (refreshAt < clock.instant()) {
            mutex.withLock {
                doLoadAttemptLocked()
            }
            break
        }
    }
}

// Kicks off a new load job; must be run inside a lock call.
suspend fun doLoadAttemptLocked() {
    status = AsyncStatus.LOADING
    loadJob = scope.launch(defaultDispatcher) {
        val result = onLoad()
        mutex.withLock {
            if (result.isSuccess) {
                status = AsyncStatus.SUCCESS
                val newWillRefreshAt = calculateRefreshAt(result.expiresAt)
                refreshAt = newWillRefreshAt
                if (newWillRefreshAt != null) {
                    expireCheckJob = scope.launch(defaultDispatcher) {
                        doRefreshAfterExpires(refreshAt = newWillRefreshAt)
                    }
                }
            } else { /* Error handling */ }
            emitState()
        }
    }
    emitState()
}
```

|
What are you trying to test exactly? If your goal is to test that a single request is started, what you would do is:

```kotlin
fun yourTest() = runTest {
    launch { doRefreshAfterExpires(…) }
    // Test that no request has been made yet.
    delay(100)
    // The specified instant hasn't been reached yet.
    delay(…) // let's say that after that, the instant has been reached
    // Test that a first request has been started.
    delay(expireCheckIntervalMs)
    // Test that a second request has been started.
    // etc.
}
```

The point is, you replicate the delays in the test function itself to measure what happens just before and after each operation. Delay-skipping keeps the order of events, so it will only skip after all events scheduled for that time are over. |
So my test scenario is to make sure my DataLoader reloads the data when it expires.

My `onLoad`:

```kotlin
onLoad = {
    Timber.i("before load delay")
    delay(loadDelay)
    Timber.i("after load delay")
    DataLoaderResult.Success(data = 1, expiresAt = Instant.EPOCH.plusSeconds(5))
},
```

The problem is that …

```kotlin
val loadResultCh = Channel<DataLoaderResult<Int>>(1)
val loader = DataLoader<Int>(
    onLoad = {
        loadResultCh.receive()
    },
)

// Load and wait for it to succeed.
loader.load()
runCurrent()
Assert.assertEquals(loadState, loader.state.value)

loadResultCh.send(DataLoaderResult.Success(data = 1, expiresAt = expiresAt))
runCurrent()
Assert.assertEquals(successState, loader.state.value)
```

|
Update: I've figured this out. Long story short, …

Thanks for the help taking a look at this; I'm going to plan on using |
…ntilIdle may be removed in the future See Kotlin/kotlinx.coroutines#3919 Closes https://gitlab.com/opensavvy/groundwork/prepared/-/issues/51
The KotlinX.Coroutines team is considering removing advanceBy. Since our main use-case is to set the initial time in a test, which happens before any tasks are registered, the behavior is the same if using a regular delay instead. See Kotlin/kotlinx.coroutines#3919
Problem statement, the short version

We receive a lot of questions about using the test module, the answer to which is to avoid using `advanceUntilIdle`/`advanceTimeBy`. `advanceTimeBy`, `advanceUntilIdle`, and `runCurrent` can be replaced by `delay`, `yield`, or cross-coroutine communication in most tests, with better results and much clearer semantics. Because `runTest` uses virtual time, even `delay(1.seconds)` is instantaneous. Manual time control is error-prone, discouraged, has surprising behavior, and is increasingly incompatible with the rest of the framework.

We should think of ways to make these functions less prominent so that people don't use them unless they absolutely need to, providing alternatives that more directly and correctly express the intent behind their typical usage.
Request for comments: if you use `advanceTimeBy`, `runCurrent`, or `advanceUntilIdle` in your code and don't think they could be reasonably replaced with the suggested alternatives when we implement them, please provide examples of your use cases.

Alternatives to each manual execution control function
advanceTimeBy

This function is used when some specific event that's known to be scheduled into the future needs to happen before its results are validated.

Examples: …

When you have `advanceTimeBy(n)` directly in your test (which is almost all of its usages), it can be replaced with `delay(n - 1.milliseconds)`. The `-1` is needed because `advanceTimeBy` does not execute the tasks scheduled for the moment `currentTime() + n`, whereas `delay(n)` will also execute the tasks that were already scheduled for `currentTime() + n` before `delay(n)` was called.

So, these tests become
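As a small before/after illustration of this replacement (a hypothetical test of my own construction, assuming a recent kotlinx-coroutines-test version with the `Duration` overload of `advanceTimeBy`):

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.test.advanceTimeBy
import kotlinx.coroutines.test.runTest
import kotlin.test.Test
import kotlin.test.assertEquals
import kotlin.time.Duration.Companion.milliseconds
import kotlin.time.Duration.Companion.seconds

class AdvanceTimeByReplacementTest {
    // Before: manual execution control.
    @Test
    fun withAdvanceTimeBy() = runTest {
        val results = mutableListOf<Int>()
        launch { delay(1.seconds); results += 1 }
        advanceTimeBy(1.seconds + 1.milliseconds) // overshoot so the task at 1s runs
        assertEquals(listOf(1), results)
    }

    // After: advanceTimeBy(n) becomes delay(n - 1.milliseconds).
    @Test
    fun withDelay() = runTest {
        val results = mutableListOf<Int>()
        launch { delay(1.seconds); results += 1 }
        delay(1.seconds + 1.milliseconds - 1.milliseconds) // i.e., delay(1.seconds)
        assertEquals(listOf(1), results)
    }
}
```

Both tests complete instantly in real time: the `delay` in the second test is skipped by the virtual clock just like `advanceTimeBy` in the first.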
advanceUntilIdle

This function has the behavior of running the tasks present in the test scheduler, and only there, while there are tasks outside of `backgroundScope`. Almost always, the upper bound on the virtual time it takes to execute all code in the test dispatcher is known. Typically, it's just `1.milliseconds`. So, functionally, `advanceUntilIdle` can almost always be replaced by `delay(20.hours)`.

However, in many cases, there are problems with this replacement:

- `20.hours` is a completely arbitrary constant.
- `advanceUntilIdle` doesn't always do what it says on the box, as work on other dispatchers is not taken into account, so idleness is a fuzzy concept anyway. Though it's inarguable that when `advanceUntilIdle` does work, it also shows the intent.
- If there is periodic work in `backgroundScope` that happens, say, every `100.milliseconds`, `delay(20.hours)` will waste a lot of processing power on that work.

We must first present a viable alternative to confidently claim that `advanceUntilIdle` is no longer needed.

In most cases we've seen, what people intend is actually for the spawned coroutines to finish, regardless of whether these coroutines execute on the test dispatcher. So, we could introduce a function like
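One rough sketch of what such a function could look like (`awaitAllChildren` is not an existing API; the name and implementation are assumptions):

```kotlin
import kotlinx.coroutines.Job
import kotlinx.coroutines.currentCoroutineContext

// Hypothetical: suspend until all children of the current coroutine's Job
// complete, regardless of which dispatcher they run on.
suspend fun awaitAllChildren() {
    val job = currentCoroutineContext()[Job] ?: return
    // Re-read the children on each pass: joining a child may cause new
    // children to be launched.
    while (true) {
        val children = job.children.toList()
        if (children.isEmpty()) return
        children.forEach { it.join() }
    }
}
```

Unlike `advanceUntilIdle`, this suspends instead of blocking, so inside `runTest` the virtual clock can keep advancing for delayed tasks while the children on other dispatchers finish in real time.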
Replacing `advanceUntilIdle` with `awaitAllChildren()` wouldn't always work. For example, here, it would lead to a deadlock:

However, in all the cases we've seen so far where this happens, `advanceUntilIdle` is not needed at all and was probably put there just in case. We'll need to investigate further.

runCurrent
As shown in the previous example, `delay(1.milliseconds)` usually does the trick.

Still, this replacement is problematic: it is not exactly equivalent to `runCurrent`, as `runCurrent` ensures no more tasks are scheduled for the current moment by running all of them.

A proper replacement would probably be a function like
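The original sketch is not preserved here; one guess at the shape of such a function (the name, signature, and approach are assumptions, not an existing API):

```kotlin
import kotlinx.coroutines.test.TestScope
import kotlinx.coroutines.yield

// Hypothetical: a suspending analogue of runCurrent. It lets every task
// already scheduled for the current virtual moment (including tasks those
// tasks schedule) run, without advancing the virtual clock.
suspend fun TestScope.runCurrentSuspending() {
    val now = testScheduler.currentTime
    // Each yield() parks this coroutine behind everything currently queued
    // for this moment; a bounded number of rounds approximates "no more
    // tasks are scheduled for now". An approximation, not a guarantee.
    repeat(32) {
        if (testScheduler.currentTime != now) return
        yield()
    }
}
```

Being `suspend` rather than blocking is the important property here; it is what lets such a function coexist with work arriving asynchronously from other dispatchers.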
This is also needed to properly support using Espresso with the test framework #242
Problem statement, the long version

The current understanding of how people (need to) test coroutines

Before the large test framework refactoring, there were effectively two APIs: a low-level one (`TestCoroutineScope`, `runCurrent`, `advanceUntilIdle`, `advanceTimeBy`) and a higher-level one, `runBlockingTest`, built on top of it.

There were three patterns for testing coroutines.
Library vs framework

The library approach takes an "off from the side" look at coroutines: the testing facilities were treated as a library of functions to be mixed and matched, with behavior being requested from them. One would explicitly schedule work on the test facilities, explicitly ask for it to be executed, and explicitly ensure the correct behavior.

In the framework approach, testing is conducted from inside a coroutine. `runBlockingTest` itself called `advanceUntilIdle` and `cleanupTestCoroutines`, ensuring the correct behavior in the common case, but giving up some of the flexibility. `runBlockingTest` was a framework for coroutines to be tested in. `runBlockingTest` was itself implemented on top of the primitives provided by the library-like interface.

After the big refactoring, where every coroutine test must be wrapped in `runTest`, tests that used to be in the library form took this approach instead of translating to the pure framework-like one. What follows is an explanation of why both the pure library approach and the mixed approach are suboptimal for the common use cases and one should relegate the execution control to `runTest`.
.The issues with manual execution control
At a glance, it may look like the library approach is clearly the better one: everything is explicit, with minimal magical behavior that needs to be understood, and one can engage with the scheduled coroutines as closely as needed, whereas
runBlockingTest
just does its thing somehow on its own in mysterious ways.Unfortunately, careful study of hundreds of tests has shown that what people need is a framework for testing, not a library.
Tests not testing anything
People were misusing the testing library all the time. For example, consider this "test":
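The original snippet is not preserved; it presumably resembled the following reconstruction, using the pre-refactoring API (`foo` is a stand-in that suspends internally):

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.test.TestCoroutineScope
import kotlin.test.assertEquals

// Stand-in for the code under test: suspends before producing a result.
suspend fun foo(): Int {
    delay(100)
    return 42
}

fun brokenTest() {
    val scope = TestCoroutineScope()
    scope.launch {
        // Runs eagerly until the first suspension inside foo, then stops:
        // nothing ever advances the virtual time.
        assertEquals(42, foo()) // never actually executed
    }
    // Missing: advanceUntilIdle() / cleanupTestCoroutines().
}
```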
This test runs `foo` until the first suspension and then stops doing anything at all. `assertEquals` will never get called.

The root of the issue is that it's just incorrect to create a `TestCoroutineScope` and not call `cleanupTestCoroutines` at some point. This test is more than useless: it gives a false sense of security, leaving the impression that `foo` is properly tested.
is properly tested.No interoperating with asynchronous resumptions
Problem statement
Manual time control only knows about tasks that the test dispatcher needs to execute. It doesn't know anything about tasks that happen on other dispatchers.
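The example discussed below is not preserved; it was presumably along these lines (a reconstruction, with `foo` standing in for a function that does real I/O on `Dispatchers.IO`):

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.test.advanceUntilIdle
import kotlinx.coroutines.test.runTest
import kotlinx.coroutines.withContext

val cache = mutableListOf<String>()

// Stand-in for a real network call that the test scheduler knows nothing about.
suspend fun foo(): String = withContext(Dispatchers.IO) { "response" }

fun test() = runTest {
    launch {
        cache += foo() // control leaves the test dispatcher inside foo
    }
    advanceUntilIdle() // drains only the test scheduler's own queue
    launch {
        println(cache) // races with the IO work: may or may not see "response"
    }
}
```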
`advanceUntilIdle` here will progress the first launched coroutine until the point when `foo` is entered. After that, the test dispatcher has no control over what happens; `Dispatchers.IO` has to take it from there.

So, by the time the second `launch` happens, a network call may or may not have been executed. Most likely, it wasn't. There are race conditions in this test, making it flaky and unpredictable.

Luckily, most tests were not written this way: most likely, the network call wouldn't have enough time to finish by the end of the test, and `runBlockingTest` would crash with an exception notifying that some coroutines spawned inside it didn't complete. This behavior was mysterious to many, as understanding it requires a solid model of the internals of coroutines, and so people were just generally discouraged from using non-test dispatchers inside tests.
runBlockingTest
and forgetting to clean up coroutines, and you get a recipe for disaster: even if you didn't forgetadvanceUntilIdle
at the end of the test, it still meant little. Some adventurous souls went out of their way to make tests pass, rewriting tests not to userunBlockingTest
orcleanupTestCoroutines
as they "caused the test to crash."A partial fix
The library approach can be made to properly work with external dispatches like this:
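The original snippet is not preserved; it presumably followed this pattern (a reconstruction; the `A`–`D` labels match the lines discussed in the surrounding text and are placed by assumption):

```kotlin
import kotlinx.coroutines.launch
import kotlinx.coroutines.test.runTest

fun test() = runTest {
    val job = launch {
        // A: the code actually being tested, which may hop to other
        // dispatchers and resume asynchronously...
        // B: ...and the assertions on its results.
    }
    // C: drive the test dispatcher until the code under test completes.
    while (job.isActive) {
        testScheduler.runCurrent()
        if (job.isActive) {
            // Wait up to 100 ms of real time for other dispatchers to post
            // more tasks back to the test dispatcher.
            Thread.sleep(100)
        }
    }
    // D
}
```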
This way, until the last line of the test body has finished executing (after which `job.isActive` becomes `false`), we run all the tasks scheduled in the test dispatcher, and if we're not done after that, also wait for `100.milliseconds` for some other dispatchers to asynchronously pass more tasks to the test dispatcher.

However, note that the only part of the test relevant to us is between lines `A` and `B`. All the other things are simply boilerplate that virtually every test needs, but is very difficult to come up with.

Manual execution control inherently can't support asynchronous work in general
All of the above is a good indication that manual execution control is error-prone in general in the case of asynchronous resumptions: every `advanceUntilIdle` must be replaced with code between lines `C` and `D` in order to properly support them.

The problem is, the role of `job` is not always obvious. For example, consider this test:

If one attempted to replace `advanceUntilIdle` here with the `C`-`D` construct, there would be no one to fulfill the role of `job`. `bar` does call `launch`, but does it as an implementation detail, without exposing the resulting `Job`. Calling `advanceUntilIdle` here is simply useless, even though it doesn't seem this way at a glance: from the point of view of the test framework, the system is idle while non-test dispatchers are busy.
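The elided test was presumably of this shape (a reconstruction; `bar` is a hypothetical function that launches its work internally):

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.test.advanceUntilIdle
import kotlinx.coroutines.test.runTest

val results = mutableListOf<String>()

// Launches its work as an implementation detail; the Job never escapes.
fun CoroutineScope.bar() {
    launch(Dispatchers.IO) {
        results += "response" // stand-in for a real network call
    }
}

fun test() = runTest {
    bar()
    advanceUntilIdle() // the IO work is invisible to the test scheduler
    println(results) // may or may not contain "response" yet
}
```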
advanceUntilIdle
withdelay(2.seconds)
would not fix the behavior in this case, but it would still clear up the conceptual model. Since, inside the test dispatcher, delays are instantaneous, from its point of view, waiting forDispatchers.IO
to finish doing its thing requires infinite time. The problem of "I don't know when asynchronous work will be completed" needs to be solved programmatically somehow: you can't test asynchronous work if you don't know when it finishes, but the false concept of system idleness significantly muddies the mental model behind the testing framework.Interactions between manual execution control and real-time require thread blocking
Another issue stems from the fact that manual execution control functions are blocking and can't suspend.
For an example of where this causes problems, see #3179. We have a reasonable request to disable virtual time in parts of tests. However, the current design of manual execution control prevents us from reaching a clear solution.
What virtual time should be printed? In what order should the numbers be printed?

There are seemingly two viable answers: … If the answer is `1.seconds`, the order should be `0` (in half a second), `1` (in a second since the start of the test), and then `2` (also in a second since the start of the test).

It's unlikely that someone would expect `job1` to complete after `job2` just because it's required to use the real-world time for `job1`, as the abstraction behind virtual time is that it is just like the actual time, but for tests, it's passing infinitely faster.

The next question is, what should happen if line `A` is replaced with `advanceUntilIdle` or `advanceTimeBy(2.seconds)`. Neither of them is allowed to wait for one real-life second, as they are blocking functions. They would have to block the thread while waiting, which is not even expressible in the JS implementation of the test framework and would be strange behavior in any case. Moreover, after half a second of waiting, their blocking would also need to be interrupted, as a new task has arrived at the test scheduler, ready for execution.

All of this could be easily mitigated if
`advanceUntilIdle` and `advanceTimeBy` were `suspend` functions: they would just suspend, freeing the thread to do other things until the real-time wait is over or a new task arrives. This is exactly the behavior that writing `delay(2.seconds)` instead of `advanceTimeBy` has out of the box!

There is no way to make time control functions play nicely in these scenarios. It seems like the only sensible way to implement them would be to throw `UnsupportedOperationException` if they encounter a task that requires waiting for a given amount of real time.

Most tests don't need this kind of complex API
Out of the hundreds of tests we've seen during the research, there was a single-digit number that did use manual execution control in a way that the framework style wasn't better suited for: when implementing low-level libraries over the coroutines (for example, when writing your own schedulers), one may need to validate the behavior in cases of interleavings that are difficult to emulate with just `runBlockingTest`, mocking calls, and calling `delay`. The majority of tests that avoided `runBlockingTest` were worse off for it, being more brittle.

Why did we keep manual execution control after the refactoring if it's so problematic?
There are two reasons:

- …
- Some tests call `runCurrent` or `advanceTimeBy` from outside the test body, where they can't be replaced with a `delay`.