Description
Describe the bug
Given a shared CompletableFuture object future:
1. Attach other transformations on future, e.g. val didRun = AtomicBoolean(); future.whenComplete { _, _ -> didRun.set(true) }
2. Call future.asDeferred() many times. Generally, 10,000 times is sufficient to trigger the failure. The resulting Deferred<*> objects don't have to be used.
3. Complete future exceptionally, e.g. future.completeExceptionally(RuntimeException()).
What happened? What should have happened instead?
Actual: Some of the Deferred<*> results do not complete, and some of the extra transformations from step 1 do not run.
The exception thrown from the completed Deferred<*> objects may contain a suppressed StackOverflowError. In testing, in most cases there is no indication that anything failed, and the StackOverflowError can be observed by setting a breakpoint inside CompletableFuture's internal machinery.
Expected: all the Deferred<*> results should complete exceptionally, and the additional transformations attached in step 1 should run.
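To make the "Actual" symptoms easier to observe without a debugger, the check below is a minimal sketch that counts how many Deferred<*> objects completed and how many of their failures carry a suppressed StackOverflowError. The Outcome type and the summarize function are hypothetical names introduced only for illustration, and getCompletionExceptionOrNull() is an experimental kotlinx.coroutines API, so the opt-in is an assumption about what is acceptable in your code base.

```kotlin
import kotlinx.coroutines.Deferred
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.future.asDeferred
import java.util.concurrent.CompletableFuture

// Hypothetical summary type, introduced only for this sketch.
data class Outcome(val total: Int, val completed: Int, val withSuppressedStackOverflow: Int)

@OptIn(ExperimentalCoroutinesApi::class)
fun summarize(deferreds: List<Deferred<*>>): Outcome {
    // Only completed deferreds may be asked for their completion exception.
    val completed = deferreds.filter { it.isCompleted }
    val overflowed = completed.count { deferred ->
        deferred.getCompletionExceptionOrNull()
            ?.suppressedExceptions
            .orEmpty()
            .any { it is StackOverflowError }
    }
    return Outcome(deferreds.size, completed.size, overflowed)
}

fun main() {
    val future = CompletableFuture<Unit>()
    // A small count, well below the size at which the report observes failures.
    val deferreds = (0 until 3).map { future.asDeferred() }
    future.completeExceptionally(RuntimeException("backend failed"))
    println(summarize(deferreds))
}
```

Calling summarize on the list built in step 2 of the reproducer, after step 3, should show whether any completed Deferred recorded the overflow.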
Why would you do this?
The above condition can happen when you use an asynchronous cache, e.g. Caffeine.newBuilder()...buildAsync().
In such use cases, an underlying CompletableFuture-based cache returns a cached (and shared) CompletableFuture object, and code using the cache calls CompletionStage.asDeferred() to consume the cached future asynchronously.
In exceptional conditions, e.g. when the backend computation gated behind the cache slows to a crawl and then starts failing, the condition described above can happen: a single CompletableFuture is returned to many cache consumers while all of them wait for the computation to complete. If the incoming request rate is high enough, you can easily end up with 10,000 waiters on the same CompletableFuture.
Then, when the backend computation finally fails, this bug is triggered.
Note: this bug is particularly insidious because it is very hard to trigger normally, and when it does trigger, caches can behave very unexpectedly.
In the case of Caffeine, because some transformations fail to run (step 1 above), Caffeine does not get notified that the CompletableFuture failed and keeps returning the same failed CompletableFuture to future callers.
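The usage pattern described above can be sketched as follows. SharedFutureCache is a hypothetical stand-in written for this illustration; it is not Caffeine's implementation and does not use Caffeine's API. It only mirrors the two properties that matter here: every caller for a key receives the same CompletableFuture, and the cache relies on a whenComplete callback (the analogue of step 1) to drop a failed entry.

```kotlin
import kotlinx.coroutines.Deferred
import kotlinx.coroutines.future.asDeferred
import java.util.concurrent.CompletableFuture

// Hypothetical stand-in for an async cache; NOT Caffeine's implementation.
class SharedFutureCache<K : Any, V> {
    private val entries = HashMap<K, CompletableFuture<V>>()

    @Synchronized
    fun get(key: K, loader: (K) -> CompletableFuture<V>): CompletableFuture<V> {
        entries[key]?.let { return it }
        val created = loader(key)
        entries[key] = created
        // The cache's own transformation: evict the entry if the load fails.
        created.whenComplete { _, error -> if (error != null) evict(key, created) }
        return created
    }

    @Synchronized
    fun contains(key: K): Boolean = entries.containsKey(key)

    @Synchronized
    private fun evict(key: K, future: CompletableFuture<V>) {
        entries.remove(key, future)
    }
}

fun main() {
    val cache = SharedFutureCache<String, String>()
    val backend = CompletableFuture<String>() // slow backend computation, still pending

    // Many consumers get the same pending future and each wraps it with asDeferred().
    val waiters: List<Deferred<String>> = (0 until 100).map {
        cache.get("key") { backend }.asDeferred()
    }

    backend.completeExceptionally(RuntimeException("backend failed"))
    println("completed waiters: ${waiters.count { it.isCompleted }} of ${waiters.size}")
    println("failed entry still cached: ${cache.contains("key")}")
}
```

If the eviction callback is one of the transformations that fails to run, a cache built this way would keep serving the failed future, which matches the behaviour the report describes for Caffeine.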
Why not CompletionStage.await()
Because the CompletableFuture is shared between multiple callers. Using CompletionStage.await() would let one caller's cancellation break all other concurrent callers waiting on the same future, because await() cancels the underlying CompletableFuture when the awaiting coroutine is cancelled.
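The difference between the two ways of waiting can be checked with a small sketch. The helper sharedFutureCancelledAfter is a hypothetical name introduced here: it starts one waiter on a fresh shared future, cancels only that waiter, and reports whether the shared future itself was cancelled. Based on the documented behaviour of await(), the first line is expected to report the future as cancelled and the second is not.

```kotlin
import kotlinx.coroutines.cancelAndJoin
import kotlinx.coroutines.future.asDeferred
import kotlinx.coroutines.future.await
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.yield
import java.util.concurrent.CompletableFuture

// Hypothetical helper for illustration: cancels a single waiter and reports
// whether the shared future was cancelled as a side effect.
fun sharedFutureCancelledAfter(
    waitOn: suspend (CompletableFuture<String>) -> String
): Boolean = runBlocking {
    val shared = CompletableFuture<String>()
    val waiter = launch { waitOn(shared) }
    yield()                // let the waiter start and suspend on the future
    waiter.cancelAndJoin() // cancel only this one caller
    shared.isCancelled
}

fun main() {
    val viaAwait = sharedFutureCancelledAfter { it.await() }
    val viaDeferred = sharedFutureCancelledAfter { it.asDeferred().await() }
    println("await(): shared future cancelled = $viaAwait")
    println("asDeferred().await(): shared future cancelled = $viaDeferred")
}
```

With a cached future, the first behaviour would fail every other consumer of the same cache entry, which is why the report uses asDeferred().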
Provide a Reproducer
This is a *.main.kts script. Run it with kotlin <something>.main.kts <num calls>.
@file:Repository("https://repo1.maven.org/maven2")
@file:DependsOn("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.8.1")
import kotlinx.coroutines.delay
import kotlinx.coroutines.future.asDeferred
import kotlinx.coroutines.runBlocking
import java.util.concurrent.CompletableFuture
import java.util.concurrent.atomic.AtomicBoolean
import kotlin.time.Duration.Companion.seconds
val numCalls = args[0].toInt()
val future = CompletableFuture<Unit>()
// Step 1
val didRun = AtomicBoolean(false)
future.whenComplete { _, _ -> didRun.set(true) }
// Step 2
val deferreds = (0 until numCalls).map { future.asDeferred() }
// Step 3
future.completeExceptionally(RuntimeException())
// No thread pools or executors are involved above, so all callbacks should have run by the end of step 3.
runBlocking {
    // But just in case, wait a few seconds.
    delay(2.seconds)
    println("Expected additional transformation to run. Actual: ${didRun.get()}")
    println("Expected ${deferreds.size} Deferred<Unit> to complete exceptionally. Actual: ${deferreds.filter { it.isCompleted }.size}")
}
Versions:
- macOS Sonoma 14.4.1, M3 Pro.
- java -version
openjdk version "21.0.3" 2024-04-16
OpenJDK Runtime Environment Homebrew (build 21.0.3)
OpenJDK 64-Bit Server VM Homebrew (build 21.0.3, mixed mode, sharing)
- kotlin -version:
Kotlin version 2.0.0-release-341 (JRE 21.0.3)
- kotlinx-coroutines-core: 1.8.1 (above)
Results:
$ kotlin test.main.kts 100: no bug
$ JAVA_HOME=/opt/homebrew/Cellar/openjdk@21/21.0.3/libexec/openjdk.jdk/Contents/Home kotlin test.main.kts 100
Expected additional transformation to run. Actual: true
Expected 100 Deferred<Unit> to complete exceptionally. Actual: 100
$ kotlin test.main.kts 10000: bug, with various failure modes
$ JAVA_HOME=/opt/homebrew/Cellar/openjdk@21/21.0.3/libexec/openjdk.jdk/Contents/Home kotlin test.main.kts 10000
Expected additional transformation to run. Actual: true
Expected 10000 Deferred<Unit> to complete exceptionally. Actual: 8023
$ JAVA_HOME=/opt/homebrew/Cellar/openjdk@21/21.0.3/libexec/openjdk.jdk/Contents/Home kotlin test.main.kts 10000
Expected additional transformation to run. Actual: true
Expected 10000 Deferred<Unit> to complete exceptionally. Actual: 2970
$ JAVA_HOME=/opt/homebrew/Cellar/openjdk@21/21.0.3/libexec/openjdk.jdk/Contents/Home kotlin test.main.kts 10000
Expected additional transformation to run. Actual: true
Expected 10000 Deferred<Unit> to complete exceptionally. Actual: 9894
$ JAVA_HOME=/opt/homebrew/Cellar/openjdk@21/21.0.3/libexec/openjdk.jdk/Contents/Home kotlin test.main.kts 10000
Expected additional transformation to run. Actual: true
Expected 10000 Deferred<Unit> to complete exceptionally. Actual: 9995
$ JAVA_HOME=/opt/homebrew/Cellar/openjdk@21/21.0.3/libexec/openjdk.jdk/Contents/Home kotlin test.main.kts 10000
Expected additional transformation to run. Actual: false
Expected 10000 Deferred<Unit> to complete exceptionally. Actual: 8075