Skip to content

Calling CompletionStage.asDeferred() on the same future many times causes stack overflows #4156

Closed
@yzhaoa

Description

@yzhaoa

Describe the bug

Given a shared CompletableFuture object future:

  1. Attach other transformations on future, e.g. val didRun = AtomicBoolean(); future.whenComplete { _, _ -> didRun.set(true) }
  2. Call future.asDeferred() many times. Generally, 10,000 times is sufficient to trigger the failure. The resulting Deferred<*> don't have to be used.
  3. Complete future exceptionally, e.g. future.completeExceptionally(RuntimeException()).

What happened? What should have happened instead?

Actual: Some of the Deferred<*> results would not complete. Some of the extra transformations in (1) do not run.

The exception thrown from the completed Deferred<*> objects may contain a suppressed StackOverflowError. In testing, in most of the cases there is no indication that anything failed, and the StackOverflowError can be observed by setting a breakpoint inside CompletableFuture's internal machinery.

Expected: all the Deferred<*> results should complete exceptionally, and additional transforms attached in (1) should run.

Why would you do this?

The above condition can happen when you use an asynchronous cache, e.g. Caffeine.newBuilder()...buildAsync().

In such use cases, an underlying CompletableFuture-based cache would return a cached (and shared) CompletableFuture object, and code that is using the cache would call CompletionStage.asDeferred() to consume the cached future asynchronously.

In exceptional conditions, e.g. when the backend computation gated behind the cache slows to a crawl and then starts failing, the above described condition can happen - a single CompletableFuture is returned to a bunch of cache consumers while everybody is waiting for the computation to complete. If the incoming request rate is high enough, you can easily end up with 10,000 waiters on the same CompletableFuture.

Then, when backend computation finally fails, this bug is triggered.

Note: this bug is particular insidious because it is very hard to trigger normally, and when it does, caches can behave very unexpectedly.

In the case of Caffeine, because some transforms fail to run (1 above), Caffeine does not get notified that the CompletableFuture fail and keeps returning the same failed CompletableFuture to future callers.

Why not CompletionStage.await()

Because the CompletableFuture is shared between multiple callers. Using CompletionStage.await() will allow one caller's cancellation to break all other concurrent callers waiting on the same future.

Provide a Reproducer

This is a *.main.kts. Run with kotlin <something>.main.kts <num calls>.

@file:Repository("https://repo1.maven.org/maven2")
@file:DependsOn("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.8.1")

import kotlinx.coroutines.delay
import kotlinx.coroutines.future.asDeferred
import kotlinx.coroutines.runBlocking
import java.util.concurrent.CompletableFuture
import java.util.concurrent.atomic.AtomicBoolean
import kotlin.time.Duration.Companion.seconds

val numCalls = args[0].toInt()

val future = CompletableFuture<Unit>()

// Step 1
val didRun = AtomicBoolean(false)
future.whenComplete { _, _ -> didRun.set(true) }

// Step 2
val deferreds = (0 until numCalls).map { future.asDeferred() }

// Step 3
future.completeExceptionally(RuntimeException())

// There are no thread pool and executors involved above. All executions should complete on step 3.
runBlocking {
    // But just in case, wait a few seconds.
    delay(2.seconds)

    println("Expected additional transformation to run. Actual: ${didRun.get()}")
    println("Expected ${deferreds.size} Deferred<Unit> to complete exceptionally. Actual: ${deferreds.filter { it.isCompleted }.size}")
}

Versions:

  • Mac OS X Sonoma 14.4.1, M3 Pro.
  • java -version
openjdk version "21.0.3" 2024-04-16
OpenJDK Runtime Environment Homebrew (build 21.0.3)
OpenJDK 64-Bit Server VM Homebrew (build 21.0.3, mixed mode, sharing)
  • kotlin -version: Kotlin version 2.0.0-release-341 (JRE 21.0.3)
  • kotlinx-coroutines-core: 1.8.1 (above)

Results:

  • $ kotlin test.main.kts 100: no bug
$ JAVA_HOME=/opt/homebrew/Cellar/openjdk@21/21.0.3/libexec/openjdk.jdk/Contents/Home kotlin test.main.kts 100
Expected additional transformation to run. Actual: true
Expected 100 Deferred<Unit> to complete exceptionally. Actual: 100
  • $ kotlin test.main.kts 10000: bug, with various failure modes
$ JAVA_HOME=/opt/homebrew/Cellar/openjdk@21/21.0.3/libexec/openjdk.jdk/Contents/Home kotlin test.main.kts 10000
Expected additional transformation to run. Actual: true
Expected 10000 Deferred<Unit> to complete exceptionally. Actual: 8023
$ JAVA_HOME=/opt/homebrew/Cellar/openjdk@21/21.0.3/libexec/openjdk.jdk/Contents/Home kotlin test.main.kts 10000
Expected additional transformation to run. Actual: true
Expected 10000 Deferred<Unit> to complete exceptionally. Actual: 2970
$ JAVA_HOME=/opt/homebrew/Cellar/openjdk@21/21.0.3/libexec/openjdk.jdk/Contents/Home kotlin test.main.kts 10000
Expected additional transformation to run. Actual: true
Expected 10000 Deferred<Unit> to complete exceptionally. Actual: 9894
$ JAVA_HOME=/opt/homebrew/Cellar/openjdk@21/21.0.3/libexec/openjdk.jdk/Contents/Home kotlin test.main.kts 10000
Expected additional transformation to run. Actual: true
Expected 10000 Deferred<Unit> to complete exceptionally. Actual: 9995
$ JAVA_HOME=/opt/homebrew/Cellar/openjdk@21/21.0.3/libexec/openjdk.jdk/Contents/Home kotlin test.main.kts 10000
Expected additional transformation to run. Actual: false
Expected 10000 Deferred<Unit> to complete exceptionally. Actual: 8075

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions