Skip to content

Acquiring a semaphore inside withTimeout block doesn't guarantee to release a semaphore when timed out #2233

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
akrikheli opened this issue Sep 8, 2020 · 2 comments
Labels
docs KDoc and API reference

Comments

@akrikheli
Copy link

Hello! Today I have discovered a strange behavior the work of withTimeout method.
I've need a simple function like tryAcquire(long timeout, TimeUnit unit) from java.util.concurrent.Semaphore but working in a non-blocking manner. I've implemented it as an extension function to kotlinx.coroutines.sync.Semaphore class and it looks this:

suspend fun Semaphore.acquirePermitToRequest(pendingTimeMs: Long): Boolean {
  return try {
    withTimeout(pendingTimeMs) {
      acquire()
      true
    }
  } catch (ce: Exception) {
    false
  }
}

When I was testing this function by code block below I encountered non-deterministic results:

fun main() = runBlocking {
  val semaphore = Semaphore(10)
  for (i in 1..1) {
    coroutineScope {
      repeat(5000) {
        launch {
          coroutineScope {
            if (semaphore.acquirePermitToRequest(25)) {
              delay(50)
              semaphore.release()
            }
          }
        }
      }
    }

    println(semaphore.availablePermits)
  }
}

Sometimes it prints 0 to my console! In other words, there're no available permits in my semaphore but 10 are expected!

I've tried to add some counters that could be useful to research this behavior, all code placed here: https://gist.github.com/akrikheli/121171b3fbca50ddad93e8b8d5b09a48

The output log when this problem is reproduced:

0
cancellation counter=4990
acquire counter=20
release counter=10
after with timeout = 10

So, the real acquisition of the semaphore was 20 times instead of the expected 10 despite the cancellation was 4990 times (5000-10) and it's a bit confusing.

As a workaround, I can rewrite my extension function using tryAcquire method in the loop.
May be anybody can explain the current behavior when acquire method used in withTimeout block?

@elizarov
Copy link
Contributor

It works as designed and I don't see how it could conceivably work in a different way.

See, the withTimeout(xxx) {....} function performs an asynchronous cancellation of the code running inside the block. It might be the case the acquire() had already completed, but then, before returning from withTimeout the time is up and it ends up throwing CancellationException.

Currently, the most portable and straightforward way to write the code you need that I can think of is:

var wasAcquired = true
withTimeoutOrNull(pendingTimeMs) {
    acquire()
    // if we are here, then the acquire call was not canceled, semaphore was acquired
    wasAcquired = true
}
return wasAcquired

Note, how using withTimeoutOrNull here saves me from writing try/catch in this code and makes it shorted. However, the result of withTimeoutOrNull would not tell at which moment inside the block execution it was canceled -- was it before semaphore acquisition or after it. To know that, we define our own wasAcquired boolean variable.

Does it help?

@elizarov elizarov added the docs KDoc and API reference label Sep 18, 2020
elizarov added a commit that referenced this issue Sep 18, 2020
This is a tricky gotcha that needs additional explanation. There are two examples added, one showing the bad code and explaining why it does not work, and the other showing the correct way to write it.

Fixes #2233
@elizarov
Copy link
Contributor

elizarov commented Sep 21, 2020

I've created PR #2252 with a new section in the coroutines guide explaining this particular aspect of withTimeout. You can read it here: https://github.com/Kotlin/kotlinx.coroutines/blob/27b6e9cf750641003753366a5ff9b9abfa481634/docs/cancellation-and-timeouts.md#asynchronous-timeout-and-resources

recheej pushed a commit to recheej/kotlinx.coroutines that referenced this issue Dec 28, 2020
…in#2252)

This is a tricky gotcha that needs additional explanation. There are two examples added, one showing the bad code and explaining why it does not work, and the other showing the correct way to write it.

Fixes Kotlin#2233
recheej pushed a commit to recheej/kotlinx.coroutines that referenced this issue Dec 28, 2020
…in#2252)

This is a tricky gotcha that needs additional explanation. There are two examples added, one showing the bad code and explaining why it does not work, and the other showing the correct way to write it.

Fixes Kotlin#2233
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs KDoc and API reference
Projects
None yet
Development

No branches or pull requests

2 participants