Skip to content

Replace TimeoutCancellationException with TimeoutException #1374

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
qwwdfsad opened this issue Jul 24, 2019 · 18 comments
Open

Replace TimeoutCancellationException with TimeoutException #1374

qwwdfsad opened this issue Jul 24, 2019 · 18 comments

Comments

@qwwdfsad
Copy link
Collaborator

qwwdfsad commented Jul 24, 2019

In the current version of the library, withTimeout throws TimeoutCancellationException when a timeout was exceeded.

It can lead to very subtle errors, for example:

launch {
    val result = withTimeout(...) {
        // ... some computation ...
    }
    // process result
}

TimeoutCancellationException is CancellationException, thus is never reported.
But in the snippet above, it's likely to be a programmatic error. If it is expected to miss the deadline, then withTimeoutOrNull should be used explicitly.

My proposal is deprecation of TimeoutCancellationException and replacement with TimeoutException that is not CancellationException

@artbez
Copy link

artbez commented Oct 15, 2020

Hi, @qwwdfsad !
My team and I also faced this problem. Do you have any expectations about the time this task will be solved?

@qwwdfsad
Copy link
Collaborator Author

Could you please elaborate on the problem you've faced?

@artbez
Copy link

artbez commented Oct 16, 2020

Sure, we use withTimeout for restricting our inner functions with deadlines. So, any exceptions except CancellationException will be reported from our business logic code, but TimeoutCancellationException won’t. We want to parse such exceptions as others. And, as you said before, it would be possible if withTimeout threw TimeoutException instead of TimeoutCancellationException. Another case is when we use launch within coroutineScope. Exceded deadline in one job doesn't cancel other jobs. Of course, we can use our own custom launch which catches TimeoutCancellationException inside and throws TimeoutException or we can use withTimeoutOrNull. But, we'd like to know if you plan to change the behavior of withTimeout function.

Thank you!

@qwwdfsad
Copy link
Collaborator Author

qwwdfsad commented Nov 11, 2021

The preliminary design decision is to deprecate withTimeout function with @LowPriorityInOverloadResolution and re-introduce it in the brand new package kotlinx.coroutines.time with the same name.

The change is now blocked by the IDE bug that doesn't replace a signature to the same name in a different package

@dkhalanskyjb
Copy link
Collaborator

This could be a good time to also avoid this behavior #1914 by checking at the end of the block if the coroutine was canceled and throwing in that case.

@qwwdfsad
Copy link
Collaborator Author

qwwdfsad commented Jan 3, 2023

The same applies to the cancellation before withTimeout even start (that is also observed in #1914).

Historically, it was on par with withContext behaviour. Though the latter actively evolved (#962, #785, then #791), while withTimeout has been left as is. Its behaviour is clearly lagging behind what is expected from it by default and is worth changing.

It seems like we already have three potential changes in the function semantics:

  • Exception type (more important, it's CE -> non-CE)
  • Non-atomic cancellation (it also will help to get rid of all the hacks in our undispatchedResult and startUndispatchedOrReturnIgnoreTimeout) on completion
  • Avoid entrance into the body of withTimeout if timeout already kicked in before the coroutine starts

@qwwdfsad
Copy link
Collaborator Author

qwwdfsad commented Jan 3, 2023

It can be the case that the behaviour is too breaking and thus we should come up with a better name instead of silently shadowing the original signature.
What concerns me is the fact that timeouts are non-deterministic and are likely to happen under the load or if things go south, meaning that users are likely to learn about this change from production crashes rather than regular testing

@jQrgen
Copy link

jQrgen commented Mar 1, 2023

@wkornewald
Copy link

Relevant: https://betterprogramming.pub/the-silent-killer-thats-crashing-your-coroutines-9171d1e8f79b

The article correctly describes some root causes of the problem, i.e. CancellationException being used in places where an error-indicating exception would be needed instead.

However the article's proposed solution is to treat all abortable cancellations as errors (i.e. where the job wasn't canceled). This can't be correct. Just because >90% of CancellationExceptions use Job cancellation doesn't mean you can turn all other cancellations into errors. Ignoring those abortable cases with "but you shouldn't be catching errors there anyway" or "So far, I haven’t found any major issues" is a shaky argument (and in our codebase it would quickly hit issues). The article's solution is a workaround that breaks fundamental "safe bubbling" behavior of cancellations where you don't want to trigger error handling (like AbortFlowException).

The real solution is to fix the incorrect usages of CancellationException or at least provide alternative APIs where breaking changes wouldn't be acceptable. The bubbling behavior should not be broken.

@jQrgen
Copy link

jQrgen commented Mar 6, 2023

@wkornewald, thanks for the clarification.

@joffrey-bion
Copy link
Contributor

Is there a similar plan to "fix" Deferred.await() / SendChannel.send() / ReceiveChannel.receive() in the same way? i.e. only throw CancellationException when the calling coroutine is cancelled, but NOT in situations where the producer/consumer on the other end is cancelled (in which case some other exception that's not a subtype of CancellationException would be more appropriate, in order to avoid cancelling the calling coroutine silently)

@dkhalanskyjb
Copy link
Collaborator

We are yet to decide what to do about this. See #3658

@Legion2
Copy link
Contributor

Legion2 commented Apr 27, 2023

Also facing this inconsistency, as a workaround I now use withTimeoutOrNull and throw an exception on null value. Took my a long time to figure out why my code got silently canceled instead of getting a real exception which explains the timeout.

@masc3d
Copy link

masc3d commented Jul 5, 2023

this will also happen when combining flow operators timeout and merge. very confusing and difficult to track down in complex flows.

kostya05983 pushed a commit to kinfra/kinfra-commons that referenced this issue Feb 17, 2024
caiodev added a commit to caiodev/GithubProfileSearcher that referenced this issue Dec 22, 2024
…tionException is thrown out of the catch block or the ViewModel is told to cancel its scope as soon as it is detected/thrown, code won't be able to return any kind of error to upper layers and that will lead to gray area where the app goes to a zombie state and UI will "forever" wait for a result that never comes

The information that lead to this second change is contained in this article: https://betterprogramming.pub/the-silent-killer-thats-crashing-your-coroutines-9171d1e8f79b

The point is that the examples shown in the article go for a more generic approach and do not reflect how Coroutine scope generation works on Android where the Scope owner is not in most cases in the same file where a CancellationException will likely be thrown so devs have to find a way around in order to tell the Scope owner which usually is ViewModel to check if its scope is still active or not

The way I found to be the better one in my specific case is to instead of letting CancellationExceptions out of catch blocks or warn ViewModels to check on their Coroutine scopes and possibly cancel them if it is found they are not active anymore leading the app to a "Zombie" state, is to check the scope status at the end of every operation.

This way, the UI is warned beforehand and if indeed the scope in question is not active anymore, it is cancelled before another operation is performed under it.

According to Roman Elizarov's comments in the open issue for this matter, it will probably never be fixed due to its high complexity where a new implementation of Coroutines from scratch would almost certainly be required to fix it, which will most likey never happen. So, there's that. Let's enjoy Coroutines as it is :)

More info on it can be found here and I highly recommend you take a look at it: Kotlin/kotlinx.coroutines#1374

Move away from the "By feature + By layer" modularization approach in favor of the "By feature" only approach

Create domain module

Create di module

Create resources module

Create utils module

Delete unused XML layouts

Delete unused ViewHolders

Create specific modules for JVM and Instrumented testing so one cannot access the other's dependencies

Rename midfield module to domain

Convert all module-level build.gradle files to KTS

Remove targetSdk field from build.gradle files as it is now deprecated

Add Koin lazy modules feature

Add Koin startup feature

Migrate to Kotlin 2.1.0

Migrate from packagingOption to packaging as it is now deprecated

Migrate default build.gradle JVM Versioning approach to jvmToolchain

Upgrade to Coil3

Upgrade to Ktor 3.0

Upgrade Gradle to 8.12

Refactor layout ids

Disable "allowBackup" manifest option as database changes will always require migrations even if all app data is removed through the app details screen

Remove empty/unused manifest files

Split ProfileAggregator into individual UseCases

Refactor gradlew script

Add "orderingId" field to Profile table so now when data is retrieved from Database it accurately reflects how data is ordered in Github API since it does not return individual profile scores anymore

Refactor libs.versions.toml

Re-add INSERT to available SQL commands

Refactor the way remote repositories propagate their results so now there is a callback for each type of result and there is no need to check for success or failure more than once

Move ResultHandler to Data layer

Change HTTP codes inside RemoteFetcher as search quota reached error code is not 451 anymore but 403 (Forbidden)

Add checks to UIState so state checking in the UI is more concise

Update dependency versions
@scherrsasrf
Copy link

Just wasted 4 hours tracking down jobs that were cancelled on timeout but didn't throw an exception that could be logged. We are now using something like this:

class TimeoutException(cause: Exception) : Exception(cause)

suspend fun <T> withDeadline(timeout: Duration, block: suspend () -> T): T = try {
    withTimeout(timeout) { block() }
} catch (e: TimeoutCancellationException) {
    throw TimeoutException(e)
}

@dkhalanskyjb dkhalanskyjb mentioned this issue Jan 23, 2025
@kevincianfarini
Copy link
Contributor

I understand that the maintainers are working on how to properly fix this design flaw, but in the meantime, can we mark the regular withTimeout function as requiring opt in? Acknowledging the design flaw in a way that's highly visible will help prevent future uses of withTimeout when the recommended solution for now is to use withTimeoutOrNull and throw a custom exception.

@eygraber
Copy link

I knew about this issue and still got bit by it.

@qwwdfsad
Copy link
Collaborator Author

Here goes the discussion for the potential solution: #4356

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests