Skip to content

withContext(IO) blocks thread #524

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
phiSgr opened this issue Aug 31, 2018 · 6 comments
Closed

withContext(IO) blocks thread #524

phiSgr opened this issue Aug 31, 2018 · 6 comments

Comments

@phiSgr
Copy link

phiSgr commented Aug 31, 2018

Currently we are using Vert.x web for HTTP server, and uses JedisPool for Redis client.
Since that Redis client has a blocking API, we used withContext(IO) to not block the Vert.x thread.

suspend fun <R> JedisPool.useAsync(block: (Jedis) -> R) = withContext(IO) {
    this.resource.use { block(it) }
}

But the Vert.x thread is blocked by withContext.
Coroutines version is "0.25.0",
Vertx version is "3.5.3".

Aug 31, 2018 3:24:53 AM io.vertx.core.impl.BlockedThreadChecker
WARNING: Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 39234941 ms, time limit is 2000
io.vertx.core.VertxException: Thread blocked
	at kotlinx.coroutines.experimental.scheduling.CoroutineScheduler.parkedWorkersStackPop(CoroutineScheduler.kt:982)
	at kotlinx.coroutines.experimental.scheduling.CoroutineScheduler.tryUnpark(CoroutineScheduler.kt:395)
	at kotlinx.coroutines.experimental.scheduling.CoroutineScheduler.requestCpuWorker(CoroutineScheduler.kt:366)
	at kotlinx.coroutines.experimental.scheduling.CoroutineScheduler.dispatch(CoroutineScheduler.kt:345)
	at kotlinx.coroutines.experimental.scheduling.ExperimentalCoroutineDispatcher.dispatchWithContext$kotlinx_coroutines_core(ExperimentalCoroutineDispatcher.kt:83)
	at kotlinx.coroutines.experimental.scheduling.LimitingDispatcher.dispatch(ExperimentalCoroutineDispatcher.kt:123)
	at kotlinx.coroutines.experimental.scheduling.LimitingDispatcher.dispatch(ExperimentalCoroutineDispatcher.kt:113)
	at kotlinx.coroutines.experimental.DispatchedKt.resumeCancellable(Dispatched.kt:189)
	at kotlinx.coroutines.experimental.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:16)
	at kotlinx.coroutines.experimental.CoroutineStart.invoke(CoroutineStart.kt:83)
	at kotlinx.coroutines.experimental.BuildersKt__Builders_commonKt.withContext(Builders.common.kt:136)
	at kotlinx.coroutines.experimental.BuildersKt.withContext(Unknown Source)
	at kotlinx.coroutines.experimental.BuildersKt__Builders_commonKt.withContext$default(Builders.common.kt:107)
	at kotlinx.coroutines.experimental.BuildersKt.withContext$default(Unknown Source)
...
	at kotlin.coroutines.experimental.jvm.internal.CoroutineImpl.resume(CoroutineImpl.kt:42)
	at kotlinx.coroutines.experimental.DispatchedTask$DefaultImpls.run(Dispatched.kt:149)
	at kotlinx.coroutines.experimental.DispatchedContinuation.run(Dispatched.kt:13)
	at io.vertx.kotlin.coroutines.VertxCoroutineExecutor$execute$1.handle(VertxCoroutine.kt:408)
	at io.vertx.kotlin.coroutines.VertxCoroutineExecutor$execute$1.handle(VertxCoroutine.kt:404)
	at io.vertx.core.impl.ContextImpl.lambda$wrapTask$2(ContextImpl.java:339)
	at io.vertx.core.impl.ContextImpl$$Lambda$33/1483526283.run(Unknown Source)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:745)

Where the ... is 3 lines of our code.

I would love to understand things under the hood but currently we will just use the non-blocking Redis client to avoid the problem first.

@qwwdfsad
Copy link
Member

qwwdfsad commented Aug 31, 2018

Could you please provide a small reproducer? Can it be a bug in BlockedThreadChecker?
Because the stacktrace looks way too suspicious, it reports that your thread is blocked in parkedWorkersStackPop, but this method (and the whole trace dispatch -> request cpu worker) is non-blocking

@adamp
Copy link

adamp commented Sep 5, 2018

I've seen similar over the last few days of giving the IO dispatcher a try. The associated pool spins all workers which consume all available CPU resources looking for work alongside the thread invoking withContext(IO) spinning in a tight loop looking for a pool thread to give the work to. I haven't been able to isolate it and it takes several minutes to reproduce in the best case. All associated issues disappear upon replacing the IO dispatcher with a ThreadPoolExecutor(...).asCoroutineDispatcher().

Looks like the same trace as above though, from the thread calling withContext(IO):

"kotlinx.coroutines.DefaultExecutor@2897" daemon prio=5 tid=0x1f nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at kotlinx.coroutines.experimental.scheduling.CoroutineScheduler.parkedWorkersStackPop(AtomicFU.common.kt:391)
	  at kotlinx.coroutines.experimental.scheduling.CoroutineScheduler.tryUnpark(CoroutineScheduler.kt:395)
	  at kotlinx.coroutines.experimental.scheduling.CoroutineScheduler.requestCpuWorker(CoroutineScheduler.kt:357)
	  at kotlinx.coroutines.experimental.scheduling.CoroutineScheduler.dispatch(CoroutineScheduler.kt:345)
	  at kotlinx.coroutines.experimental.scheduling.ExperimentalCoroutineDispatcher.dispatchWithContext$kotlinx_coroutines_core(ExperimentalCoroutineDispatcher.kt:83)
	  at kotlinx.coroutines.experimental.scheduling.LimitingDispatcher.dispatch(ExperimentalCoroutineDispatcher.kt:123)
	  at kotlinx.coroutines.experimental.scheduling.LimitingDispatcher.dispatch(ExperimentalCoroutineDispatcher.kt:113)
	  at kotlinx.coroutines.experimental.DispatchedKt.resumeCancellable(Dispatched.kt:56)
	  at kotlinx.coroutines.experimental.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:16)
	  at kotlinx.coroutines.experimental.CoroutineStart.invoke(CoroutineStart.kt:83)
	  at kotlinx.coroutines.experimental.BuildersKt__Builders_commonKt.withContext(Builders.common.kt:136)
	  at kotlinx.coroutines.experimental.BuildersKt.withContext(Unknown Source:1)
	  at kotlinx.coroutines.experimental.BuildersKt__Builders_commonKt.withContext$default(Builders.common.kt:107)
	  at kotlinx.coroutines.experimental.BuildersKt.withContext$default(Unknown Source:1)

Edit: not using either Vert.x or similar blocking thread monitoring, simply waiting until top shows the VM process spinning out of control and grabbing a thread snapshot manually with the debugger.

@objcode
Copy link
Contributor

objcode commented Sep 8, 2018

We're starting to put together coroutines outreach for Android. We're planning on recommending the IO pool - but want to make sure this issue is resolved before we recommend it widely. I'll follow this issue, if the work migrates to another issue can you paste it here?

@elizarov
Copy link
Contributor

elizarov commented Sep 9, 2018

@objcode We are cleaning up and finalizing API for 1.0 release of this library, so I'd suggest to also watch for issues with "for 1.0 release" label: https://github.com/Kotlin/kotlinx.coroutines/issues?q=is%3Aissue+is%3Aopen+label%3A%22for+1.0+release%22

@SUPERCILEX
Copy link
Contributor

@elizarov @qwwdfsad I just ran into this too, but can't get a consistent repro. This should be a stop-the-world issue since it occurs randomly and completely halts the main thread.

I managed to capture a few extra details to help you debug:

"main" prio=5 tid=1 Runnable
  | group="main" sCount=0 dsCount=0 flags=0 obj=0x740df568 self=0xa57da000
  | sysTid=12271 nice=-10 cgrp=default sched=0/0 handle=0xa9e76518
  | state=R schedstat=( 30274960494 2765149752 11290 ) utm=2802 stm=227 core=3 HZ=100
  | stack=0xbf4b4000-0xbf4b6000 stackSize=8MB
  | held mutexes= "mutator lock"(shared held)
  at kotlinx.coroutines.experimental.scheduling.CoroutineScheduler$Worker.getNextParkedWorker(CoroutineScheduler.kt:624)
  at kotlinx.coroutines.experimental.scheduling.CoroutineScheduler.parkedWorkersStackNextIndex(CoroutineScheduler.kt:201)
  at kotlinx.coroutines.experimental.scheduling.CoroutineScheduler.parkedWorkersStackPop(CoroutineScheduler.kt:173)
  at kotlinx.coroutines.experimental.scheduling.CoroutineScheduler.tryUnpark(CoroutineScheduler.kt:395)
  at kotlinx.coroutines.experimental.scheduling.CoroutineScheduler.requestCpuWorker(CoroutineScheduler.kt:366)
  at kotlinx.coroutines.experimental.scheduling.CoroutineScheduler.dispatch(CoroutineScheduler.kt:345)
  at kotlinx.coroutines.experimental.scheduling.ExperimentalCoroutineDispatcher.dispatchWithContext$kotlinx_coroutines_core(ExperimentalCoroutineDispatcher.kt:83)
  at kotlinx.coroutines.experimental.scheduling.LimitingDispatcher.dispatch(ExperimentalCoroutineDispatcher.kt:123)
  at kotlinx.coroutines.experimental.scheduling.LimitingDispatcher.dispatch(ExperimentalCoroutineDispatcher.kt:113)
  at kotlinx.coroutines.experimental.DispatchedKt.resumeCancellable(Dispatched.kt:189)
  at kotlinx.coroutines.experimental.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:23)
  at kotlinx.coroutines.experimental.CoroutineStart.invoke(CoroutineStart.kt:99)
  at kotlinx.coroutines.experimental.AbstractCoroutine.start(AbstractCoroutine.kt:153)
  at kotlinx.coroutines.experimental.DeferredKt.async(Deferred.kt:163)
  at kotlinx.coroutines.experimental.DeferredKt.async$default(Deferred.kt:155)
  at com.supercilex.robotscouter.core.data.DatabaseKt$teamUpdater$1.onChildChanged(Database.kt:123)
  at com.supercilex.robotscouter.core.data.DatabaseKt$teamUpdater$1.onChildChanged(Database.kt:113)
  at com.firebase.ui.common.BaseObservableSnapshotArray.notifyOnChildChanged(BaseObservableSnapshotArray.java:172)
  at com.supercilex.robotscouter.core.data.LifecycleAwareFirestoreArray.access$notifyOnChildChanged(Database.kt:308)
  at com.supercilex.robotscouter.core.data.LifecycleAwareFirestoreArray$eventForwarder$1.onChildChanged(Database.kt:379)
  at com.supercilex.robotscouter.core.data.LifecycleAwareFirestoreArray$eventForwarder$1.onChildChanged(Database.kt:373)
  at com.firebase.ui.common.BaseObservableSnapshotArray.notifyOnChildChanged(BaseObservableSnapshotArray.java:172)
  at com.firebase.ui.firestore.FirestoreArray.onDocumentAdded(FirestoreArray.java:101)
  at com.firebase.ui.firestore.FirestoreArray.onEvent(FirestoreArray.java:84)
  at com.firebase.ui.firestore.FirestoreArray.onEvent(FirestoreArray.java:22)
  at com.google.firebase.firestore.zzi.onEvent(com.google.firebase:firebase-firestore@@17.0.5:1834)
  at com.google.firebase.firestore.g.zzh.zza(com.google.firebase:firebase-firestore@@17.0.5:28)
  at com.google.firebase.firestore.g.zzi.run(com.google.firebase:firebase-firestore@@17.0.5:-1)
  at android.os.Handler.handleCallback(Handler.java:789)
  at android.os.Handler.dispatchMessage(Handler.java:98)
  at android.os.Looper.loop(Looper.java:164)
  at android.app.ActivityThread.main(ActivityThread.java:6541)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.Zygote$MethodAndArgsCaller.run(Zygote.java:240)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:767)

@LouisCAD
Copy link
Contributor

Shouldn't this issue have its labels updated? (like having the bug label in place of the current one)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants