-
Notifications
You must be signed in to change notification settings - Fork 1.9k
CoroutineScheduler rework #1652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
kotlinx-coroutines-core/jvm/test/scheduling/BlockingCoroutineDispatcherThreadLimitStressTest.kt
Show resolved
Hide resolved
kotlinx-coroutines-core/jvm/src/scheduling/CoroutineScheduler.kt
Outdated
Show resolved
Hide resolved
kotlinx-coroutines-core/jvm/src/scheduling/CoroutineScheduler.kt
Outdated
Show resolved
Hide resolved
kotlinx-coroutines-core/jvm/src/scheduling/CoroutineScheduler.kt
Outdated
Show resolved
Hide resolved
kotlinx-coroutines-core/jvm/src/scheduling/CoroutineScheduler.kt
Outdated
Show resolved
Hide resolved
kotlinx-coroutines-core/jvm/src/scheduling/CoroutineScheduler.kt
Outdated
Show resolved
Hide resolved
kotlinx-coroutines-core/jvm/src/scheduling/CoroutineScheduler.kt
Outdated
Show resolved
Hide resolved
kotlinx-coroutines-core/jvm/src/scheduling/CoroutineScheduler.kt
Outdated
Show resolved
Hide resolved
dd7ab91
to
dd30af0
Compare
Does not seem that we need kotlinx.coroutines/kotlinx-coroutines-core/jvm/src/scheduling/CoroutineScheduler.kt Line 577 in dd30af0
It is only read in other threads after Thread.join and in toString()
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. All the feedback I can give is mostly cosmetic.
kotlinx-coroutines-core/jvm/src/scheduling/CoroutineScheduler.kt
Outdated
Show resolved
Hide resolved
kotlinx-coroutines-core/jvm/src/scheduling/CoroutineScheduler.kt
Outdated
Show resolved
Hide resolved
kotlinx-coroutines-core/jvm/src/scheduling/CoroutineScheduler.kt
Outdated
Show resolved
Hide resolved
kotlinx-coroutines-core/jvm/src/scheduling/CoroutineScheduler.kt
Outdated
Show resolved
Hide resolved
Note, a bug was introduced somewhere in shutdown sequence. It can hang on shutdown now: https://teamcity.jetbrains.com/buildConfiguration/KotlinTools_KotlinxCoroutines_NightlyStressWindows/2652563
|
I was avoiding adding long-scanning (dd30af0) because it could have masked all the bugs on the rendezvous of parking/unparking. Ironically, it has exposed existing problems :) It seems like it is impossible to ensure strict liveness in loads like
|
@qwwdfsad
Hm. If the thread is executing CPU task right now, then it should not be unparked by Combining both |
kotlinx-coroutines-core/jvm/src/scheduling/CoroutineScheduler.kt
Outdated
Show resolved
Hide resolved
kotlinx-coroutines-core/jvm/src/scheduling/CoroutineScheduler.kt
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks way better now.
…CoroutineScheduler
258227b
to
203580f
Compare
* WorkQueue.trySteal reports not only whether the steal was successful, but also a waiting time unless task becomes stealable * CoroutineScheduler.trySteal attempts to steal from all the workers (starting from the random position) per iteration to have deterministic stealing * Parking mechanics rework. After unsuccessful findTask, worker immediately adds itself to parking stack, then rescans all the queues to avoid missing tryUnparks and only then parks itself (parking duration depends on WorkQueue.trySteal result), terminating later * Excessive spinning and parking is completely eliminated, significantly (x3) reducing CPU-consumption and making CoroutineScheduler on-par with FJP and FTP on Ktor-like workloads * Downside of aggressive parking is a cost of slow-path unpark payed by external submitters that can be shown in degraded DispatchersContextSwitchBenchmark. Follow-up commits will fix that problem * Retry on tryStealLastScheduled failures to avoid potential starvation * Merge available CPU permits with controlState to simplify reasoning about pool state and make all state transitions atomic * Get rid of synthetic accessors
* Work stealing: get rid of global queue for offloading during stealing because it never happens in fact * Guard all critical invariants related to work-stealing with asserts * New work signalling strategy that guarantees complete liveness in the face of "accidentally-blocking" CPU tasks * Advanced double-phase unparking mechanism that mitigates the most expensive part of signalling an additional work * Current limitation: blocking tasks are not yet properly signalled
203580f
to
cab8deb
Compare
Invariants: * Steal only one task per attempt to avoid missing steals that potentially may block the progress (check-park-check may miss tasks that are being stolen) * New WorkQueue.add invariant: bufferSize < capacity => add is always successful * Re-visited tests that expose a lot of problems * Ability to steal from the middle of work queue in order to steal blocking tasks with ABA prevention Changes: * Instead of "blocking workers" use "blocking tasks" state that is incremented on each blocking submit and decrement only when task is completed * On each work signalling try to compensate blocking tasks by enforcinf invariant "created threads == blocking tasks + up to core size workers" * Now if worker was not spuriously woken up, it has a task dedicated for him that should be found. For that reason attempt to steal blocking tasks (that may be in the middle of the work queue). Additionally, instead of scanning the whole global queue, just split it in two (one for blocking, one for regular tasks) * Get rid of conditional remove from the global queue * Avoid excessive unparks for threads that are not yet able to steal the task due to workstealing resolution: do not add such workers to the stack
cab8deb
to
d6e7470
Compare
* Do not push worker to the stack during second pass of "min duration scanning" * Locally cache whether local queue has any work to execute to save atomic getAndSet and a bunch of atomic loads * More precise rendezvous for parking * Long-scanning stealing to emulate spinning * Properly handle interference of termination sequence and protection against spurious wakeups * Documentation improvements, naming, proper spurious wakeup check
d6e7470
to
cff57ef
Compare
cff57ef
to
7b614a7
Compare
…et rid of long-scanning sequence
7b614a7
to
4224e01
Compare
Fixes #840
Fixes #1046
Fixes #1286