Dispatchers.Default consumes too much CPU in the face of short bursts of work #840
We got caught out by this. It made for some serious head scratching and trial-and-error refactoring until I realized I could reproduce it with a single coroutine and that the scheduler was the culprit. It's surprising that the scheduler was pushed out as the default with such poor behavior on anything except CPU-bound or high-coroutine-throughput workloads. This also appears to be worsened by the scheduler being too aggressive when waking up threads in the first place. For example, this:
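A minimal sketch of that kind of reproducer, assuming a single coroutine that alternates a trivial burst of work with a short blocking sleep on Dispatchers.Default (the exact original snippet may have differed):

```kotlin
import kotlinx.coroutines.*

// Hypothetical reproducer: one coroutine doing a tiny burst of work
// followed by a short sleep. The process never terminates on its own;
// it only exists to observe CPU usage while it runs.
fun main() = runBlocking<Unit> {
    launch(Dispatchers.Default) {
        while (isActive) {
            var acc = 0L
            repeat(1_000) { acc += it }  // negligible burst of "work"
            Thread.sleep(10)             // short idle gap between bursts
        }
    }
}
```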
Results in almost 300% CPU utilization on my 4-core MacBook Pro. The conservative parking, combined with the aggressive scheduling, makes this a problem even under incredibly low load. When sleeping for 5 ms: ~200%. The problem is much worse on machines with large numbers of threads. I use a 10-core iMac Pro, and the CPU utilization scales with the number of threads allocated. Our workload also mixed in
I'm facing this too in a basic Android app: a dozen threads on a Snapdragon 835 taking lots of CPU at startup until the 1.5 s park kicks in. I'm using basic stuff with an actor, a few channels, and worker pools. I'm not sure I understand the correlation between the yield/spinning issue and the number of threads?
The number of threads is a factor because at low loads there's a small enough amount of work that threads spend their maximum time in the spin/yield phase before parking. That makes the idle problem scale up with the number of threads. For instance, here's that same example at 1300% CPU utilization on my 10-core, 20-thread iMac Pro.
The linked discussion mentions setting
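For illustration, a hedged sketch of capping the scheduler's pool size via a system property; the property name below is an assumption based on the scheduler's documented system properties, may vary between versions, and must be set before Dispatchers.Default is first touched:

```kotlin
import kotlinx.coroutines.*

fun main() {
    // Assumed property name; check the CoroutineScheduler docs for the version in use.
    // Must be set before Dispatchers.Default initializes its scheduler.
    System.setProperty("kotlinx.coroutines.scheduler.core.pool.size", "2")

    runBlocking {
        launch(Dispatchers.Default) {
            println("Running on ${Thread.currentThread().name}")
        }
    }
}
```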
My question was more about the fact that a 4-core processor should not start 12 threads, as I thought it would be limited to 4 per the docs. So I was wondering whether that high number of threads was related to this issue or to something else, with maybe a simpler fix :)
…e, introduce system properties to tune this behaviour. Rationale: Thread.yield has a significant CPU cost (especially relative to spins) and provides no significant benefit compared with exponential parking. Thus yields burn a lot of CPU due to the JVM upcall + syscall, providing benefits neither for the liveness property (as the CPU is mostly busy making these calls) nor for latencies. Initial benchmarking shows a significant CPU usage reduction (200-300%) in low-to-average-load benchmarks with no degradation on target affinity benchmarks. Partially addresses #840
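For context, a minimal sketch of what "exponential parking" in an idle loop could look like; the function, constants, and thresholds below are illustrative assumptions, not the actual CoroutineScheduler code:

```kotlin
import java.util.concurrent.locks.LockSupport

private const val MIN_PARK_NS = 1_000L      // 1 microsecond initial park
private const val MAX_PARK_NS = 1_000_000L  // 1 millisecond cap before going fully idle

// Idle wait that replaces repeated Thread.yield() calls with exponentially
// growing timed parks; a real scheduler would transition the worker to a
// long-term parked/idle state once the cap is reached with no work arriving.
fun idleWait(hasWork: () -> Boolean) {
    var parkNs = MIN_PARK_NS
    while (!hasWork()) {
        LockSupport.parkNanos(parkNs)          // cheap timed park instead of yield
        if (parkNs < MAX_PARK_NS) parkNs *= 2  // exponential backoff
    }
}
```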
Spin-wait hints (Thread.onSpinWait) are available for Java 9+.
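A small sketch of how a bounded spin phase could use that hint; spinUntil and the spin bound are illustrative assumptions, not part of kotlinx.coroutines:

```kotlin
// Bounded spin loop using the JDK 9+ spin-wait hint.
fun spinUntil(maxSpins: Int, condition: () -> Boolean): Boolean {
    repeat(maxSpins) {
        if (condition()) return true
        Thread.onSpinWait() // hints the CPU/JVM that this thread is busy-waiting
    }
    return condition()      // caller should fall back to parking if still false
}
```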
Hi. In my current setup with coroutines 1.3.3 I've struggled with the same CPU-consumption issue.
@rumatoest Do you have a reproducer or other details that would help to pinpoint the issue? |
See https://discuss.kotlinlang.org/t/default-dispatcher-creates-too-many-threads/10143
`CoroutineScheduler` may consume too much CPU when there is no work to do (a lot of spins, yields, and short parks before the transition to the idle state). What's even worse, it can burn a lot of CPU in the pattern "a short burst of work, small idle period" (I remember @e5l has noticed that as well). We should increase the speed of the spinning -> idle transition, ideally without losing any performance on task-heavy benchmarks.
We probably can safely remove `yield` usages (this is the most suspicious place) and try to gracefully detect the quiescent period.