
Dispatchers.Default consumes too much CPU in the face of short bursts of work #840


Closed
qwwdfsad opened this issue Nov 16, 2018 · 8 comments

@qwwdfsad
Collaborator

qwwdfsad commented Nov 16, 2018

See https://discuss.kotlinlang.org/t/default-dispatcher-creates-too-many-threads/10143

CoroutineScheduler may consume too much CPU when there is no work to do (a lot of spins, yields and short parks before the transition to the idle state). What is even worse, it can burn a lot of CPU in the pattern "a short burst of work, then a small idle period" (I remember @e5l has noticed that as well).

We should speed up the spinning -> idle transition, ideally without losing any performance on task-heavy benchmarks.

We can probably safely remove the yield usages (this is the most suspicious place) and try to gracefully detect a quiescent period.
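
As an illustration of the proposal above, here is a minimal sketch of a worker idle loop that spins for a bounded number of iterations and then parks with exponentially growing timeouts, with no Thread.yield at all. It is not the CoroutineScheduler implementation: `awaitTask`, `MAX_SPINS`, `MIN_PARK_NS` and `MAX_PARK_NS` are hypothetical names and values chosen only for this example.

```kotlin
import java.util.concurrent.locks.LockSupport

// Hypothetical tuning constants; these are NOT the scheduler's real values.
const val MAX_SPINS = 1_000
const val MIN_PARK_NS = 10_000L      // 10 microseconds
const val MAX_PARK_NS = 1_000_000L   // 1 millisecond

/**
 * Polls [tryTakeTask] until it returns a task or [maxIdleNs] has elapsed.
 * Spins briefly first, then parks with exponentially growing timeouts.
 * Returns the task, or null when the worker should be treated as idle.
 */
fun <T : Any> awaitTask(maxIdleNs: Long, tryTakeTask: () -> T?): T? {
    // Phase 1: bounded spinning, cheap when work arrives almost immediately.
    repeat(MAX_SPINS) {
        tryTakeTask()?.let { return it }
    }
    // Phase 2: exponential parking, so an idle worker quickly stops burning CPU.
    val deadline = System.nanoTime() + maxIdleNs
    var parkNs = MIN_PARK_NS
    while (System.nanoTime() - deadline < 0) {
        tryTakeTask()?.let { return it }
        LockSupport.parkNanos(parkNs)
        parkNs = (parkNs * 2).coerceAtMost(MAX_PARK_NS)
    }
    // One last poll before reporting the worker as idle.
    return tryTakeTask()
}
```

A caller would pass its own queue poll as `tryTakeTask` and treat a null result as the signal to go fully idle. Capping the park time bounds the wake-up latency, at the cost of a few periodic wake-ups while waiting.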

@DanielThomas

DanielThomas commented Dec 9, 2018

We got caught out by this. It made for some serious head scratching and trial-and-error refactoring until I realized I could reproduce it with a single coroutine and that the scheduler was the cause. It's surprising that the scheduler was pushed out as the default with such poor behavior for anything except CPU-bound or high-coroutine-throughput workloads.

This also appears to be worsened by the scheduler being too aggressive when waking up threads in the first place.

For example, this:

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.channels.produce
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.concurrent.atomic.AtomicInteger

fun main() {
    runBlocking {
        val count = AtomicInteger()
        val ints = produce<Int> {
            launch(Dispatchers.Default) {
                while (true) {
                    channel.send(count.incrementAndGet())
                }
            }
        }
        for (int in ints) {
            Thread.sleep(2)
        }
    }
}

Results in almost 300% CPU utilization on my 4-core MacBook Pro:

[Screenshot of CPU utilization, 2018-12-09]

The conservative parking added to the aggressive scheduling makes this a problem even under incredibly low load. When sleeping for:

5ms - ~200%
10ms - ~120%
50ms - ~34%
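
Percentages like the ones above (where 100% means one fully busy hardware thread) can be approximated from inside the JVM. The sketch below is one possible way to do that with the standard `ThreadMXBean`; it is not how the numbers in this comment were obtained, and `cpuPercent`, `totalThreadCpuNs` and `measureCpuPercent` are hypothetical helper names.

```kotlin
import java.lang.management.ManagementFactory

/**
 * Converts CPU time and wall-clock time (both in nanoseconds) to a percentage,
 * where 100% means one fully busy hardware thread.
 */
fun cpuPercent(cpuNs: Long, wallNs: Long): Double =
    if (wallNs <= 0) 0.0 else 100.0 * cpuNs / wallNs

/** Sums the CPU time of all live threads in this JVM; threads without a reading count as 0. */
fun totalThreadCpuNs(): Long {
    val threads = ManagementFactory.getThreadMXBean()
    if (!threads.isThreadCpuTimeSupported) return 0L
    return threads.allThreadIds.sumOf { id -> threads.getThreadCpuTime(id).coerceAtLeast(0L) }
}

/**
 * Runs [block] and reports the average CPU utilization of the JVM's threads while it ran.
 * This is only an approximation: threads that exit during the measurement drop out of the total.
 */
fun measureCpuPercent(block: () -> Unit): Double {
    val cpuStart = totalThreadCpuNs()
    val wallStart = System.nanoTime()
    block()
    return cpuPercent(totalThreadCpuNs() - cpuStart, System.nanoTime() - wallStart)
}
```

Wrapping a bounded version of the reproducer in `measureCpuPercent { ... }` would give a figure comparable to what a process monitor shows, averaged over the run.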

The problem is much worse on machines with large numbers of threads. I use a 10-core iMac Pro, and the CPU utilization scales with the number of threads allocated.

Our workload also mixed in Dispatchers.IO, so I had 28 threads on a 10-core/20-thread machine consuming CPU at this level.

@Tolriq

Tolriq commented Dec 10, 2018

I'm facing this as well in a basic Android app: a dozen threads on a Snapdragon 835 take lots of CPU at startup until the 1.5 s park kicks in.

It uses basic stuff: an actor, a few channels and worker pools.

I'm not sure I understand the correlation between the yield/spinning issue and the number of threads.

@DanielThomas

DanielThomas commented Dec 10, 2018

The number of threads is a factor because, at low load, there appears to be little enough work that threads spend their maximum time in cpuWorkerIdle (they don't receive any work in that time), but enough work that all of the threads in the core pool get woken up to do it. I'm unclear why it's as severe as it is, however: this example would seem to require that coroutine to execute only 500 times per second, so it's unclear why those threads spend so much time awake waiting for work.

That makes the idle problem scale up with the number of threads. For instance, here's that same example with 1300% CPU utilization on my 10-core, 20-thread iMac Pro:

[Screenshot of CPU utilization, 2018-12-10]

@DanielThomas

The linked discussion mentions setting kotlinx.coroutines.scheduler=off to move Dispatchers.Default back to CommonPool, but that doesn't help with usages of Dispatchers.IO.
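
For reference, a minimal sketch of setting that property programmatically rather than with `-Dkotlinx.coroutines.scheduler=off` on the JVM command line. The property name and value come from the linked discussion; `disableCoroutineScheduler` is a hypothetical helper, and the sketch assumes the library reads the property once when its dispatchers are first initialized, so the call has to run before any coroutine code.

```kotlin
/**
 * Requests the CommonPool-backed Dispatchers.Default mentioned in the linked discussion.
 * Assumption: the property is read once, when the dispatchers are first initialized,
 * so this must be called at the very start of main(), before any coroutine code runs.
 */
fun disableCoroutineScheduler() {
    System.setProperty("kotlinx.coroutines.scheduler", "off")
}
```

As noted above, this only changes Dispatchers.Default; code that uses Dispatchers.IO is unaffected.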

@Tolriq

Tolriq commented Dec 10, 2018

My question was more about the fact that a 4-core processor should not start 12 threads, as I thought it would be limited to 4 per the docs. So I was wondering whether that high number of threads was related to this issue or to something else, maybe with a simpler fix :)
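
One way to check how many worker threads actually exist at a given moment is to count live threads by name. This is only a diagnostic sketch: `countThreads` is a hypothetical helper, and the `DefaultDispatcher-worker` prefix mentioned below is an assumption about how the scheduler names its threads, which may differ between library versions.

```kotlin
/** Counts live JVM threads whose name starts with [prefix]. */
fun countThreads(prefix: String): Int =
    Thread.getAllStackTraces().keys.count { it.name.startsWith(prefix) }
```

Calling `countThreads("DefaultDispatcher-worker")` periodically while the app is under load would show whether the pool really grows past the number of cores.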

qwwdfsad added a commit that referenced this issue Dec 12, 2018
…e, introduce system properties to tune this behaviour.

Rationale:

Thread.yield has a significant CPU cost (especially relative to spins) and provides no significant benefits compared with exponential parking.
Thus yields burn a lot of CPU due to JVM upcall + syscall, providing benefits neither for liveness property (as CPU is mostly busy with doing these calls) nor for latencies. Initial benchmarking shows significant CPU usage reduction (200-300%) in low-to-average load benchmarks with no degradation on target affinity benchmarks.

Partially addresses #840
@fvasco
Contributor

fvasco commented Jun 25, 2019

A spin-wait hint is available on Java 9+:

http://openjdk.java.net/jeps/285
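
For context, JEP 285 adds `Thread.onSpinWait()`, which a busy-wait loop can call to tell the runtime it is spinning. Below is a minimal sketch of a bounded spin loop using it. It requires JDK 9 or newer; `spinUntilSet` is a hypothetical helper and not something the scheduler is known to contain.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

/**
 * Spins until [flag] becomes true or [maxSpins] iterations have passed,
 * issuing the JEP 285 spin-wait hint on every iteration (JDK 9+ only).
 * Returns the final value of the flag.
 */
fun spinUntilSet(flag: AtomicBoolean, maxSpins: Int): Boolean {
    repeat(maxSpins) {
        if (flag.get()) return true
        Thread.onSpinWait()
    }
    return flag.get()
}
```

The hint makes the spinning itself cheaper on hardware that supports it; it does not shorten the time a worker spends spinning before it goes idle.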

@rumatoest

Hi. In my current setup with coroutines 1.3.3 I've struggled with the same CPU consumption issue.
Setting the system property kotlinx.coroutines.scheduler to off fixed the CPU issue.

@elizarov
Contributor

> Hi. In my current setup with coroutines 1.3.3 I've struggled with the same CPU consumption issue.
> Setting the system property kotlinx.coroutines.scheduler to off fixed the CPU issue.

@rumatoest Do you have a reproducer or other details that would help to pinpoint the issue?

anti-social added a commit to anti-social/prometheus-kt that referenced this issue Feb 6, 2021