define what “heavy computation” means for the spec #54

Closed
rkuhn opened this issue May 19, 2014 · 13 comments

@rkuhn
Member

rkuhn commented May 19, 2014

The consensus on updating the asynchronicity semantics of the Subscriber–Subscription relationship established in #46 involves the term “heavy computation” as describing some action that SHOULD NOT (as per RFC2119) be performed within callbacks (like onNext and request). We need to properly define what we mean by this classification and then incorporate it (and the reasoning behind it) into the specification.

@rkuhn
Member Author

rkuhn commented May 19, 2014

@mariusaeriksen I also like your definition using preemption, but we should acknowledge that “heavy” is a subjective measure and different use-cases may have vastly different requirements. 1µs might be “heavy” for HFT applications, and batch processing might tolerate anything that does not block indefinitely.

BTW: non-termination falls under “blocking”, which means it is not permitted in any case.

@alexandru Your argument about staying on the same thread is a valid one, but one second is way more than is required on this front. In any case, we will have to leave the precise interpretation to concrete implementations.

@jrudolph Your question of running user-supplied code in a callback can go either way: one implementation may choose to be conservative and always dispatch asynchronously, while another could place the burden of being “non-heavy” on the user—who is the one defining the precise meaning of this term in any case.

It should be noted that user code cannot make the framework faster, so what we are demanding is that absent good reasons (as meant by the SHOULD NOT classification) an implementation must not exhaust the “heaviness” budget by itself.

@jrudolph

who is the one defining the precise meaning of this term in any case.

As this is about the interface between different implementations, the problem is that different components interacting over this interface may not be controlled by just one user. I don't yet see how the developer of just one part of a processing pipeline can decide how to interact with another abstract component that may later be plugged together with it.

E.g. implementation A may choose that user transformations are generally done synchronously in onNext. In that case, another defensive publisher B would want to choose to schedule the actual callback call itself to make sure its own processing isn't influenced by that "heavy" user transformation but wouldn't want to do that generally because it would be wasteful to do so in general. Now, who's responsible for configuring that publisher accordingly (that already assumes that components usually can be configured like this)? Also, wiring code itself may be abstracted so that the wiring code may not know enough to make the decision.

@rkuhn
Member Author

rkuhn commented May 19, 2014

I would imagine that the implementation of the Subscriber would either be explicitly documented in how much is done on the calling thread (this would be the case if provided by a library), or that the implementors of Publisher and Subscriber communicate or are bound by a common interface definition (document). Even implementations that normally prefer to run things synchronously (like RxJava) will have to provide explicit asynchronous boundaries, and this facility will then be prepended to the Subscriber when needed.

In essence this means that the performance characteristics of a Subscriber are part of its concrete implementation contract, not a property of the generic Reactive Streams interface. The previous discussions have convinced me that this split is desirable for a widely applicable standard.
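Such a prepended asynchronous boundary might look like the following minimal sketch. This is not any particular library's implementation: the `Subscriber` interface here is a simplified stand-in for the `org.reactivestreams` one, and `AsyncBoundary` is a hypothetical name. Each signal is handed off to a dedicated single-threaded executor, so the calling thread returns immediately no matter how heavy the wrapped Subscriber's processing is.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Simplified stand-in for org.reactivestreams.Subscriber.
interface Subscriber<T> {
    void onNext(T element);
    void onComplete();
}

// Hypothetical asynchronous boundary: every signal is handed off to a
// dedicated single-threaded executor, preserving signal order while the
// caller's thread returns immediately.
final class AsyncBoundary<T> implements Subscriber<T> {
    private final Subscriber<T> downstream;
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    AsyncBoundary(Subscriber<T> downstream) {
        this.downstream = downstream;
    }

    @Override public void onNext(T element) {
        executor.execute(() -> downstream.onNext(element));
    }

    @Override public void onComplete() {
        executor.execute(downstream::onComplete);
        executor.shutdown(); // no signals may follow onComplete
    }

    // Convenience hook: wait until all handed-off signals have been processed.
    boolean awaitDone(long millis) throws InterruptedException {
        return executor.awaitTermination(millis, TimeUnit.MILLISECONDS);
    }
}
```

The single-threaded executor matters here: it keeps the per-subscriber signal order that the spec requires, which a shared thread pool would not guarantee.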

@jrudolph

In essence this means that the performance characteristics of a Subscriber are part of its concrete implementation contract, not a property of the generic Reactive Streams interface.

So, in other words, this means that the interface we define here is more like the necessary, minimal infrastructure required to connect two different implementations. However, the interface may not be sufficient to actually make a combination of parts from several implementations run well or fast. A user would need additional information about the implementations to optimize such an interconnection.

I think I can accept that approach because interconnections between streams from different libraries may not be the most common combinations in an application and also not necessarily the most important for optimization. Not specifying the details may just mean not optimizing prematurely where it would be hard to foresee the actual applications.

@jbrisbin

IMO it only matters how much work is being done if there is more work to perform than there are resources to perform it concurrently. It might be beneficial to the definition to include some element of "don't do so much work in a Subscriber that it would cause a backlog due to concurrent tasks already using all allocated resources". This feeds back to what @alexandru and @mariusaeriksen were saying on #46.

It's usually desirable in an asynchronous system to use available resources to their fullest. So even a "heavy" task could be desirable if there are resources available to perform it immediately. It seems like what we need to specify in a generic way is a category of workloads that trigger certain conditions in the system that Reactive Streams can't allow to happen: namely holding up work waiting on a state change (releasing a lock, waiting for new IO, etc...).

@tmontgomery

@jbrisbin I think you might have nailed it there. It really is about "holding up work [potentially indefinitely]". The "waiting on a state change" part is key, but potentially too specific. The subjective measure of how long is too long might be noise at the end of the day. @rkuhn points out rightly that this is runtime and system use case dependent. In some environments, microseconds are extremely meaningful. Also, queuing is (subjectively) OK when its effects are short lived and bounded and fit the use case from a responsiveness point of view.

At the end of the day, I think the wording might be stronger if we acknowledge:

  1. The system SHOULD be responsive/reactive to the level of granularity associated to the intended use case (including runtime), and
  2. The growth of queue utilization because of service delay MUST be temporary.

The last one above is a nod to queuing theory, in essence. I'm not sure I phrased it well (or even correctly), but what I intend is to suggest that all backlog (or buffering if you prefer) must be temporary and not pathological and persistent. i.e. the system must recover by itself from spikes in ingress rate. This is needed anyway for a stable system.

The result of heavy workload is always buffering (or backlog). Which is fine when temporary and actually can't be avoided anyway. It's how the system is designed to cope with it at that level that matters. Some may go for more capacity. And some may go for bounded processing. Both are viable implementations for given use cases. The wording just needs to support them.

@benjchristensen
Contributor

the interface may not be sufficient to actually make a combination of parts from several implementations run well or fast

@jrudolph Even if each operator is required to be async I don't see how that contract enables me to combine operators and trust my system will perform well. Asynchrony does not ensure this for all the reasons brought up in #46, the simplest example being that an operator can be async yet still saturate all available CPUs on a machine. This is one of the reasons for me why making it a requirement does not buy anything and is not worth the tradeoffs of prohibiting synchronous execution. Predictable performance requires understanding all components being wired together regardless of whether they are asynchronous or synchronous.

@benjchristensen
Contributor

The subjective measure of how long is too long might be noise at the end of the day. @rkuhn points out rightly that this is runtime and system use case dependent. In some environments, microseconds are extremely meaningful. Also, queuing is (subjectively) OK when its effects are short lived and bounded and fit the use case from a responsiveness point of view.

I agree with this summary from @tmontgomery and it leads me to think we should eliminate the "heavy computation" distinction.

Another point to consider on the definition of "heavy" is that one of the major reasons for backpressure is exactly to handle this scenario. If a user decides to do work that is heavy enough to cause the consumer to be slower than the producer, then one of two things will generally happen (ignoring the option to buffer which is what is being avoided):

  1. On a "cold" source it will be consumed slower. For example, lines from a file will be read and emitted as slowly as needed by the slow consumer.
  2. On a "hot" source events will be dropped. For example, a stock price stream will drop events until the consumer is ready for the next one.

In both cases, the consumer can be as "heavy" as it wants to be and the Subscription.request(n) will handle the backpressure. That's kind of the point.
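That dynamic can be shown in a minimal synchronous sketch. The interfaces below are simplified stand-ins for the spec's, and all class names are illustrative: the source emits only on demand, so a consumer of arbitrary heaviness paces the whole stream with `request(1)`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for the spec's Subscription/Subscriber pair.
interface Subscription { void request(long n); }
interface StreamSubscriber<T> {
    void onSubscribe(Subscription s);
    void onNext(T element);
    void onComplete();
}

// A synchronous "cold" source: each unit of demand pulls one element,
// so emission is paced entirely by the consumer's request(n) calls.
final class IteratorPublisher<T> {
    private final Iterator<T> it;
    private boolean completed = false;

    IteratorPublisher(Iterable<T> source) { this.it = source.iterator(); }

    void subscribe(StreamSubscriber<T> s) {
        s.onSubscribe(n -> {
            for (long i = 0; i < n && it.hasNext(); i++) s.onNext(it.next());
            if (!it.hasNext() && !completed) { completed = true; s.onComplete(); }
        });
    }
}

// A consumer as "heavy" as it likes: it requests one element, finishes
// its work, then signals readiness for the next.
final class OneAtATime<T> implements StreamSubscriber<T> {
    final List<T> processed = new ArrayList<>();
    private Subscription subscription;

    public void onSubscribe(Subscription s) { subscription = s; s.request(1); }
    public void onNext(T element) {
        processed.add(element);  // arbitrarily heavy work could go here
        subscription.request(1); // pull the next element only when done
    }
    public void onComplete() { }
}
```

However slow `onNext` is, the publisher never runs ahead of demand, so no buffering or dropping is forced on this unicast path.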

Thus, perhaps the issue over heaviness is more a question for the Publisher as to whether it is doing unicast or multicast, as per the discussion in #19?

If it is a "cold" source, such as a file, then it is unicast and the backpressure via request(n) is sufficient to handle the slow consumer, even if each event takes minutes to process. It is as always up to the user to scale their environment resources correctly, but the Reactive Stream contract will do what it should and not cause any other Subscriber to be blocked because each Subscription starts from scratch and gets its own resources.

If it is a "hot" source such as a stock price stream, then the Publisher is likely multicasting, and it is probably preferred that it treat each Subscription asynchronously. In other words, the Publisher is free to use separate threads (actors, event loops, fibers, whatever) for each Subscription so that a fast consumer and slow consumer can both consume from a single underlying stream. This decoupling is up to the Publisher to do and each Subscriber can be oblivious of how its processing is impacting other Subscribers of the same Publisher. In fact, the Publisher itself should probably behave as if it is unicast to each Subscriber but then share an underlying source of data (ie. network connection).

Considering this, perhaps we change from this:

  • A Subscriber MUST NOT block a Publisher thread.
  • A Subscriber MAY behave synchronously or asynchronously but SHOULD NOT synchronously perform heavy computations in its methods (onNext, onError, onComplete, onSubscribe).

to this:

  • A Subscriber MUST NOT block a Publisher thread.
  • A Subscriber MAY behave synchronously or asynchronously.

and possibly add something like this line for Publisher:

  • If a Publisher is multicasting (supporting multiple Subscriber/Subscription pairs on a single stream) it is RECOMMENDED to asynchronously dispatch events to each Subscriber.

In other words, if a Publisher doesn't want a single slow Subscriber to impact other Subscribers, then each Subscription should be on its own resource (such as a thread). If a Publisher does this, then the request(n) backpressure mechanism appears sufficient and we can eliminate the restrictive, subjective and vague use of "heavy computation".
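A hypothetical sketch of that recommendation, using a plain `java.util.function.Consumer` as a stand-in for a demand-less `Subscriber` (names are illustrative, not from any real library): each subscription gets its own single-threaded executor, so a slow subscriber only delays its own queue, never its peers or the publishing thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical multicast source: one single-threaded executor per
// subscriber, so each subscriber consumes the shared stream at its
// own pace without blocking the others.
final class MulticastSource<T> {
    private final Map<Consumer<T>, ExecutorService> subscribers = new ConcurrentHashMap<>();

    void subscribe(Consumer<T> subscriber) {
        subscribers.put(subscriber, Executors.newSingleThreadExecutor());
    }

    void publish(T element) {
        // Hand the element to every subscriber's private queue and return
        // immediately; a "heavy" subscriber backs up only its own queue.
        subscribers.forEach((s, exec) -> exec.execute(() -> s.accept(element)));
    }

    // Convenience hook: stop accepting work and wait for every queue to drain.
    boolean shutdownAndAwait(long millis) throws InterruptedException {
        boolean drained = true;
        for (ExecutorService exec : subscribers.values()) {
            exec.shutdown();
            drained &= exec.awaitTermination(millis, TimeUnit.MILLISECONDS);
        }
        return drained;
    }
}
```

A production version would of course bound each per-subscriber queue and apply `request(n)` demand to the shared upstream; this sketch only shows the isolation property.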

@tmontgomery

I think that wording could work. It's open enough to allow for alternatives, but clearly states intent.

@viktorklang
Contributor

Have we reached consensus here?

@viktorklang
Contributor

I'd want to invert this:

"If a Publisher is multicasting (supporting multiple Subscriber/Subscription pairs on a single stream) it is RECOMMENDED to asynchronously dispatch events to each Subscriber."

so instead of relying on distrust (a Publisher does not know whether a Subscriber will be "heavy", and as such it must distrust all Subscribers if it does not wish to be held hostage by them)

I'd want to have the opposite, a relationship built on trust:

If a Subscriber suspects that its processing of events will negatively impact its Publisher's responsivity, it is RECOMMENDED that it asynchronously dispatches its events.

It's not about multicast vs unicast, it's about the Publisher getting enough CPU time to create/obtain the elements it will publish.

@smaldini
Contributor

smaldini commented Jun 3, 2014

Time to cast a vote; I can include @viktorklang's wording in the current #61.

@smaldini
Contributor

smaldini commented Jun 4, 2014

Merged in #61. Please re-open the ticket or comment directly if it's a simple wording issue, but the principle is all clear.

@smaldini smaldini closed this as completed Jun 4, 2014