Skip to content

Define semantics of subscribers erroring in onNext(). #106

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ldaley opened this issue Sep 2, 2014 · 46 comments
Closed

Define semantics of subscribers erroring in onNext(). #106

ldaley opened this issue Sep 2, 2014 · 46 comments

Comments

@ldaley
Copy link
Contributor

ldaley commented Sep 2, 2014

I can't find this defined in the spec. Apologies if I've just missed it.

Equivalent in Rx semantics would be the Publisher must catch anything thrown by onNext() and terminate with an onError(). For processors, the upstream subscription would also have to be cancelled.

@rkuhn
Copy link
Member

rkuhn commented Sep 2, 2014

There was a lengthy discussion of this which led to rule 2.13, my understanding was that onNext should be treated the same way but that seems to have been lost in the editing (or I my memory is wrong). Unfortunately I cannot find the exact comment anymore, @viktorklang @benjchristensen ? (this is why I’d prefer all issues to be as targeted as this one ;-) )

viktorklang added a commit that referenced this issue Sep 2, 2014
@viktorklang
Copy link
Contributor

I've addressed this in: #110

@smaldini
Copy link
Contributor

IllegalStateException for onNext() ? I'm not a fan for wrapping up the exception in this specific case, it should just call onError with the source exception innit ?

@viktorklang
Copy link
Contributor

@smaldini I think it is important to signal that it is not acceptable behavior, if we just pass the exception back nobody will ever know that it's not well-behaved spec-wise. Thoughts?

@benjchristensen
Copy link
Contributor

I agree it should just pass the onError and not do any further wrapping. User errors can happen at any time in any user-provided function. Forcing them to all do try/catch within their functions is highly unwanted, and that's what this rule would effectively force to be "compliant".

@benjchristensen
Copy link
Contributor

For history, related discussions exist in #52 (comment) and #68

Particularly:

If an onNext throws an Exception it is breaking the contract in Rx, but it can happen. Users can write buggy code or JVM-level errors can occur. Thus, an Rx implementation must account for "out of contract" behavior.

My stance on this discussion depends on the nuance of what is being restricted. If it is an operator implementation, then I'm okay with saying it shouldn't throw (except for the non-recoverable errors as discussed elsewhere). If user provided functions are trying to be prevented from throwing, then I disagree.

Consider this:

stream.map(t -> {
  ... do stuff ...
  return someComputation();
})

That user provided function should not need to be written like this:

stream.map(t -> {
try {
  ... do stuff ...
  return someComputation();
}catch(Exception e) {
  // now what do I do?
  return what???;
}
})

In particular, there is no mechanism for returning an Exception as there is in flatMap so the only legit approach is to throw an Exception which can then be propagated via onError by the map operator.

stream.map(t -> {
  ... do stuff that might fail ...
  return someComputation();
}).handleError(t -> {
  log(t);
  return defaultValue;
})

I also don't think we should force wrapping into an IllegalStateException even if an Operator implementation fails. If it fails it is a bug, the stacktrace will show it and it will be fixed. Forcing every single Operator to catch and wrap anything that fails underneath it is unnecessary.

@rkuhn
Copy link
Member

rkuhn commented Sep 15, 2014

@benjchristensen I think we’re all in full agreement here: it is the obligation of the map operator to wrap the user code in such a fashion that errors are propagated downstream (via onError) instead of upstream to the caller of onNext—which most of the time should be a non-issue as we should be dispatching asynchronously. The question raised here was not related to that user function, it was only related to the direct implementation of the code calling onNext, and that one should flag exceptions emanating from this method as illegal states, unless you know of an exception type that better captures this kind of spec violating behavior.

@viktorklang
Copy link
Contributor

@rkuhn yes, this is exactly what I meant. Thanks for elaborating!

@benjchristensen
Copy link
Contributor

Glad we agree on the user function failure scenario and its impact on an onNext.

it was only related to the direct implementation of the code calling onNext, and that one should flag exceptions emanating from this method as illegal states

Since operators/processors are generally going to be composed, wouldn't compliance with this rule mean that every processor would need to have a try/catch that wraps in as an IllegalStateException? Also due to composition, the code that catches the exception would not be aware of what caused it.

So I don't feel like I'm understanding what you want code to do.

Take this example:

@Override
public void onNext(T t) {
    try {
        o.onNext(userFunction.call(t));
    } catch (Throwable e) {
        onError(e);
    }
}

Here the try/catch is intended to capture problems from the userFunction, which we have agreed upon. But it can also incidentally catch errors from the onNext failing, but it won't wrap them in anything or distinguish between where it happened.

Are you saying you want all code that calls onNext to do something like this?

@Override
public void onNext(T t) {
    R r = null;
    try {
        r = transformer.call(t);
    } catch (Throwable e) {
        onError(e);
    }
    try {
        o.onNext(r);
    } catch(Throwable e) {
        onError(new IllegalStateException("Rule 2.13", e));
    }
}

@ldaley
Copy link
Contributor Author

ldaley commented Sep 18, 2014

@benjchristensen my interpretation is that your latter example is the intention, and is most desirable in my opinion (with the additional provisor that the upstream subscription should also be cancelled).

@rkuhn
Copy link
Member

rkuhn commented Sep 18, 2014

Yes, the last sample is what I meant, and it will predominantly library code that implements RS in this fashion (though mostly not directly from onNext itself). There is no danger of double-wrapping since the error does not escalate up the call stack but forward in the Subscriber chain. One thing to consider is that an exception emanating from onError is covered by 2:14 which prescribes a mechanism “appropriate for the runtime environment”; if you choose to re-throw then wrapping is probably a bad idea but you are free to do what you deem appropriate.

@benjchristensen
Copy link
Contributor

I don't know that I'm a fan of requiring every single processor/operator having to treat onNext as if it can blow up and then put in specific error messages.

If onNext blows up it is breaking the contract and will throw. If the processor/operator is synchronous (like map often is) then it will throw up to whoever is above it.

An async processor/operator will generally want to be overly cautious and wrap everything, but in the dozens of operators implementations of RxJava we only need a handful to try/catch around onNext - the async operators and some of the plumbing (such as lift and subscribe).

In other words, it feels like too strong a statement to require every single invocation of onNext to be wrapped in a try/catch. The point of the spec is to say that onNext doesn't throw, and therefore people shouldn't have to try/catch every time they use it.

@jbrisbin
Copy link

If we wanted to "force" implementors to handle an Exception emanating from onNext then the method should have been declared with throws Exception. As such, it seems unwise (and sometimes downright impractical) to impose try/catch around every onNext invocation.

@benjchristensen
Copy link
Contributor

Agreed. And I really don't like checked exceptions so would prefer we just don't force this.

If a processor/operator is doing something that could result in an exception being thrown, code defensively and deal with it. Others shouldn't all have to be defensive when the contract says onNext shouldn't throw.

@viktorklang
Copy link
Contributor

@jbrisbin I'm not sure I understand, your argument seems unsound to me—it is proposed to be illegal to throw exceptions from onNext so adding throws Exception would signal that it is OK to throw exceptions—which would go against the intent. So the try-catch is needed to make sure that both A) the onError callback gets invoked, and B) the wrapping in ISE, to let the implementor of that onNext method know that he/she is violating the spec. Or am I missing something?

@viktorklang
Copy link
Contributor

@benjchristensen Just to confirm that I understand the implications of your position: It is OK if the caller of the onNext that fails does not call onError?

@jbrisbin
Copy link

@viktorklang the point is that if you want to force the caller to handle Exceptions, there are tools in the Java language to do that. Burdening the developer with try/catch around every onNext call just doesn't strike me as all that valuable. There are plenty of times within Reactor where we have to catch Throwable so we can properly propagate the error. But there's also plenty of times when letting illegal errors bubble has no consequences.

@viktorklang
Copy link
Contributor

@jbrisbin My point is that there is no way of doing that without encouraging the callee to throw exceptions. (Because then -they- can ignore doing try-catch themselves.)
So, the question remains, what is the spec going to mandate for when onNext -does- throw exceptions? System.exit(-1)?

@jbrisbin
Copy link

@viktorklang Unexpected Exception? Take this: cd /; sudo rm -Rf *

@viktorklang
Copy link
Contributor

@jbrisbin you forgot sudo :-)

@benjchristensen
Copy link
Contributor

what is the spec going to mandate for when onNext -does- throw exceptions

Why does it need to specify anything? It has already specified that onNext shouldn't throw.

If we do specify anything, it should be something along the lines of: an implementation MAY choose to catch errors thrown by onNext and propagate them via onError to allow graceful recovery of illegally thrown exceptions.

Since the spec says onNext should not throw, developers should not have to try/catch around it.

Even 2.13 about onComplete is odd as currently worded:

A failing onComplete invocation (e.g. throwing an exception) is a specification violation and MUST signal onError with java.lang.IllegalStateException. The cause message MUST include a reference to this rule and/or quote the full rule

If onComplete fails and breaks the specification, why would it be expected to then behave and call onError with an IllegalStateException.

These seem to be conflating "user functions" failing versus the processor/operator failing and I think we need to differentiate.

I think 2.13 is trying to say that if a user provided onComplete function fails, then onError should be invoked.

Similarly, if a user provided function in onNext fails, then onError should be invoked.

If however an operator has a bug and blows up and throws from onNext, onCompleted or onError then it is a specification failure, a bug, and it will cause things to fail.

If someone wants to be defensive many of these can be handled, but a processor/operator a developer implements should be able to trust the spec and not try/catch the onNext/onCompleted/onError methods.

@viktorklang
Copy link
Contributor

what is the spec going to mandate for when onNext -does- throw exceptions
Why does it need to specify anything? It has already specified that onNext shouldn't throw.

The question is how you as the implementor will know that an exception was thrown and that you violated the spec so you can fix it.

If we do specify anything, it should be something along the lines of: an implementation MAY choose to > catch errors thrown by onNext and propagate them via onError to allow graceful recovery of illegally > > thrown exceptions.

I'd -strongly- advise against, and vote against that as it is then easy to start relying on specific behavior exhibited by some implementations, which then fails for other implementations.

Since the spec says onNext should not throw, developers should not have to try/catch around it.

The question is how you as the implementor will know that an exception was thrown and that you violated the spec so you can fix it.

Even 2.13 about onComplete is odd as currently worded:
A failing onComplete invocation (e.g. throwing an exception) is a specification violation and MUST
signal onError with java.lang.IllegalStateException. The cause message MUST include a reference to
this rule and/or quote the full rule

If onComplete fails and breaks the specification, why would it be expected to then behave and call
onError with an IllegalStateException.

Because the alternative is worse: not calling onError == potential runtime resource leak. Not wrapping in an ISE and referring to the spec == not knowing that this is a specification violation that needs to be addressed.

These seem to be conflating "user functions" failing versus the processor/operator failing and I think
we need to differentiate.

I see no such conflation (but would be grateful if this was elaborated upon). This is all about specification enforcement, so that implementors can find out if they get something wrong while minimizing adverse effects.

I think 2.13 is trying to say that if a user provided onComplete function fails, then onError should be invoked.

See my previous answer on this question.

Similarly, if a user provided function in onNext fails, then onError should be invoked.

Same as above.

If however an operator has a bug and blows up and throws from onNext, onCompleted or onError then
it is a specification failure, a bug, and it will cause things to fail.

Could you elaborate here? What does "cause things to fail" mean in practice and how does one reason about those failures?

If someone wants to be defensive many of these can be handled, but a processor/operator a
developer implements should be able to trust the spec and not try/catch the onNext/onCompleted/onError methods.

This is not about trusting the spec, this is about the implementations of the spec and whether the implementor will find out whether they broke it or not.

@benjchristensen
Copy link
Contributor

This is not about trusting the spec, this is about the implementations of the spec and whether the implementor will find out whether they broke it or not.

So you're going to force every single implementor of a Publisher or Processor to try/catch around an onNext?

If that's the case then make it onNext() throws Exception and specify in the spec that onNext can throw.

@viktorklang
Copy link
Contributor

@benjchristensen

I mentioned this earlier, so let me elaborate: "def onNext(e: E) throws X" encourages the implementor of the method not to deal with their exceptions (because the effect is that they don't have to do try-catch for checked excpetions that may happen in their onNext code), and we want them to deal with their exceptions and, (as you know,) the omission of a throws clause would not deal with (read: give any warning for) RuntimeExcepions. In addition, Java is the only language on the JVM (in the world?) that has checked exceptions so this is not a solution in any case (not for Clojure, not for Scala, not for Groovy…)

So, my concerns are as follows:

  1. It is important to be able to reason about what happens
  2. It is important to quickly/easily know if you did something wrong

Passing the exception raised by onNext into onError deals with concern number 1, as it allows for cleanup to be performed and as such minimize resource leakage, stalls etc.

Passing the exception raised by onNext wrapped in another exception that explains why this is not a good thing deals with concern number 2 by adding important information to the implementor of onNext.

It would be great to get some answers for the questions I asked in my previous response.
Thank you!

@benjchristensen
Copy link
Contributor

  1. It is important to be able to reason about what happens

An exception gets thrown and causes a system to fail with a stacktrace. I can now reason about what happened. I broke the contract and an exception was thrown.

  1. It is important to quickly/easily know if you did something wrong

See above. Exception thrown and system crashing is pretty fast and easily comprehensible.

If the spec says onNext should not throw, everybody calling onNext should not need to try/catch it. If someone throws from onNext they are breaking the spec and the system can and likely will fail in bad ways.

An exception being thrown resulting in stacktraces being printed somewhere in a failed app solves both of the expressed concerns.

An operator/processor implementation is of course capable of wrapping a try/catch around them, and anything scheduling work on a new thread should have a catch-all for anything that goes wrong, but that is separate from this spec, that's just good coding practices for an implementation.

There are many, many ways I can break this spec. The whole point of having a spec is to define how things will behave so when I use it and invoke methods I have some level of trust that they will indeed work as the spec says. If they don't that is a bug and the system may fail.

I could also call onNext concurrently from multiple threads. Are we going to make every processor synchronize invocation? Are we going to make them do concurrency checks with atomic and call onError if they see someone emitting events to onNext concurrently?

The spec does not need to specify how to police itself. An implementation can do that if it wishes to. The spec should state how things should behave. The TCK and implementations can then choose how they wish to assert or audit compliance and deal with bad implementations.

@viktorklang
Copy link
Contributor

  1. It is important to be able to reason about what happens

An exception gets thrown and causes a system to fail with a stacktrace.
I can now reason about what happened. I broke the contract and an exception was thrown.

You make the assumption that the exception was both created at the site it was thrown (rethrowing and/or throwing of cached exception instances is not black swan events) and that it has a stack trace (suppression of stacktraces is not unheard of as filling them in tends to involve a lot of overhead)

  1. It is important to quickly/easily know if you did something wrong

See above.

See above.

Exception thrown and system crashing is pretty fast and easily comprehensible.

I think I'm missing something, why would the system crash just because a single stream has a bug for perhaps a specific input?

If the spec says onNext should not throw, everybody calling onNext should not need to try/catch it.
If someone throws from onNext they are breaking the spec and the system can and likely will fail in bad ways.

Do we want the entire system to fail if one single Subscriber implementation has a buggy onNext?

An exception being thrown resulting in stacktraces being printed somewhere in a failed app solves both of the expressed concerns.

True in the general sense, but not always, see discussion earlier on both cached exception instances, rethrows and exceptions without stacktraces.

An operator/processor implementation is of course capable of wrapping a try/catch around them,
and anything scheduling work on a new thread should have a catch-all for anything that goes wrong,
but that is separate from this spec, that's just good coding practices for an implementation.

There are many, many ways I can break this spec. The whole point of having a spec is to define how
things will behave so when I use it and invoke methods I have some level of trust that they will indeed
work as the spec says. If they don't that is a bug and the system may fail.

That is all true. My point is not about Byzantine implementations, its about the implementor understanding that they have violated the spec and need to fix it, which separates it from other onErrors which may be natural or occuring due to the fact that Tcp connections get broken etc.

I could also call onNext concurrently from multiple threads. Are we going to make every processor
synchronize invocation? Are we going to make them do concurrency checks with atomic and call
onError if they see someone emitting events to onNext concurrently?

That is a point I can definitely sympathize with. The most obvious difference is the runtime performance cost between trapping an exception and 2 * LOCK XADD.

The spec does not need to specify how to police itself. An implementation can do that if it wishes to.
The spec should state how things should behave.
The TCK and implementations can then choose how they wish to assert or audit compliance and deal with bad implementations.

I think that could be a fair point, if so then we shouldn't mandate that onComplete throwing an exception should be put into onError either. I think the most appropriate thing the spec should do then is the same for all exceptions raise by onNext, onError and onComplete — attempt to log it, don't try to reuse channel error termination for spec violations.

@jbrisbin
Copy link

Although the outcome of this discussion interests me I'm afraid I don't have the bandwidth for such a long back and forth for something so microscopic.

If a single exception can crash a system then you get what you deserve when it happens. Otherwise we should NOT burden the developer without exceptionally just cause. This smells to me more like us imposing patterns on others that we don't even follow ourselves consistently.

THE END

@viktorklang
Copy link
Contributor

@jbrisbin So to conserve your bandwidth :)
Would you find it acceptable to have the spec RECOMMEND implementors to -only- log exceptions thrown by signals (onNext, onComplete, onSubscribe and onError)?

Simplest possible solution: you violate the spec, if you're lucky the caller will log it!

@benjchristensen
Copy link
Contributor

Why even suggest that restriction? If an implementation wants to catch them and emit to onError then it can. It could also choose to just log or it could just do nothing and let exceptions be thrown. =

@viktorklang
Copy link
Contributor

@benjchristensen Because otherwise you'll see different behavior when a broken Subscriber subscribes to different Publisher implementations. The publisher could also decide to call System.exit(1), but what I'm talking about is what the spec says and whether it will behave the same for the implementations that follow the spec.

@benjchristensen
Copy link
Contributor

I don't want to debate this any further. We are over specifying this if the spec says "don't throw" but then also specifies what to do if something throws (breaks the spec). It's turtles all the way down.

@viktorklang
Copy link
Contributor

@benjchristensen Just for clarification, your proposal is to drop 2:13 and 2:14 from the spec?

@benjchristensen
Copy link
Contributor

Effectively yes. 1.9 is then almost clear enough. If the rule in 1.9 was applied to everything instead of just 'subscribe' that should be sufficient for specifying 'error throwing behavior'.

2.13 and 2.14 could be done by an implementation if it chooses and can do so in a way that recovers gracefully, but it's all out of spec at that point. =

@viktorklang
Copy link
Contributor

I propose to replace 2.13 and 2.14 with the following:

Invocation of onSubscribe, onNext, onError and onComplete SHOULD NOT throw a non-fatal Throwable. The only legal way for a Subscriber to signal failure is by cancelling the Subscription. Non-fatal Throwable excludes any non-recoverable exception by the application (e.g. OutOfMemory). In the case that this rule is violated, any associated Subscription to the Subscriber MUST be considered as cancelled, and the invoker MUST raise this error condition in a fashion that is adequate for the runtime environment.

@benjchristensen
Copy link
Contributor

I like this, though not sure about this: "The only legal way for a Subscriber to signal failure is by cancelling the Subscription"

That isn't clear to me. The legal way to signal failure is to call onError. If onError fails then all bets are off.

A Subscription is cancelled by the consumer of it, not the producer. A producer terminates a Subscription with onError or onComplete. So if the producer also has a parent Subscriber it is consuming then if onError down fails it can cancel the subscription to its parent, but that seems like complicated implementation details we shouldn't specify.

I suggest we just state the only legal way to signal error is calling onError.

@viktorklang
Copy link
Contributor

That isn't clear to me. The legal way to signal failure is to call onError. If onError fails then all bets are off.

The subscriber is of course allowed to call methods on itself it it wants to, but what I meant was upwards signals. Perhaps what is missing is:

"The only legal way for a Subscriber to signal failure _to the Publisher_ is by cancelling the Subscription"

I suggest we just state the only legal way to signal error is calling onError. =

This does not make sense (to me) in the context we are talking about, here's why: We are talking about when a _Subscribers_ signal methods throw to the caller/upstream. onError is about signalling failure downstream.

And we've already been through it before? It cannot be allowed to call signal methods once signal methods throw exceptions if signal methods are not allowed to throw exceptions. It's circular and dangerous.

@DougLea
Copy link
Contributor

DougLea commented Sep 20, 2014

A reminder of one of my initial meta-comments: It is rare for an API spec to say anything about what a component must NOT do. As seen here by counter-example, usually the best tactic is to state only the positives/MUSTs.

@viktorklang
Copy link
Contributor

@DougLea Ad antiquitatem? ;-)
What is important to me, about speccing what it must not do is that I want to avoid inadvertently coupling Subscriber implementations to Publisher implementations.
A Subscriber starting its life depending on onError being called by the upstream if itself misbehaves is going to have a hard time once the Publisher it is being connected to does not have that behavior since it was unregulated by the spec.

Since the goal is interoperability, my position is that we need to both outline the positive space and where interoperability might suffer due to omission, the negative space, with of course, a focus on the positive space.

Do you have an alternate wording in mind for the rule in question?

@DougLea
Copy link
Contributor

DougLea commented Sep 20, 2014

The ad-antiquitatem-ism stems from experience: Usually, when tempted to state negatively, there's a better way lurking. In some cases, you can rework as preconditions: Instead of" NOT pred", say "if pred". When dealing with exceptions, there's often some sort of layering assumption going on: If you want to ensure that any exception in a call to a triggered method is caught, you can say this directly rather than saying that the outer method does not throw it. This makes it clearer that any throw from the outer method is due to its own faults.

@viktorklang
Copy link
Contributor

@DougLea I like that!

@viktorklang
Copy link
Contributor

@DougLea So something along the lines of:

Invocation of onSubscribe, onNext, onError and onComplete MUST succeed. The only legal way for a Subscriber to signal failure is by cancelling the Subscription. In the case that this rule is violated, any associated Subscription to the Subscriber MUST be considered as cancelled, and the invoker MUST raise this error condition in a fashion that is adequate for the runtime environment.

@DougLea
Copy link
Contributor

DougLea commented Sep 20, 2014

@viktorklang Yes! Including the term "non-revoverable" that I was about to mention.
"Recoverable" is hard to rigorously nail down in Java, but people understand that OOME, StackOverflowError, and a few others are almost never recoverable so don't count. So just leaving it that seems to siffice.

@viktorklang
Copy link
Contributor

@DougLea Regarding "non-recoverable", I believe we can omit it for this rule and have it fly under the "UST raise this error condition in a fashion that is adequate for the runtime environment."-part of the rule, do you agree?

@DougLea
Copy link
Contributor

DougLea commented Sep 20, 2014

@viktorklang Yes. Or maybe better, place a top-level disclaimer that the term "Exception" excludes Throwables (such as OOME and SOE) when they result in termination of the program/service. (The secondary "when..." disclaimer allows but doesn't require recovery even from OOME).

@viktorklang
Copy link
Contributor

@DougLea That's a very nice solution, I'm going to open an Issue for it.

@viktorklang
Copy link
Contributor

Fixed by #117

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants