Skip to content

AutorecoveringConnection fires client shutdown hooks after connection has recovered #135

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
doswell opened this issue Mar 8, 2016 · 18 comments

Comments

@doswell
Copy link

doswell commented Mar 8, 2016

Version - amqp-client-3.6.1

When init is called on AutorecoveringConnection it adds the automatic recovery listener to the shutdownHooks list - Because this is the first entry, any user added shutdown hooks will be called after the automatic recovery.

as a possible solution;
The listener created in addAutomaticRecoverListener would likely need to call each shutdownListener, and omit adding the created listener to the shutdownHooks.
addShutdownListener, removeShutdownListener would only apply to the shutdownHooks list and not add to the delegate.
recoverShutdownListeners would just call addAutomaticRecoverListener.

@michaelklishin
Copy link
Contributor

I'd like to see if this is in line with the idea of #132 first.

@michaelklishin
Copy link
Contributor

Also note that shipping this in a 3.6.x release is probably unrealistic.

@doswell
Copy link
Author

doswell commented Mar 8, 2016

I think this fix allows for #132 to be easily done without other API
changes. As now you will get the shutdown listener fired when recovery
starts.
As a side note; Perhaps in terms of the auto recovery connection
shutdownlistener is a slightly misleading name vs disconnect listener.

Currently, if you add a shutdown listener which just logs, and are
connecting to one broker (I suspect if you use a cluster this is less
noticeable as the reconnect happens quickly to another member, so if
logging disconnects it seems correct);

Client Connect to server
Add shutdown listener and recovery listener to log status
(disconnect/connected)
Stop rabbitmq
-- no log output
Wait 5mins

Start rabbitmq
Client reconnects
-- logs output saying connected
-- logs output saying disconnected(for disconnect 5mins ago)

Funny enough the logging order was what I have seen from my tests; I think
the recovery listener ends up being fired before letting the remaining
shutdownlisteners run.

On Tue, Mar 8, 2016, 5:48 PM Michael Klishin [email protected]
wrote:

Also note that shipping this in a 3.6.x release is probably unrealistic.


Reply to this email directly or view it on GitHub
#135 (comment)
.

@michaelklishin
Copy link
Contributor

@doswell ok, thanks for the explanation, the latter point is definitely convincing.

@michaelklishin
Copy link
Contributor

@garyrussell @artembilan @spatula75 do you folks have an opinion on this change? It's breaking but makes sense to me. There is a pull request in #137.

@artembilan
Copy link

@michaelklishin ,
thank you for letting to know!

I would like to see the direct code changes to make a decision. Right now it is difficult to dig over all those GH cross links.
But from other side I believe @garyrussell in his comment: #132 (comment).
So, we really may live silently in Spring AMQP with that patch.

@spatula75
Copy link

Just want to make sure I understand the sequence of events that will happen so I can make an intelligent comment.

It sounds like it would go a bit like this (assuming I've declared a RecoverableConnection):

  1. Connection dies
  2. ShutdownListener is called
  3. Recovery begins
  4. Recovery completes, connection is good again
  5. RecoveryListener is called

Is that correct?

@spatula75
Copy link

(If so, I think it would obsolete my PR, provided that we could tell somehow in step 2 above that recovery will be attempted.)

@michaelklishin
Copy link
Contributor

@artembilan @garyrussell @spatula75 currently shutdown listeners are executed (effectively) concurrently with recovery which is kicked off by another shutdown listener. This PR makes the client first execute all shutdown listeners and then begin recovery.

@spatula75 yes, the sequence is correct. It is sort of the same today but recovery is kicked off by one of the shutdown listeners so it's not clear whether a connection might already be in recovery by the time a user-provided listener is executed.

@spatula75
Copy link

@michaelklishin Cool, then one possible suggestion is to have some way of indicating to the shutdown listener that, though there has been a shutdown, a recovery is starting; then my API breaking PR might not be needed at all (because you'd effectively be getting the "recovery started" event from the shutdown listener instead of having a separate event). The semantics aren't exactly identical though, but similar.

@doswell
Copy link
Author

doswell commented Apr 6, 2016

@michaelklishin When reviewing the recovery code I didn't see where the execution of listeners was concurrent, perhaps I missed something (I believe it was in ShutdownNotifierComponent ). But it looks like they are called blocking / sequentially in the order they are added to the delegate.

@spatula75 Yes there is a bit of an assumption that since it's an AutorecoveringConnection and the shutdown has been fired that recovery will occur next. The exception to that would be if the connection was closed on purpose in which case no recovery would occur. I'll have a look to see if there is a clear way to indicate that recovery will be attempted (Perhaps Extending ShutdownSignalException in some way ?). As under a normal close situation there wouldn't be recovery.

I guess it somewhat depends on the intent of the shutdown listener within an AutorecoveringConnection & whether @spatula75 PR makes sense as well. But I would guess with that PR then calling the shutdown listeners when recovery will occur may not make sense? and should just be called when the connection is truly shutdown and not when recovery is being attempted (that possibly could be confusing for current implementations, which are current doing something when shutdown is called - so that could be a bad idea to change?).,

@michaelklishin
Copy link
Contributor

@doswell they are invoked serially. Shutdown listeners in autorecovering connections have the same purpose as in "regular" connections — add a way for developers to react to connection termination — plus the additional duty of kicking off recovery.

I personally see @spatula75's PR as completely orthogonal to this issue.

@doswell
Copy link
Author

doswell commented Apr 6, 2016

got the changes to merge into the original PR. so closed the new one to keep history.

@michaelklishin
Copy link
Contributor

@doswell cheers.

@michaelklishin
Copy link
Contributor

After reviewing #136 I'd like to try a different approach (closer to what the original version of #136 looked like).

@michaelklishin michaelklishin self-assigned this Apr 13, 2016
michaelklishin added a commit that referenced this issue Apr 13, 2016
…ecuted

Note: we cannot guarantee that they'd have *finished* by the time
recovery starts. Provided that ShutdownListeners are typically
basic sequential functions, that'll be the case most of the
time.

Fixes #135
@punkeel
Copy link

punkeel commented Aug 7, 2016

I am using release 3.6.2 with the auto recovery feature, and I sometimes get errors that cause the Rabbit client to fail.

I do think the patches applied here help in solving my issue, as they (imo) improve the recovery scheme, so I won't create another issue.

Here's the stacktrace I get:

com.rabbitmq.client.impl.DefaultExceptionHandler: Consumer com.rabbitmq.client.QueueingConsumer@bd02d41 (amq.ctag-FZINo96YoZFCkvNm1s1PUg) method handleDelivery for channel AMQChannel(amqp://...@.../,1) threw an exception for channel AMQChannel(amqp://...@.../,1):
 com.rabbitmq.client.ShutdownSignalException: connection error
 at com.rabbitmq.client.QueueingConsumer.checkShutdown(QueueingConsumer.java:177)
 at com.rabbitmq.client.QueueingConsumer.handleDelivery(QueueingConsumer.java:129)
 at com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:144)
 at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:99)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)

 Caused by: java.net.SocketException: Connection reset
 at java.net.SocketInputStream.read(SocketInputStream.java:209)
 at java.net.SocketInputStream.read(SocketInputStream.java:141)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
 at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:288)
 at com.rabbitmq.client.impl.Frame.readFrom(Frame.java:95)
 at com.rabbitmq.client.impl.SocketFrameHandler.readFrame(SocketFrameHandler.java:139)
 at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:542)
 ... 1 more

@michaelklishin
Copy link
Contributor

@punkeel QueueingConsumer does not support automatic connection recovery and should be avoided for this reason and others. "Connection reset" means the peer closed or refused client's TCP connection.

@punkeel
Copy link

punkeel commented Aug 8, 2016

@michaelklishin Sorry for my misunderstanding, I've read it in the doc but thought the client library handled most errors anyway.

Let's migrate to BasicConsumer then, thanks and sorry for posting in the wrong place!

stream-iori pushed a commit to stream-iori/rabbitmq-java-client that referenced this issue Mar 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants