Consumer not recovering connection after timeout #630
Comments
Can you increase the connection timeout? Set up a channel customizer in the Netty configuration of the environment builder, e.g. as sketched below.
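A minimal sketch of that kind of customization, assuming the environment builder's Netty configuration exposes a `channelCustomizer` and that the customizer runs before the connect attempt; the 60-second value is just an example:

```java
import com.rabbitmq.stream.Environment;

// Sketch: raise Netty's connect timeout via a channel customizer in the
// environment builder's Netty configuration. Assumes the customizer is
// applied before the connect attempt; adjust to your client version.
Environment environment = Environment.builder()
    .netty()
    .channelCustomizer(ch -> ch.config().setConnectTimeoutMillis(60_000))
    .environmentBuilder()
    .build();
```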
Thank you for the suggestion to increase the connection timeout by setting a custom channel option in the Netty configuration. The issue is that when this timeout occurs, it seems to cancel the entire recovery process. Is this the intended behavior? Shouldn’t connection-related timeouts trigger a retry or a different recovery mechanism rather than canceling the process entirely?
@Hendr-ik-a for initial connections, they should not. For recovery of an existing connection, any I/O exception should trigger a retry. At least that's how most RabbitMQ clients approach connection recovery since 2013, when it was introduced in the Ruby and Java (AMQP 0-9-1) clients.
@Hendr-ik-a I pushed a fix, can you try the snapshot? |
Thanks for the fix! I’ll test the snapshot and let you know if the issue is resolved. |
@Hendr-ik-a have you had a chance to test the snapshot? Did it address the issue? |
Describe the bug
Looking at the implementation of the ConsumersCoordinator.recoverSubscription method, it seems like the exception is caught by the general Exception catch block, where the reassignmentCompleted parameter is set to true, which ends the recovery process, even though the error message indicates a connection timeout -
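For illustration, a simplified, hypothetical sketch of the control flow described above (not the actual ConsumersCoordinator source; the reassignment logic is stubbed out):

```java
// Hypothetical simplification of the recovery loop described in this issue.
public class RecoverySketch {

    interface Reassignment {
        void run() throws Exception; // may throw on a connection timeout
    }

    static void recoverSubscription(Reassignment reassignment) {
        boolean reassignmentCompleted = false;
        while (!reassignmentCompleted) {
            try {
                reassignment.run();
                reassignmentCompleted = true; // success: recovery is done
            } catch (Exception e) {
                // The general Exception catch block described above: a
                // connection timeout also lands here, the flag is set to
                // true, and the loop exits instead of retrying.
                reassignmentCompleted = true;
            }
        }
    }
}
```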
Reproduction steps
...
Expected behavior
The client would not stop trying to reconnect the consumer when a timeout occurs.
Additional context
RabbitMQ is running on 3 nodes with a single active consumer. Restarting the nodes works as intended - the leader node changes accordingly and the consumer recovers after the node restart (maybe due to a shorter timeout period?).