Skip to content

MQTT infinite reconnect loop after server terminates connection #394

Closed
@danhuang-ixia

Description

@danhuang-ixia

Describe the bug

Seeing an issue when testing corner cases for connection/network loss.
If the server terminates the MQTT connection, the reconnect logic seems to fail forever on AWS_IO_SOCKET_CLOSED.
Calling disconnect/connect directly seems to still reuse the same closed socket, the program needs to be restarted to recover.

Expected Behavior

If a MQTT server terminates the connection, client establishes a new connection.

Current Behavior

on_connection_interrupted callback generates expected error

AWS_ERROR_MQTT_UNEXPECTED_HANGUP: The connection was closed unexpectedly.

DEBUG log level shows reconnection attempt, but fails with "zero read, socket is closed"

[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [task-scheduler] - id=0x7f1c84020b50: Scheduling mqtt_reconnect task for immediate execution
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [channel] - id=0x7f1c84001010: destroying channel.
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [channel-bootstrap] - id=0x1e4d6e0: releasing bootstrap reference
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [task-scheduler] - id=0x7f1c84020b50: Running mqtt_reconnect task with status
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [dns] - id=0x1b58db0: Host resolution requested for 10.36.165.254
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [dns] - id=0x1b58db0: No cached entries found for 10.36.165.254 starting new resolver thread.
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c83fff700] [dns] - static: resolving host 10.36.165.254
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c83fff700] [dns] - static: resolved record: 10.36.165.254
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c83fff700] [dns] - static, resolving host 10.36.165.254 successful, returned 1 addresses
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c83fff700] [dns] - static: new address resolved 10.36.165.254 for host 10.36.165.254 caching
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c83fff700] [dns] - static, invoking resolution callback for host 10.36.165.254 with 1 addresses
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [task-scheduler] - id=0x7f1c78001580: Scheduling attempt_connection task for immediate execution
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [task-scheduler] - id=0x7f1c78001580: Running attempt_connection task with status
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [socket] - id=0x7f1c84000c90 fd=7: initializing with domain 0 and type 0
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [socket] - id=0x7f1c84000c90 fd=7: setting socket options to: keep-alive 0, keep idle 0, keep-alive interval 0, keep-alive probe count 0.
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [socket] - id=0x7f1c84000c90 fd=7: beginning connect.
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [socket] - id=0x7f1c84000c90 fd=7: connecting to endpoint 10.36.165.254:1883.
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [task-scheduler] - id=0x7f1c840260b0: Scheduling (null) task for future execution at time 9144044252363669
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [task-scheduler] - id=0x7f1c84018450: Scheduling epoll_event_loop_unsubscribe_cleanup task for immediate execution
[INFO] [2023-01-27T15:21:47Z] [00007f1c899c5700] [socket] - id=0x7f1c84000c90 fd=7: connection success
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [socket] - id=0x7f1c84000c90 fd=7: assigning to event loop 0x295a6b0
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [channel-bootstrap] - id=0x1e4d6e0: client connection on socket 0x7f1c84000c90 completed with error 0.
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [channel] - id=0x7f1c84001010: Beginning creation and setup of new channel.
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [task-scheduler] - id=0x7f1c84000f30: Scheduling on_channel_setup_complete task for immediate execution
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [task-scheduler] - id=0x7f1c84018450: Running epoll_event_loop_unsubscribe_cleanup task with status
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [task-scheduler] - id=0x7f1c84000f30: Running on_channel_setup_complete task with status
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [channel] - id=0x7f1c84001010: setup complete, notifying caller.
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [channel] - id=0x7f1c84001010: message pool 0x7f1c840013a0 found in event-loop local storage: using it.
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [channel-bootstrap] - id=0x1e4d6e0: channel 0x7f1c84001010 setup succeeded: bootstrapping.
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [socket-handler] - id=0x7f1c84011be0: Socket handler created with max_read_size of 16384
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [task-scheduler] - id=0x7f1c84011d40: Scheduling tls_timeout task for future execution at time 9144049253347190
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [task-scheduler] - id=0x7f1c84000e30: Scheduling socket_written_task task for immediate execution
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [task-scheduler] - id=0x7f1c84000e30: Running socket_written_task task with status
[INFO] [2023-01-27T15:21:47Z] [00007f1c899c5700] [socket] - id=0x7f1c84000c90 fd=7: zero read, socket is closed
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [task-scheduler] - id=0x7f1c84001210: Scheduling channel_shutdown task for immediate execution
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [channel] - id=0x7f1c84001010: Channel shutdown is already pending, not scheduling another.
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [task-scheduler] - id=0x7f1c84001210: Running channel_shutdown task with status
[DEBUG] [2023-01-27T15:21:47Z] [00007f1c899c5700] [channel] - id=0x7f1c84001010: beginning shutdown process

Reproduction Steps

  1. Client establishes a connection with a mosquitto server:
    mqtt_connection = mqtt_connection_builder.mtls_from_path( endpoint=host, port=port, cert_filepath=certificate, pri_key_filepath=privateKey, ca_filepath=rootCA, on_connection_interrupted=on_connection_interrupted, on_connection_resumed=on_connection_resumed, client_id=MQTTclientID, clean_session=False, keep_alive_secs=30, http_proxy_options=proxy_options) connect_future = mqtt_connection.connect() connection_result = connect_future.result()

  2. Restart the mosquitto server (terminates connection abruptly from server end)

Possible Solution

No response

Additional Information/Context

No response

SDK version used

awscrt==0.16.4, awsiotsdk==1.12.2

Environment details (OS name and version, etc.)

Ubuntu 20.04

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugThis issue is a bug.p2This is a standard priority issue

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions