-
Notifications
You must be signed in to change notification settings - Fork 93
azure-iot-arduino fails IoTHub connection after a couple hours #30
Comments
After further testing using the 0.11.0 version of the WiFi101 client I have traced the lockup (solid activity LED) problem down to an infinite loop in SSL send function. I don’t see this as a direct defect in the azure-iot-arduino library, but the library’s actions are hitting a design constraint and\or defect with the WINC1500 code found inside WiFi101 (and the Adafruit WINC1500 library). I haven’t been able to trace down any Atmel documentation, but it appears that the WINC1500 supports 16 outstanding transmits. The azure-iot-arduino library performs 18 sslClient_Writes() to complete an event send, this results in the TCP send window closing at the CRLF sent between the headers and the body; until outstanding sends are deemed complete via the send callback. Typically the WiFi client waits appropriately for the send callbacks in order to finish sending the event POST. However, under some conditions the WINC deadlocks never processing send callbacks and consequently never returning to the task of sending out TCP frames. The WiFi101\WINC code in question is within the WINC hif_send() (in m2m_hif.c) and WiFiClient::write() (in WiFiClient.cpp) functions. This logic is waiting for a dma_addr (aka send buffer) from the WINC chip. I have implemented a work around in our test environment, which consolidates the header and trailing CRLF writes inside SendHeadsToXIO() (in adapter\httpapi_compact.c) to a single send header + CRLF. By reducing the sends per event from 18 down to 11 the WINC code has now been running for over 75 hours, previous to the workaround never made it 8 hours. |
Our work around (below) initially solved the issue, however as I added back out user properties I discovered these are sent over the wire as additional HTTP headers, resulting in more than 16 frames to send an event and a reappearance of the issue. Workaround...
|
Closing... this issue was rooted within the WiFi code. The AzureIotHub was able to cause the WiFi code to go into a tight retry loop after exhausting the send buffers within the WINC1500 chip. Under certain conditions, still unknown, the WINC1500 driver would stop attempting to service the chip and deadlock waiting for a send buffer. A pending pull request https://github.com/arduino-libraries/WiFi101/pull/77 provides a better retry mechanism to the WINC no send buffer condition. My application can successfully run for, so far 24+ hours, with this WiFi101 code. |
I am is working on an embedded solution targeting Arduino architecture and WINC1500 (using the Adafruit Feather M0 with WINC1500 for our reference build). My application code works fine for several hours then fails with one of two scenarios… 1) the WINC1500 transmit LED (yellow) goes on solid; or 2) the connection to the IoT Hub is lost and a sequence of errors is produced (exact sequence depends on which WINC client: Adafruit_WINC1500 or WiFi101).
I can reproduce the issue with the Remote_Monitoring example provided in the Getting Started kit at https://github.com/Azure-Samples/iot-hub-c-m0wifi-getstartedkit and I have moved all testing from my code to the remote_monitoring example.
Scenario #2 errors with Adafruit_WIN1500 and AzureIoTHub (0.2.0)
AzureIoTHub\src\httpapi.cpp Func:HTTPAPI_CreateConnection Line:39 HTTPS connection to loouq-iotos-preview.azure-devices.net failed
AzureIoTHub\src\sdk\httpapiex.c Func:HTTPAPIEX_ExecuteRequest Line:476 unable to recover sending to a working state
AzureIoTHub\src\sdk\iothubtransporthttp.c Func:DoEvent Line:1223 unable to HTTPAPIEX_ExecuteRequest
Scenario #2 errors with WINC1500 and azure-iot-arduino (master - Latest commit df88a5a on Oct 14)
azure-iot-arduino-master\src\adapters\tlsio_arduino.c Func:tlsio_arduino_create Line:159 Host loouq-iotos-preview.azure-devices.net not found.
azure-iot-arduino-master\src\adapters\httpapi_compact.c Func:HTTPAPI_CreateConnection Line:221 Create connection failed
azure-iot-arduino-master\src\azure_c_shared_utility\httpapiex.c Func:HTTPAPIEX_ExecuteRequest Line:478 unable to recover sending to a working state
azure-iot-arduino-master\src\sdk\iothubtransporthttp.c Func:DoEvent Line:1582 unable to HTTPAPIEX_SAS_ExecuteRequest
I also noted a huge slowdown in the IoTHubClient_LL_DoWork() method with the newer azure-iot-arduino, from a about 10 millis to a range of 4 to 9 seconds.
I tried switching to the WiFi101 driver (both 0.10.0 and github master latest as of 11/13/2016) with the same results. I attempted to use the WiFi101 driver due to the statement on the Adafruit WINC1500 github repo… “This fork is deprecated! Please use WiFi101, which now has pin-setting support Thak you! :)” and has not had commits since Sep 22nd.
My questions…
Thanks,
Greg
The text was updated successfully, but these errors were encountered: