Skip to content
This repository was archived by the owner on Jul 22, 2022. It is now read-only.

azure-iot-arduino fails IoTHub connection after a couple hours #30

Closed
GregTerrell opened this issue Nov 14, 2016 · 3 comments
Closed

azure-iot-arduino fails IoTHub connection after a couple hours #30

GregTerrell opened this issue Nov 14, 2016 · 3 comments

Comments

@GregTerrell
Copy link

I am is working on an embedded solution targeting Arduino architecture and WINC1500 (using the Adafruit Feather M0 with WINC1500 for our reference build). My application code works fine for several hours then fails with one of two scenarios… 1) the WINC1500 transmit LED (yellow) goes on solid; or 2) the connection to the IoT Hub is lost and a sequence of errors is produced (exact sequence depends on which WINC client: Adafruit_WINC1500 or WiFi101).

I can reproduce the issue with the Remote_Monitoring example provided in the Getting Started kit at https://github.com/Azure-Samples/iot-hub-c-m0wifi-getstartedkit and I have moved all testing from my code to the remote_monitoring example.

Scenario #2 errors with Adafruit_WIN1500 and AzureIoTHub (0.2.0)
AzureIoTHub\src\httpapi.cpp Func:HTTPAPI_CreateConnection Line:39 HTTPS connection to loouq-iotos-preview.azure-devices.net failed
AzureIoTHub\src\sdk\httpapiex.c Func:HTTPAPIEX_ExecuteRequest Line:476 unable to recover sending to a working state
AzureIoTHub\src\sdk\iothubtransporthttp.c Func:DoEvent Line:1223 unable to HTTPAPIEX_ExecuteRequest

Scenario #2 errors with WINC1500 and azure-iot-arduino (master - Latest commit df88a5a on Oct 14)
azure-iot-arduino-master\src\adapters\tlsio_arduino.c Func:tlsio_arduino_create Line:159 Host loouq-iotos-preview.azure-devices.net not found.
azure-iot-arduino-master\src\adapters\httpapi_compact.c Func:HTTPAPI_CreateConnection Line:221 Create connection failed
azure-iot-arduino-master\src\azure_c_shared_utility\httpapiex.c Func:HTTPAPIEX_ExecuteRequest Line:478 unable to recover sending to a working state
azure-iot-arduino-master\src\sdk\iothubtransporthttp.c Func:DoEvent Line:1582 unable to HTTPAPIEX_SAS_ExecuteRequest

I also noted a huge slowdown in the IoTHubClient_LL_DoWork() method with the newer azure-iot-arduino, from a about 10 millis to a range of 4 to 9 seconds.

I tried switching to the WiFi101 driver (both 0.10.0 and github master latest as of 11/13/2016) with the same results. I attempted to use the WiFi101 driver due to the statement on the Adafruit WINC1500 github repo… “This fork is deprecated! Please use WiFi101, which now has pin-setting support Thak you! :)” and has not had commits since Sep 22nd.

  • (Scenario Update AzureIoT Library with support for more boards #1) - I have traced the solid transmit LED to an endless loop within the WINC socket write function while waiting for the send() loop (no SOCK_ERR_BUFFER_FULL). I know this is WINC driver code, not azure IoT code; I am wondering if this is exasperated by the AzureIoTHub code or a known issue.
size_t WiFiClient::write(const uint8_t *buf, size_t size)
{
	sint16 err;

	if (_socket < 0 || size == 0) {
		setWriteError();
		return 0;
	}

	// Network led ON (rev A then rev B).
	m2m_periph_gpio_set_val(M2M_PERIPH_GPIO16, 0);
	m2m_periph_gpio_set_val(M2M_PERIPH_GPIO5, 0);

	m2m_wifi_handle_events(NULL);

	#ifdef DEBUG_STREAM
		DEBUG_STREAM.print(F("WiFiClient()::write, ready to send: "));
		DEBUG_STREAM.println((char*)buf);
	#endif

	while ((err = send(_socket, (void *)buf, size, 0)) < 0) {

		#ifdef DEBUG_STREAM
			DEBUG_STREAM.print(F("WiFiClient()::write, send error="));
			DEBUG_STREAM.println(err);
		#endif

		// Exit on fatal error, retry if buffer not ready.
		if (err != SOCK_ERR_BUFFER_FULL) {
			setWriteError();
			m2m_periph_gpio_set_val(M2M_PERIPH_GPIO16, 1);
			m2m_periph_gpio_set_val(M2M_PERIPH_GPIO5, 1);
			return 0;
		}
		m2m_wifi_handle_events(NULL);
	}

	#ifdef DEBUG_STREAM
		DEBUG_STREAM.println(F("WiFiClient()::write, send operation completed"));
	#endif
	
	// Network led OFF (rev A then rev B).
	m2m_periph_gpio_set_val(M2M_PERIPH_GPIO16, 1);
	m2m_periph_gpio_set_val(M2M_PERIPH_GPIO5, 1);
			
	return size;
}


My questions…

  • If the Adafruit WINC1500 fork is depreciated, should not the azure-iot-arduino library shift to be using WiFi101? Is this simply waiting for WiFi101 to hit 1.x status?
  • Is there a known, but unreported, issue with the azure-iot-arduino driver failing after several hours?
  • Is there any way to trap these two scenarios so that application code can reset the WINC chip and start processing again?

Thanks,
Greg

@GregTerrell
Copy link
Author

After further testing using the 0.11.0 version of the WiFi101 client I have traced the lockup (solid activity LED) problem down to an infinite loop in SSL send function. I don’t see this as a direct defect in the azure-iot-arduino library, but the library’s actions are hitting a design constraint and\or defect with the WINC1500 code found inside WiFi101 (and the Adafruit WINC1500 library).

I haven’t been able to trace down any Atmel documentation, but it appears that the WINC1500 supports 16 outstanding transmits. The azure-iot-arduino library performs 18 sslClient_Writes() to complete an event send, this results in the TCP send window closing at the CRLF sent between the headers and the body; until outstanding sends are deemed complete via the send callback.

Typically the WiFi client waits appropriately for the send callbacks in order to finish sending the event POST. However, under some conditions the WINC deadlocks never processing send callbacks and consequently never returning to the task of sending out TCP frames. The WiFi101\WINC code in question is within the WINC hif_send() (in m2m_hif.c) and WiFiClient::write() (in WiFiClient.cpp) functions. This logic is waiting for a dma_addr (aka send buffer) from the WINC chip.

I have implemented a work around in our test environment, which consolidates the header and trailing CRLF writes inside SendHeadsToXIO() (in adapter\httpapi_compact.c) to a single send header + CRLF. By reducing the sends per event from 18 down to 11 the WINC code has now been running for over 75 hours, previous to the workaround never made it 8 hours.

@GregTerrell
Copy link
Author

Our work around (below) initially solved the issue, however as I added back out user properties I discovered these are sent over the wire as additional HTTP headers, resulting in more than 16 frames to send an event and a reappearance of the issue.

Workaround...

//LOOUQ: compress header and trailing endline into one send()

if(((ret = snprintf(buf, sizeof(buf), "%s \r\n", header)) < 0) || (ret >= sizeof(buf)))
{
	/*Codes_SRS_HTTPAPI_COMPACT_21_027: [ If the HTTPAPI_ExecuteRequest cannot create a buffer to send the request, it shall not send any request and return HTTPAPI_STRING_PROCESSING_ERROR. ]*/
	result = HTTPAPI_STRING_PROCESSING_ERROR;
}
else 
{
	if (xio_send(http_instance->xio_handle, (const unsigned char*)buf, strlen(buf), NULL, NULL) != 0)
	{
	/*Codes_SRS_HTTPAPI_COMPACT_21_028: [ If the HTTPAPI_ExecuteRequest cannot send the request header, it shall return HTTPAPI_HTTP_HEADERS_FAILED. ]*/
		result = HTTPAPI_SEND_REQUEST_FAILED;
	}
}

// if (xio_send(http_instance->xio_handle, (const unsigned char*)header, strlen(header), NULL, NULL) != 0)
// {
// 	/*Codes_SRS_HTTPAPI_COMPACT_21_028: [ If the HTTPAPI_ExecuteRequest cannot send the request header, it shall return HTTPAPI_HTTP_HEADERS_FAILED. ]*/
// 	result = HTTPAPI_SEND_REQUEST_FAILED;
// }
// if (xio_send(http_instance->xio_handle, (const unsigned char*)"\r\n", (size_t)2, NULL, NULL) != 0)
// {
// 	/*Codes_SRS_HTTPAPI_COMPACT_21_028: [ If the HTTPAPI_ExecuteRequest cannot send the request header, it shall return HTTPAPI_HTTP_HEADERS_FAILED. ]*/
// 	result = HTTPAPI_SEND_REQUEST_FAILED;
// }

@GregTerrell
Copy link
Author

Closing... this issue was rooted within the WiFi code. The AzureIotHub was able to cause the WiFi code to go into a tight retry loop after exhausting the send buffers within the WINC1500 chip. Under certain conditions, still unknown, the WINC1500 driver would stop attempting to service the chip and deadlock waiting for a send buffer.

A pending pull request https://github.com/arduino-libraries/WiFi101/pull/77 provides a better retry mechanism to the WINC no send buffer condition. My application can successfully run for, so far 24+ hours, with this WiFi101 code.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant