Skip to content

ESP8266httpUpdate.update() incompatible with nginx #1157

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
skandragon opened this issue Dec 6, 2015 · 11 comments
Closed

ESP8266httpUpdate.update() incompatible with nginx #1157

skandragon opened this issue Dec 6, 2015 · 11 comments

Comments

@skandragon
Copy link

I am attempting to perform an upgrade from a server running nginx, and am having no real luck. It appears only some of the data is being transferred to the ESP, and I'm quite sure it's not a problem with nginx:

10.42.0.47 - - [06/Dec/2015:03:42:40 +0000] "GET /explorer/esp.bin HTTP/1.1" 200 69819 "-" "ESP8266-http-Update"
10.42.0.47 - - [06/Dec/2015:03:43:10 +0000] "GET /explorer/esp.bin HTTP/1.1" 200 28939 "-" "ESP8266-http-Update"
10.42.0.47 - - [06/Dec/2015:03:47:26 +0000] "GET /explorer/esp.bin HTTP/1.1" 200 71279 "-" "ESP8266-http-Update"
10.42.0.47 - - [06/Dec/2015:03:52:02 +0000] "GET /explorer/esp.bin HTTP/1.1" 200 28939 "-" "ESP8266-http-Update"
10.42.0.47 - - [06/Dec/2015:05:32:51 +0000] "GET /explorer/esp.bin HTTP/1.1" 200 228000 "-" "ESP8266-http-Update"
10.42.0.47 - - [06/Dec/2015:05:33:43 +0000] "GET /explorer/esp.bin HTTP/1.1" 200 202679 "-" "ESP8266-http-Update"
10.42.0.47 - - [06/Dec/2015:05:34:43 +0000] "GET /explorer/esp.bin HTTP/1.1" 200 28939 "-" "ESP8266-http-Update"
10.42.0.47 - - [06/Dec/2015:05:41:08 +0000] "GET /explorer/esp.bin HTTP/1.1" 200 28939 "-" "ESP8266-http-Update"

Here's a tcpdump of the last connection shown:

05:41:08.622315 IP 10.42.0.47.4099 > 10.42.120.1.80: Flags [S], seq 16134, win 5840, options [mss 1460], length 0
05:41:08.622364 IP 10.42.120.1.80 > 10.42.0.47.4099: Flags [S.], seq 2641987090, ack 16135, win 29200, options [mss 1460], length 0
05:41:08.624701 IP 10.42.0.47.4099 > 10.42.120.1.80: Flags [.], ack 1, win 5840, length 0
05:41:08.626063 IP 10.42.0.47.4099 > 10.42.120.1.80: Flags [P.], seq 1:341, ack 1, win 5840, length 340
05:41:08.626097 IP 10.42.120.1.80 > 10.42.0.47.4099: Flags [.], ack 341, win 30016, length 0
05:41:08.626264 IP 10.42.120.1.80 > 10.42.0.47.4099: Flags [.], seq 1:1461, ack 341, win 30016, length 1460
05:41:08.626293 IP 10.42.120.1.80 > 10.42.0.47.4099: Flags [.], seq 1461:2921, ack 341, win 30016, length 1460
05:41:08.626323 IP 10.42.120.1.80 > 10.42.0.47.4099: Flags [.], seq 2921:4381, ack 341, win 30016, length 1460
05:41:08.626343 IP 10.42.120.1.80 > 10.42.0.47.4099: Flags [.], seq 4381:5841, ack 341, win 30016, length 1460
05:41:08.631852 IP 10.42.0.47.4099 > 10.42.120.1.80: Flags [.], ack 2921, win 2920, length 0
05:41:08.634229 IP 10.42.0.47.4099 > 10.42.120.1.80: Flags [.], ack 5841, win 0, length 0
05:41:08.730753 IP 10.42.0.47.4099 > 10.42.120.1.80: Flags [.], ack 5841, win 5840, length 0
05:41:08.730778 IP 10.42.120.1.80 > 10.42.0.47.4099: Flags [P.], seq 5841:7301, ack 341, win 30016, length 1460
05:41:08.730789 IP 10.42.120.1.80 > 10.42.0.47.4099: Flags [.], seq 7301:8761, ack 341, win 30016, length 1460
05:41:08.730799 IP 10.42.120.1.80 > 10.42.0.47.4099: Flags [.], seq 8761:10221, ack 341, win 30016, length 1460
05:41:08.730807 IP 10.42.120.1.80 > 10.42.0.47.4099: Flags [P.], seq 10221:11681, ack 341, win 30016, length 1460
05:41:08.730814 IP 10.42.0.47.4099 > 10.42.120.1.80: Flags [F.], seq 341, ack 5841, win 5840, length 0
05:41:08.734321 IP 10.42.0.47.4099 > 10.42.120.1.80: Flags [R.], seq 342, ack 7301, win 5840, length 0
05:41:08.735513 IP 10.42.0.47.4099 > 10.42.120.1.80: Flags [R.], seq 341, ack 8761, win 5840, length 0
05:41:08.735533 IP 10.42.0.47.4099 > 10.42.120.1.80: Flags [R.], seq 341, ack 10221, win 5840, length 0
05:41:08.737900 IP 10.42.0.47.4099 > 10.42.120.1.80: Flags [R.], seq 341, ack 11681, win 5840, length 0

It works exactly once after a fresh reboot, but subsequent upgrade attempts fail even if that one succeeds.

I've attached a working tcpdump to https://gist.github.com/skandragon/47d4769ff988a3e1ce65 rather than put it inline here.

@skandragon
Copy link
Author

Compiled with UPDATE_SERIAL Serial:

Success (partial log): https://gist.github.com/skandragon/3b22582d3138009cf436

Failure (full log): https://gist.github.com/skandragon/7307d615b20c123ccfab

@skandragon
Copy link
Author

Printing the error:

[update] Update failed.
ERROR[5]: Stream Read Timeout

I do not see any IP level fragmentation, so that's not an issue.

@skandragon
Copy link
Author

I am starting to think the general state of TCP connections in ESPland is somewhat broken.

My ultimate goal is to receive RGB data to drive LED arrays, and this will have to come in at a fairly high data rate.

If I limit my data rate to one update per 0.5 seconds (each update consists of 6 bytes of data), then the ESP has no issue. My read is basically this:

read a single byte command.
If it's an LED update command, read 5 more bytes using client.read(buffer, sizeof buffer) where buffer is a char buffer[5].

With a low data rate, this works fairly well, but it will even then eventually have a short read. If I send at a high data rate, I can cause a short read to occur nearly instantly. In all cases, only 1 byte of the 5 is read, according to a printf().

The connection is entirely over a stable wifi connection; there should be no AP switching or dropping, but I'm not positive the ESP is as stable as I'd like it to be here. It is possible it's reconnecting, but I have no idea how to tell this.

@igrr
Copy link
Member

igrr commented Dec 6, 2015

Socket read can return less bytes than you requested, there is nothing
wrong with it.
If you want "wait until N bytes are available or timeout" behavior, there's
Stream::readBytes function for that.

In any case, please open a separate issue if you find short reads a
problem, and post the sketch which may be used to reproduce.
Thanks.

On Sun, Dec 6, 2015, 11:49 Michael Graff [email protected] wrote:

I am starting to think the general state of TCP connections in ESPland is
somewhat broken.

My ultimate goal is to receive RGB data to drive LED arrays, and this will
have to come in at a fairly high data rate.

If I limit my data rate to one update per 0.5 seconds (each update
consists of 6 bytes of data), then the ESP has no issue. My read is
basically this:

read a single byte command.
If it's an LED update command, read 5 more bytes using client.read(buffer,
sizeof buffer) where buffer is a char buffer[5].

With a low data rate, this works fairly well, but it will even then
eventually have a short read. If I send at a high data rate, I can cause a
short read to occur nearly instantly. In all cases, only 1 byte of the 5 is
read, according to a printf().

The connection is entirely over a stable wifi connection; there should be
no AP switching or dropping, but I'm not positive the ESP is as stable as
I'd like it to be here. It is possible it's reconnecting, but I have no
idea how to tell this.


Reply to this email directly or view it on GitHub
#1157 (comment).

@Links2004
Copy link
Collaborator

the ESP8266httpUpdate class itself handle only the requesting and header Interpretation.
the data transmission itself is handled in the Update class.
which uses readBytes, the time out for the read is 1000ms.

you can try to enable the debug:
https://github.com/esp8266/Arduino/blob/master/libraries/ESP8266httpUpdate/src/ESP8266httpUpdate.h#L35
https://github.com/esp8266/Arduino/blob/master/cores/esp8266/Updater.cpp#L6

I think there are to possible reasons.

  • the reported Content-Length in the header is wrong (easy to check with debug on)
  • for some reasons a timeout course (then its more the TCP part of the ESP)

@skandragon
Copy link
Author

Looking more closely at the tcpdump when this first fails (not subsequent failures, which are tainted by left over internal state somewhere inside the ESPhttpUpdate class) I see this pattern:

Server sends some number of bytes (1460 bytes usually)
ESP ACKs
Server sends more
ESP ACKs previous bytes again, window size unchanged.
ESP ACKs previous bytes again, window size unchanged.
Server sends more
ESP ACKs the same bytes again, window size unchanged.
<at this point the server thinks the ESPs window is full, so waits, until...>
Server retransmits from the last ACK
ESP ACKs new data, and data flows again

This happens every time there is an issue. It can take a while for the server to re-transmit the un-ack'd data, usually longer than the timeout for the read call, is my guess.

I'm running a 10x second ping to the ESP, and I see no dropped packets. No other device, wifi or ethernet connected, has issues talking with the server running the nginx server. Local connectivity does not seem to be an issue.

This is a tcpdump running from the wifi access point itself:

23:36:18.523452 IP 10.42.120.1.80 > 10.42.0.47.4097: Flags [.], seq 49641:51101, ack 343, win 30016, length 1460
23:36:18.528178 IP 10.42.0.47.4097 > 10.42.120.1.80: Flags [.], ack 45261, win 5840, length 0
23:36:18.528194 IP 10.42.120.1.80 > 10.42.0.47.4097: Flags [.], seq 45261:46721, ack 343, win 30016, length 1460
23:36:18.528204 IP 10.42.120.1.80 > 10.42.0.47.4097: Flags [.], seq 46721:48181, ack 343, win 30016, length 1460
23:36:18.643764 IP 10.42.0.47.4097 > 10.42.120.1.80: Flags [.], ack 46721, win 4380, length 0
23:36:18.643786 IP 10.42.120.1.80 > 10.42.0.47.4097: Flags [.], seq 48181:49641, ack 343, win 30016, length 1460
23:36:18.650908 IP 10.42.0.47.4097 > 10.42.120.1.80: Flags [.], ack 48181, win 4380, length 0
23:36:18.650926 IP 10.42.120.1.80 > 10.42.0.47.4097: Flags [P.], seq 51101:52561, ack 343, win 30016, length 1460
23:36:18.650934 IP 10.42.0.47.4097 > 10.42.120.1.80: Flags [.], ack 48181, win 5840, length 0
23:36:18.655676 IP 10.42.0.47.4097 > 10.42.120.1.80: Flags [.], ack 49641, win 4380, length 0
23:36:18.655692 IP 10.42.120.1.80 > 10.42.0.47.4097: Flags [.], seq 52561:54021, ack 343, win 30016, length 1460
23:36:18.789139 IP 10.42.0.47.4097 > 10.42.120.1.80: Flags [.], ack 49641, win 4380, length 0
23:36:18.789156 IP 10.42.0.47.4097 > 10.42.120.1.80: Flags [.], ack 49641, win 5840, length 0
23:36:28.789872 IP 10.42.0.47.4097 > 10.42.120.1.80: Flags [F.], seq 343, ack 49641, win 5840, length 0
23:36:28.828208 IP 10.42.120.1.80 > 10.42.0.47.4097: Flags [.], ack 344, win 30016, length 0
23:36:29.556215 IP 10.42.120.1.80 > 10.42.0.47.4097: Flags [.], seq 49641:51101, ack 344, win 30016, length 1460
23:36:29.558423 IP 10.42.0.47.4097 > 10.42.120.1.80: Flags [R.], seq 344, ack 51101, win 5840, length 0

@zenmanenergy
Copy link

I'm having the same problem on a php-apache server. I've traced it down to this same line number in updater.cpp. The solution is over my head, but I can reproduce the problem 100% of the time.

@zenmanenergy
Copy link

I've made some interesting progress...

Try increasing the stream timeout to 10000 in writeStream in Updater.cpp

size_t UpdaterClass::writeStream(Stream &data) {
data.setTimeout(10000);

@Links2004
Copy link
Collaborator

I have done some changes to the Update class to give the IP stack more time to handle the data.
can you please retest it.

Links2004 added a commit to Links2004/Arduino that referenced this issue Dec 10, 2015
@Links2004 Links2004 mentioned this issue Dec 10, 2015
@zenmanenergy
Copy link

I have tested this new fix. for the most part it is working well. Thank you Links! The is still an issue with ESP.restart() seems to restart in bootloader mode. So it seems to require a manual reboot.

@Links2004
Copy link
Collaborator

the restart is a other problem see #1017 , but there is no solution yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants