Again: TCP server performance completely unuseful #1430
Comments
Well, after a lot of trial and error I seem to have found a workaround. It was partly not my idea, but I don't remember where I read the hint. Here goes:
I have no idea what this delay is for or whether other parts rely on it. I still lost some packets, particularly right after connection establishment, so I also increased the TCP send buffer here:
Well, hope this helps others as well. And again, maybe somebody could elaborate on what this 5000 ms delay was for. Cheers. |
_send_waiting was my suggestion. Waiting on ivan, markus etc, to review the idea for inclusion. In the mean time, a complete library pfodESP2866WiFi with examples is available from |
|
The Arduino API is synchronous, so when you write you get back how many bytes were written. I will add some more debug output there; I hope this helps to find the reason for the delay problem. |
please use my debug branch and enable debug for Core:
or
|
@drmpf: Ah thanks, I remember now, yes that's the post I read about the delay. I tried your code and while the write() returns almost immediately and sends packets as required only, there was still the issue that under Windows new input sent to the ESP would not be processed right away. In short:
Therefore I didn't follow this further. @Links2004: could you specify exactly what you need? My git and the web interface on github does not show me a branch "debug", I only have |
It's not in the esp8266 git; it's a branch in my clone: |
@Links2004, I did as you asked and had code send several small texts to an open TCP client. This is the send code:
char outBuffer[1000];
WiFiClient tcpClient;
//[...]
strcpy_P( outBuffer, PSTR("my numerous test strings""\n") );
tcpClient.write(outBuffer, strlen(outBuffer)); // tcpClient is an object, not a pointer
And on the serial port I had a small Python script add timestamps and otherwise print the output:
This is the Python script:
import serial
from datetime import datetime
import io
import time

print(" Serial Debug Catcher")
print(" ====================")
print("")
ser = serial.Serial(port="COM9:", baudrate=115200, bytesize=serial.EIGHTBITS, parity=serial.PARITY_NONE, stopbits=serial.STOPBITS_ONE, timeout=1)
# Wrap the port so that complete lines can be read as text.
ser_io = io.TextIOWrapper(io.BufferedRWPair(ser, ser, 1), newline = '\n', line_buffering = True)
while True:
    reply = ""
    try:
        reply = ser_io.readline().rstrip()
    except Exception:
        reply = ""
    if not reply:
        continue
    # Prefix every received line with the local time of arrival.
    try:
        print(" %s << %s" % (datetime.now(), reply))
    except Exception:
        continue
I hope you can use this. Let me know if I can be of more help... |
The log shows no error at the TCP level of the ESP. |
Oh, I apologise, I thought this was clear from @drmpf's post. Yes, the ACK is delayed under Windows: BUT: TCP should manage the window size automatically. For some reason Windows waits for more packets to arrive and only ACKs the single one transmitted after the 200 ms timeout. And the ESP8266 does not send more than one packet, because it hasn't received an ACK for the previous one yet. Thus I see the error in the ESP code, which should send more packets so that the receiver (here the Windows PC) can fill its RX buffer and ACK everything in it with one ACK per bunch, not one ACK per packet. Removing the _send_wait delay "forces" the ESP to run into a timeout very quickly and send the next packet "no matter what", which occasionally results in packet loss even under TCP (web browser error ERR_CONTENT_LENGTH_MISMATCH). Thus I also increased the send buffer so that this does not occur. Lastly, it remains open why the ESP does not send the negotiated number (TCP_WND[_SIZE]/TCP_MSS) of packets (which Windows is expecting), but only one before waiting for an ACK. If I can help track this down, please tell me how... |
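To put a rough number on the "one packet per delayed ACK" behaviour described above, here is a small back-of-the-envelope sketch. It assumes a full 1460-byte segment (the MSS mentioned elsewhere in this thread) and the 200 ms ACK delay quoted above; the function name `stopAndWaitBytesPerSecond` is made up for illustration and is not part of any library.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on throughput (bytes per second) when the sender transmits a
// single segment of mssBytes and then idles until the delayed ACK arrives
// after ackDelayMs milliseconds. Illustrative helper, not a library function.
inline uint32_t stopAndWaitBytesPerSecond(uint32_t mssBytes, uint32_t ackDelayMs) {
    assert(ackDelayMs > 0);
    return mssBytes * 1000u / ackDelayMs;
}

// Number of full-size segments that fit into a window of windowBytes.
// windowBytes plays the role of TCP_WND, mssBytes that of TCP_MSS.
inline uint32_t segmentsPerWindow(uint32_t windowBytes, uint32_t mssBytes) {
    assert(mssBytes > 0);
    return windowBytes / mssBytes;
}
```

With these assumptions, `stopAndWaitBytesPerSecond(1460, 200)` gives 7300 bytes per second. A receiver that delays ACKs normally still acknowledges at least every second full-size segment, so keeping two or more segments in flight should avoid waiting for the delay timer at all.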
The packet loss can be prevented (simply increasing the buffer and hoping is not a good idea). |
My suggested mods add buffering and work with existing code; the use of isSendWaiting() is optional, as is the buffering. Check out the examples included with my sample library |
Hmm, perhaps the issue really is when This is more so as other operating systems than Windows may feature a delayed ACK TCP stack functionality and ESP should cope with that. Also bear in mind that not all TCP applications require replies from the clients; HTTP downloads for instance. Therefore the ESP should send packets up to the receivers window size as advertised in the initial SYN message and each response packet. I don't see a reason why we cannot? So, if we need to notify user code of any send error, I'd suggest registering a callback for that: In case the kernel cannot transmit data in its buffer the callback function is executed with the appropriate error code. Resynching data stream is user code responsibility. Other possibility would be to register a callback which This would break Arduino functionality-style only partly and still cope with delayed ACKs on any operating system (i.m.h.o.). |
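To make the callback proposal a bit more concrete, here is a minimal sketch of what registering a send-error callback could look like. Everything in it is hypothetical: `SendNotifier`, `onSendError`, `reportFailure` and the `SendError` codes are invented for illustration and do not exist in the ESP8266 Arduino core or in lwIP.

```cpp
#include <functional>
#include <utility>

// Hypothetical error codes; not taken from any real library.
enum class SendError { Timeout, ConnectionReset };

// Hypothetical holder for a user-supplied send-error callback.
class SendNotifier {
public:
    using ErrorCallback = std::function<void(SendError)>;

    // User code registers the function to be run when sending fails.
    void onSendError(ErrorCallback cb) { callback_ = std::move(cb); }

    // Would be called by the (imagined) network layer when buffered data
    // could not be delivered. Does nothing if no callback is registered.
    void reportFailure(SendError err) const {
        if (callback_) {
            callback_(err);
        }
    }

private:
    ErrorCallback callback_;
};
```

In this shape the write call could return immediately after queueing the data, and resynchronising the stream after a reported failure would remain the job of the user code, as suggested above.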
My plan is to add an option to the to allow filling the internal buffer up, but the default will be the behaviour as it is today (we break too many things if we change it). |
Yeah, changing the current behavior will break existing code which assumes
|
for full async TCP check out ESPAsyncTCP from @me-no-dev |
I understand and appreciate this. But this concerns just the behaviour of the Meaning ESP must honour the size of the recipient's RWin and keep sending its own send-buffer until:
Remember, it's a dynamic process: the recipient will advertise a diminishing space in its RX buffer with its own ACK. And a dirty hack to keep existing functionality and cope with delayed ACKs: if message_size>1 then send two TCP packets =) |
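The "keep sending until the window is full" behaviour asked for above can be modelled in a few lines. This is a simplified sketch, not how lwIP implements it: it ignores congestion control and retransmission, and the name `fillWindow` and its parameters are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the sizes of the segments a sender may transmit right now.
// pendingBytes:  data waiting in the send buffer
// inFlightBytes: data already sent but not yet acknowledged
// rwinBytes:     receive window most recently advertised by the peer
// mss:           maximum segment size
std::vector<std::size_t> fillWindow(std::size_t pendingBytes,
                                    std::size_t inFlightBytes,
                                    std::size_t rwinBytes,
                                    std::size_t mss) {
    assert(mss > 0);
    std::vector<std::size_t> segments;
    // Stop when the send buffer is empty or the peer's window is used up.
    while (pendingBytes > 0 && inFlightBytes < rwinBytes) {
        const std::size_t room = rwinBytes - inFlightBytes;
        const std::size_t len = std::min({pendingBytes, room, mss});
        segments.push_back(len);
        pendingBytes -= len;
        inFlightBytes += len;
    }
    return segments;
}
```

For example, with 4000 bytes pending, nothing in flight, a 5840-byte window and a 1460-byte MSS, the model sends three segments (1460, 1460 and 1080 bytes) before it has to wait, instead of one.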
@copterfan20001 check the async server that @Links2004 linked you with. I send two packets on response if there is content length. One with the headers and one with the data (as far as I remember). Please let us know if this is of any help and if so I'll hurry up the merge to this repo. |
@me-no-dev oooooooooooh, very nice job. I'll check it out this evening (in a few hours) and let you know :) But from the looks this is exactly what I've been looking for. Is it possible at all to integrate into the Arduino lib without breaking current functionality? |
just put the downloaded libs in your libraries folder and they will work. As for compatibility with the current server, I hope to be able to make it as much the same as I can without breaking anything, but we will keep both :) there are too many examples and whatnot that run on it. |
so, at home now. This may sound like a total n00b question, but how exactly would I go about using the files? I copied "as-is" to library folder, included the *.h without error, and would like to register my first onClient callback....... How do I do that? Thanks. |
Are you talking about the AsyncWebServer? it has an example that shows 2 main features. |
Nah, I wanted to try a simple TCP server first, so I went to https://github.com/me-no-dev/ESPAsyncTCP, downloaded everything and put it into the library folder. Then I included the *.h, which works fine. But now how would I register the onClient callback, actually get the new client object within that callback, and register the onReceive function for that client? Would you have a small telnet-server-like example? |
oooooooooooh, marvellous! I particularly like your (imo) clean coding style. How can I get you a beer???? 2x thumbs up from my side, couldn't test the error function though, the ACKs were sent cleanly. |
Knowing that it helped and it is working is enough for me :) if it's not for you, then my paypal is [email protected] :P Drop me an issue report if you find any bugs or have recommendations. You can take a look at the WebServer class for keeping and cleaning multiple clients. |
Hi, hope you've had a cold one already, needed to get paypal first :) One micro question/suggestion. With HardwareSerial and the WiFiClient object I can do something like:
// "Serial" is built-in
WiFiClient tcpClient;
AsyncClient tcpAsyncClient; // Sic! no pointer.
Print *outChannel;
void setup() { /*...*/ }
void loop() {
    outChannel = &Serial;
    outChannel->write("Hello SERIAL"); // goes to Serial
    outChannel = &tcpClient;
    outChannel->write("Hello TCPCLIENT"); // goes to established WiFiClient connection
    outChannel = &tcpAsyncClient; // compile error here
    outChannel->write("Hello TCPASYNCCLIENT"); // should go to established ASYNC connection
}
Trying the same with your client results in a "cannot convert *Print to *AsyncClient" compile error, most likely I'm doing something wrong. The I'm using this to have a universal command processor btw, I don't care if a command comes in via Serial, Telnet, HTTP, UDP, or something else, I just need the buffer and its length. Before calling the command processor, I'd point the outChannel to the correct function container so the command processor output goes to the correct channel automatically. I avoid a lot of if's in the processor that way. Perhaps it would be cleaner to function-point to the respective write() function directly? Hope that's a C way of doing stuff... |
The Async Client is meant to be a bit different :) it's not exactly a Stream like what you expect in the code above. You can wrap it inside a simple class that extends Print and implements the write(byte) method, which will make your print work, but beware! Writing 1 byte to the network will send a packet for that byte and will expect an ACK before it can send another byte. That is why it's not a stream :) The more you get to send at once (up to 1460 bytes), the better the speed you will get. If you look at the example I gave you, when I'm reading the serial I actually give it a chance to get some bytes if it's empty so I can send more of the incoming message at once. You would need something like that implemented so as not to send a packet with each byte :) |
as an idea, if you are intending to use it to pass commands or messages terminated by new line, in that write(byte) method, you can check and see if the byte is a new line character and send the buffered string (most terminals and shells work that way). If you think it's above your ability to write such thing, let me know and I'll wrap something for you. |
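A minimal sketch of that idea, written so it builds on a PC without the Arduino core: bytes are collected and handed on as one payload when a newline arrives or when the buffer reaches the 1460-byte limit mentioned above. The class name `LineBufferedWriter` and the `Sink` callback are assumptions for illustration; in a real sketch the sink would forward to the async client's write call, and the class would extend Print.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Collects single bytes and forwards them in larger chunks.
class LineBufferedWriter {
public:
    // The sink receives one complete payload per call (one "packet").
    using Sink = std::function<void(const std::string&)>;

    explicit LineBufferedWriter(Sink sink, std::size_t maxPayload = 1460)
        : sink_(std::move(sink)), maxPayload_(maxPayload) {}

    // Same shape as Print::write(uint8_t): accepts one byte, returns 1.
    std::size_t write(uint8_t b) {
        buffer_.push_back(static_cast<char>(b));
        if (b == '\n' || buffer_.size() >= maxPayload_) {
            flush();
        }
        return 1;
    }

    // Sends whatever is buffered, e.g. after a short idle period.
    void flush() {
        if (buffer_.empty()) {
            return;
        }
        if (sink_) {
            sink_(buffer_);
        }
        buffer_.clear();
    }

private:
    Sink sink_;
    std::size_t maxPayload_;
    std::string buffer_;
};
```

Calling `flush()` from a timer or from `loop()` after a short pause would cover output that never ends in a newline.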
You could also just wait a little to see if more data is written before sending a partial packet, which is what pfodESP2866WiFi does when you use its buffering |
here is an example of a printing Async Client |
and here is another go that you might like better (printer is not a reference) |
@me-no-dev thanks very much again, that's a lot of effort you put in there. I apologize again for not being very clear. The stream functionality of those various classes is not so important. It's more about C++/C semantics on how to correctly "function-point". So, I'll always have a My question thus is rather: how would I go about having one single function pointer (doesn't have to be of Print class) that can point to the write functions of different classes? I've read about pointer-to-function-members, but understood that they're bound to a specific class and cannot be generic. My approach now is to have a one-liner plain-C function wrapper for each write function and have a regular C function pointer. Of course with additional context switches, so not so good. Is there a better way?
uint16_t (*writeFunPtr)(const uint8_t *buf, uint16_t len);
WiFiClient tcpClient; // deliberately not a pointer!
AsyncClient tcpAsyncClient; // deliberately not a pointer!
uint16_t serWriWrap(const uint8_t *buf, uint16_t len) { return Serial.write(buf, len); }
uint16_t tcpWriWrap(const uint8_t *buf, uint16_t len) { return tcpClient.write(buf, len); }
// Cast assumes AsyncClient::write takes a const char * buffer.
uint16_t asyncWriWrap(const uint8_t *buf, uint16_t len) { return tcpAsyncClient.write((const char *)buf, len); }
void setup() { /*...*/ }
void loop() {
    writeFunPtr = &serWriWrap; writeFunPtr((const uint8_t *)"SER", 4); // -->serial out, streamed
    writeFunPtr = &tcpWriWrap; writeFunPtr((const uint8_t *)"TCP", 4); // -->tcp out, 1 packet
    writeFunPtr = &asyncWriWrap; writeFunPtr((const uint8_t *)"ASYNC", 6); // -->async out, 1 packet
}
But other than that I think this can be closed if more people tested your code and @igrr puts it into master (preferably with the examples ;) ). |
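One possible answer to the "single generic function pointer" question is `std::function` with small lambdas, which removes the need for one hand-written wrapper per channel. The sketch below is host-buildable and uses stand-in types (`FakeSerial`, `FakeAsyncClient`) instead of the real Serial and AsyncClient objects; `WriteFn` and `makeWriter` are names invented for this example, and the `const char *` signature on the async stand-in is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Stand-ins for Serial/WiFiClient and AsyncClient so the sketch builds on a PC.
struct FakeSerial {
    std::string out;
    std::size_t write(const uint8_t* buf, std::size_t len) {
        out.append(reinterpret_cast<const char*>(buf), len);
        return len;
    }
};

struct FakeAsyncClient {
    std::string out;
    // Assumed to take const char*, unlike the Print-style write above.
    std::size_t write(const char* data, std::size_t len) {
        out.append(data, len);
        return len;
    }
};

// One callable type for every output channel.
using WriteFn = std::function<std::size_t(const uint8_t*, std::size_t)>;

// Generic case: the channel already has write(const uint8_t*, size_t).
template <typename Channel>
WriteFn makeWriter(Channel& ch) {
    return [&ch](const uint8_t* buf, std::size_t len) { return ch.write(buf, len); };
}

// Special case: adapt the const char* signature of the async stand-in.
inline WriteFn makeWriter(FakeAsyncClient& ch) {
    return [&ch](const uint8_t* buf, std::size_t len) {
        return ch.write(reinterpret_cast<const char*>(buf), len);
    };
}
```

The command processor would then hold a single `WriteFn` and never need to know which channel is behind it. The lambdas capture the channel by reference, so the channel object has to outlive the `WriteFn`. Note that `std::function` adds an indirect call of its own, so this is about tidier code rather than fewer calls.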
@nouser2013 see second and third recipe here: http://www.esp8266.com/viewtopic.php?p=39201#p39201 |
Oh boy, that's perfect. Thank you very much, I'll try it as soon as I'm home. me<=happy :) |
My apologies for opening this old thread but it seems to exactly reflect the issues I'm having with client.write. I've used the "me-no-dev" library and the code below, and that decreases the file transfer from 50 seconds to 20 seconds. The file is only 150KB - why does it take an age to upload to the server? The same code on my PC takes no more than 2 seconds (including the server response). I know the ESP is not a PC - but from a networking perspective, where is the ESP bottleneck? Is there anything I can do to speed up the upload? My project is effectively dead before it's started if this is the best upload speed I can achieve :-(
|
You may try increasing buffer size to |
@igrr isn't that hardcoded in the *.a from expressif? his Also, what OS is the remote server running? |
I'm not suggesting to change MTU size (which, btw, it's not hardcoded in an .a file, we are using open-source lwip now, there is "build lwip from source" option in boards menu). |
Perfect. Didn't know that this is possible now. Assuming it's thread-safe :) |
Thanks @igrr and @nouser2013 for your very quick replies! @igrr - so my options appear limited? Is the MTU size a HW limitation? I had seen many posts online regarding the bandwidth of the ESP - such as this one - #1853 (comment) - so I assume these speeds are realised because the client and server have low latency (maybe on the same network) and the client gets very fast ACKs? I had already discussed this issue with @martinayotte, as I had tried the code you suggested in your second post ( @nouser2013 I originally had by |
@igrr is there any news on upgrading lwip to a more current version? |
@andig when I have some news on this, I will reply to you on the relevant ticket :) For now updating LwIP for the 8266 non-OS SDK is in backlog. Re throughput: window size is not a HW limitation, it's just the way LwIP is configured. You may try tweaking the window size but in this case you will probably have to reduce the number of TCP PCBs to avoid running out of RAM. Also you may need to change other LwIP options like the number of segments in flight. |
LwIP version update is almost entirely unrelated to the issue you are describing. |
ok, thanks. So just to confirm, my query above starting "my options appear limited" queried how other users had reported better throughput. Is that just because they had lower network latencies? |
@mph070770 , I've done a test with the code I've already provided to you: I've dumped a 180K file over TCP Telnet, and it took less than 4 secs. Make sure that you have a call to client.setNoDelay(1); right at the beginning of a new connection. |
I would suppose so. Also SPIFFS has some read latency, although I heard this was improved in recent versions. You may add some benchmarking to the code to see how much SPIFFS contributes to overall latency. |
I'll do some investigations. The server is remote (it's a 3rd party server that I have no control of) and I don't (yet) know the performance of it so I'll check. |
@mph070770 , you are confusing us: you were mentioning bad performance of a SPIFFS file dump over TCP, so how does this involve any remote server? |
Am I? Was I? I've shown my code above that takes a file from SPI memory using SPIFFS and tries to upload it to a server using client.write. My apologies if I've confused anyone. I think @igrr was clear?? |
Ok ! But you don't wish to have this server in your testing, because it can be the source of the problem. |
Very true - I haven't ruled out other factors in my code or the server. I think @igrr has hit the nail on the head and it's probably down to the TCP window being small on the ESP, and the latency associated with each .write call. I'll do some tests and work out the delays associated with the server. However, delays or not, the server is what it is and works with other platforms. The "problem" is with the ESP - either that it's not capable of doing what I'm asking or, more likely, that the code needs to be refined? Either way, it doesn't look like there's a fix on the horizon, which is a shame, but i'll keep my fingers crossed...! |
As I said several times, I got 180K within 4 secs. Isn't that enough or do you need more ? |
@martinayotte sorry to interrupt, but could you please share the code that does this? My best |
Nothing special, but here is the piece of code dumping the SPIFFS :
|
@martinayotte I haven't yet managed to time the server response (it's been one of those weeks!!) but I have found an alternative server that provides a similar service and it's dramatically improved my response times. I still think the underlying issue is that the latency between .write packets is long which, when combined with a small TCP window, results in slower performance writing to a laggy server compared to a platform which handles larger TCP windows (and therefore has to make fewer writes). Hopefully, because I've found a different server, my problem may be resolved... for now! I do appreciate the time and effort you and @igrr put into helping me. It really helps when trying to climb up the ESP learning curve...! |
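The interaction of window size and latency described above can be estimated with simple arithmetic: at most one window of data can be unacknowledged per round trip. The sketch below is only a rough lower bound that ignores slow start, retransmissions and server processing time; `bestCaseTransferMs` is a name invented for the example, and the window and round-trip figures used afterwards are illustrative assumptions, not measurements.

```cpp
#include <cassert>
#include <cstdint>

// Best-case time in milliseconds to move fileBytes when at most windowBytes
// can be unacknowledged per round trip of rttMs milliseconds.
inline uint32_t bestCaseTransferMs(uint32_t fileBytes, uint32_t windowBytes, uint32_t rttMs) {
    assert(windowBytes > 0);
    // Ceiling division: a partly filled last window still costs a round trip.
    const uint32_t roundTrips = (fileBytes + windowBytes - 1) / windowBytes;
    return roundTrips * rttMs;
}
```

For a 150000-byte file, a window of 5840 bytes (four 1460-byte segments) and an assumed 100 ms round trip give 26 round trips, or 2600 ms. If only a single 1460-byte segment is in flight and every round trip costs 200 ms, the same file needs 103 round trips, or 20600 ms.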
Hi, I'm sorry to bring this up once more as it has been referenced several times now. But still, I cannot get a TCP server to put more than one packet in the send queue due to incorrect handling of the delayed ACK. Looking at the headers, there is the TCP_MSS set to 1460 and the TCP_WND four times that value. Hence, if the same thing is in the liblwip.a (and negotiation works correctly), we shouldn't have a problem.
What bothers me in particular is the following video https://www.youtube.com/watch?v=8ISbmQTbjDI where this guy sends websocket replies back to the client at about 220Hz. And they carry arrays of data. To be fair, he coded this with the FreeRTOS version without arduino intermediary layer.
BUT: why can't we have this?
this results in
*edit: MWE added..