http.GET() randomly exits with error -1 (connection refused) -> issue with function WiFiGenericClass::hostByName() [proposal of changes in library] #3722
Comments
Perhaps move the timeout to be configurable via a default parameter on the function, ie:
And in the usage you would reference it as timeout (without the default value in WiFiGeneric.cpp) that way people can override it dynamically and not have a hardcoded timeout value. |
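The default-parameter idea in the comment above could look roughly like this. It is a minimal host-side sketch, not the library's code: `makeDnsRequest`, `DnsRequest` and `kDefaultDnsTimeoutMs` are hypothetical names, and the 15000 ms default is an assumption picked to sit above the roughly 14 s worst case measured in this thread.

```cpp
#include <cstdint>
#include <string>

// Hypothetical default, assumed to exceed the ~14 s lwip worst case.
constexpr uint32_t kDefaultDnsTimeoutMs = 15000;

// Hypothetical stand-in for the arguments a resolver call would receive.
struct DnsRequest {
    std::string host;
    uint32_t timeoutMs;
};

// The declaration (header) carries the default value...
DnsRequest makeDnsRequest(const std::string& host,
                          uint32_t timeoutMs = kDefaultDnsTimeoutMs);

// ...and the definition (.cpp) refers to it only as `timeoutMs`.
DnsRequest makeDnsRequest(const std::string& host, uint32_t timeoutMs) {
    return DnsRequest{host, timeoutMs};
}
```

Callers that do nothing get the default, and callers that know their network can pass a different value per call instead of editing the library.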
Now the timeout is "hardcoded" to 4 seconds. Wrongly hardcoded. This value should never be lower than the real timeout used by the lwip library. If it is lower, we can sometimes observe strange behavior. |
Hi asier70, For "int start_ssl_client ...", which is not a member function, "IPAddress srv((uint32_t)0);" could be declared as static. I know that this solution has a problem with overlapping requests, but that could only be solved by a completely different design, which would need to introduce something like a lock mechanism. |
I also thought about a static declaration, but as you said there is a possibility of overlapping requests, which would assign the wrong IP to a domain name. Finally I decided to use a timeout longer than the maximum timeout generated by the lwip library (7[s] for DNS1 + 7[s] for DNS2 = 14[s]). It will work fine until someone changes the default configuration of this library. BTW I don't know why a very old version of the lwip library is used. |
Yes, you are right. But the design as a whole is ... if you have to rely on nobody changing the standard configuration (sic!). |
This library is for Arduino, so it is for hobbyists, who don't change internal config files. Professionals are able to find and fix software errors themselves :) |
The real DNS resolving timeout used by the lwip library is 14[s] (7[s] for DNS1 + 7[s] for DNS2). The function WiFiGenericClass::hostByName() has its timeout set to a lower value (only 4[s]), so the callback function may be called after this shorter timeout and may overwrite stack memory that is now used by another function. Fixes espressif#3722
Touché :-) |
The change suggested by asier70 above did not fix my So I am now ditching the ESP32 as it is not reliable. |
Not all servers allow SSL connection that uses IP address directly. I think that domain name must be sent to verify certificate and check how secure is the connection. |
Thanks for the tip. So without reliable DNS no go. |
I've been testing this fix for the past week - I've got 12 esp32 modules running https and http calls every ~30 seconds. I was running into this same issue and it would not recover for a long time unless wifi was restarted. This behavior was consistent on all the modules. My esp8266 modules (10 of them) did not have this issue. Anyway, this fix resolved all the [E] DNS errors I've been having. Thanks! Wish this would get merged into the main branch. Edit: I have seen some DNS errors, but the error rate is about 100x lower and wifi does not need restarting. Does it make sense that the wifi needed restarting before? |
Thank you for testing. I don't know who can merge my fix. I can't.. |
Hi |
Recently I have been getting http.GET() error -1 too. What is strange is that it works when getting certificates from the client server (one subdomain), but retrieving data from another subdomain gives error -1. I tested the server with curl, Postman and a web browser and everything seems to work fine, except on the esp32. |
Also having this issue. [E][WiFiGeneric.cpp:657] hostByName(): DNS Failed for myHostNameHere. But it happens all the time not randomly. |
In my case it was server problem most likely, because after some time error just disappeared. |
I run into this problem many times. The only way around it for me is to prepare the job, restart the esp32 and then do the http work. Then the ESP works fine until the next communication job, when I restart the micro again, and so on. The DNS problem comes up because the DNS lookup is the first step in establishing communication. I also get problems when using the final IP address. Check whether it works OK after the restart and before initializing all the resources. |
I got the problem also many times. Installed the patch. This reduces the problem a lot. However, this issue is not completely resolved yet for me. Once in a few days the problem pops up and a reboot is required. With the patch I got next message: |
[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions. |
How about a fix instead of just ignoring this issue until it drops off? |
[STALE_CLR] This issue has been removed from the stale queue. Please ensure activity to keep it open in the future. |
Andrzej, have you received any further information on this error? You're right, the ESP32 does not recover from a long delay waiting for a DNS reply, especially when making multiple quick requests. Perhaps this error isn't noticed as much because most people on this thread seem to be making requests at ~30 second intervals. I can reliably reproduce this error on a running chip by changing the DNS settings in the router, applying the change and watching the serial monitor. The first 4 or 5 DNS errors start slowly (approx. 4s apart) and then it's a continuous procession of errors (~1s apart), most likely because the WIFI_DNS_DONE_BIT bit was set as you stated. BTW, I only blocked the DNS servers for about 20 seconds and then reinstated them. Everything else on the network resumed as usual except the ESP32. If no fix is available I might try to halt my GET requests when DNS fails once, clear the done bit, wait 1 min and try again. I'm not even sure if this is feasible by modifying the class but something must be changed. I haven't tried your modifications yet but I think I'll first try clearing the DONE bit before the DNS lookup and see if it makes a difference when I 'block' the DNS servers. |
I don't have any new news. My devices work stable for a long time with this library update.
That was my first try to solve the problem, but this is only the one problematic point in this issue. |
I lost track a bit on this, but has there been a proposed PR for this fix you mention? |
Looking at the latest upcoming release, it has quite a few changes for provisioning WiFi credentials using a BT app. I didn't see any change in the WiFiGeneric class related to this issue, although I could be wrong. I'm only slightly familiar with the git process. I did make some changes and they seem to be helping recover from shorter DNS resolving delays without a restart. I added a counter in the WiFiGenericClass::hostByName function: it retries with an extended delay between tries only if DNS fails first, and restarts once the count is greater than 4 and nothing resolves. I tried catching a flag in my main loop as an alternative, but this only worked some of the time; at other times the error loop would be in a WiFiClientSecure.cpp function, but eventually, after many error logs, it would return to hostByName and then count up to a restart. It's not a fix by any means, but it'll have to do for me, with no manual rebooting at least. I did find these links which may be helpful and https://tools.ietf.org/html/rfc6762#section-5 (In regards to one-shot queries)
Andrzej, I've read your info on the other issues and I'm hoping someone that can fix these, will attempt at some point. Unfortunately, I don't have the skill to do so. |
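The retry-counter workaround described in this comment might be structured like the sketch below. This is an assumption-laden illustration, not the commenter's actual patch: `resolveWithRetries`, `DnsOutcome`, the 2000 ms base delay and the growing back-off are all hypothetical. The resolver and the delay are passed in as callables so the logic can be exercised off-device.

```cpp
#include <cstdint>
#include <functional>

enum class DnsOutcome { Resolved, GiveUpAndRestart };

// Try once; only if that fails, retry up to maxRetries times with a
// growing delay between tries. The caller decides what "restart" means.
DnsOutcome resolveWithRetries(const std::function<bool()>& resolveOnce,
                              const std::function<void(uint32_t)>& waitMs,
                              int maxRetries = 4,
                              uint32_t retryDelayMs = 2000) {
    if (resolveOnce()) {
        return DnsOutcome::Resolved;  // common case: no extra delay at all
    }
    for (int attempt = 1; attempt <= maxRetries; ++attempt) {
        waitMs(retryDelayMs * attempt);  // extended delay, longer each time
        if (resolveOnce()) {
            return DnsOutcome::Resolved;
        }
    }
    return DnsOutcome::GiveUpAndRestart;
}
```

On an ESP32 the callables would presumably wrap `WiFi.hostByName(...)` and `delay(...)`, and the caller could invoke `ESP.restart()` on `GiveUpAndRestart`. Keeping the restart decision outside the resolver avoids rebooting from deep inside library code.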
Compiling with Arduino, it works normally. |
The real DNS resolving timeout used by the lwip library is 14[s] (7[s] for DNS1 + 7[s] for DNS2). The function WiFiGenericClass::hostByName() has its timeout set to a lower value (only 4[s]), so the callback function may be called after this shorter timeout and may overwrite stack memory that is now used by another function. Fixes #3722
Hi Andrzej, First of all, thank you so much for your efforts put into this! True professional attitude and analysis! ;) Hereby I share my experiences, maybe useful for someone: I've upgraded to arduino-esp32 1.0.5-rc2, and now finally the software doesn't crash, but the failing requests remained. My software checks for an OTA software update once in each 24 hours, so this is now performed ~3 hours after the problem starts. Luckily the update site is on another server, so hostByName() is called with another domain. Interestingly this new (and not served from cache) DNS request kicks back everything to normal. (And everything starts over on the next day: 3 hours "outage", and back to normal.) UPDATE: I have another device, not having the fw update functionality and connected to the same repeater, and after 3 hours of mess, it also continues normal operation. This device also produced the 3 hours long DNS disturbance (at least it was not completely silent, but many requests succeeded during the period). The 2 devices were not started at the same time, hence the disturbances are not at the same time, but in both cases ~21 hours after the devices are started. Glad to see that in 1.0.5-rc2 your solution is included! So thanks, again for rolling up this hostByName() fault and glad that your solution is included in 1.0.5! Greetings from Hungary! :) BR, |
This seems to be such a good improvement. I've been using PlatformIO, and to stop the timeout I needed to get the new version of esp32 working there. Add this to your platformio.ini and it'll pull the newest version of ESP32.
|
@evanmiller29 you should instead use the latest release (1.0.6). Git master will move to 2.0 and many things could stop working (mostly due to some changes in the API). PIO might also need some time to adjust. |
I have the same problem with 1.0.5. |
Hey, I have been experiencing the same issue with my ESP32. It connects to an https server every 2 sec to make an API request and then closes the connection with client.stop(). I have put this in a function, and the function is called every 2 sec. The code works fine for 10-15 minutes, then the infamous error pops up and there is no improvement thereafter. |
Rock-solid ? - the short answer is "No". Even though the best part of the ESPs as a controller is the WiFi, it is one of the worst as software. |
SDK? regarding the changes you suggested? I am running the latest version
of the libraries, as I use PlatformIO and I have been using the default method
of adding a library to the project.
I need the ESP because of its performance for my project, but I also need
this feature working.
I am working on another part of the code now, but that problem still exists.
I'll try upgrading the library in the Arduino IDE as you've suggested and
let you know whether it works for me.
Thank you.
…On Thu, Mar 31, 2022, 1:07 AM Yordan Yanakiev ***@***.***> wrote:
Rock-solid ? - the short answer is "No", disregarding that the best part
of ESP's is the WiFi as a controller - it is one of the worse as a software.
There is a fix, which did most of it if You trace my suggestion for it.
Yet, they implemented it, it worked, but then scratched out, and then
again did other changes, which I am not sure if it is good at the moment.
So, first of all -did you try with the latest SDK ?
|
So, You are on 2.0.2 ? |
Also You can try the dev version of it https://github.com/espressif/arduino-esp32/releases/ |
Yes, I think so, I am on the latest available Library version.
|
I am working on this part to establish a get request from ESP32 but still the same problem const char* ssid = "REPLACE_WITH_YOUR_SSID"; //Setting your serverName //Your Domain name with URL path or IP address with path // the following variables are unsigned longs because the time, measured in void setup() { WiFi.begin(ssid, password); Serial.println("Timer set to 5 seconds (timerDelay variable), it will take 5 seconds before publishing the first reading.");
//Then, the following lines of code save the HTTP response from the server. |
Can you please wrap your code in 3 back quotes? Something like this:
But then using the back-quote (left from the "1" on the keyboard) |
........ void setup() { WiFi.begin(ssid, password); Serial.println("Timer set to 5 seconds (timerDelay variable), it will take 5 seconds before publishing the first reading."); void loop() { if ((millis() - lastTime) > timerDelay) { if(WiFi.status()== WL_CONNECTED){ String serverPath = serverName + "?temperature=24.37"; http.begin(serverPath.c_str()); int httpResponseCode = http.GET(); http.end(); |
Nope, I meant these
But then without the spaces. |
void setup() { WiFi.begin(ssid, password); Serial.println("Timer set to 5 seconds (timerDelay variable), it will take 5 seconds before publishing the first reading."); void loop() { if ((millis() - lastTime) > timerDelay) { if(WiFi.status()== WL_CONNECTED){ String serverPath = serverName + "?temperature=24.37"; http.begin(serverPath.c_str()); int httpResponseCode = http.GET(); http.end(); Is that ok right now? |
Nope, but anyway let's not clutter the thread with multiple attempts (as you can edit your posts too)
#include <WiFi.h>
#include <HTTPClient.h>

const char * ssid = "REPLACE_WITH_YOUR_SSID";
const char * password = "REPLACE_WITH_YOUR_PASSWORD";

String serverName = "http://127.0.0.1:1880/update-sensor";

unsigned long lastTime = 0;
unsigned long timerDelay = 5000;

void setup() {
  Serial.begin(115200);

  WiFi.begin(ssid, password);
  Serial.println("Connecting");
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println("");
  Serial.print("Connected to WiFi network with IP Address: ");
  Serial.println(WiFi.localIP());

  Serial.println("Timer set to 5 seconds (timerDelay variable), it will take 5 seconds before publishing the first reading.");
}

void loop() {
  if ((millis() - lastTime) > timerDelay) {
    if (WiFi.status() == WL_CONNECTED) {
      HTTPClient http;

      String serverPath = serverName + "?temperature=24.37";
      http.begin(serverPath.c_str());

      int httpResponseCode = http.GET();
      if (httpResponseCode > 0) {
        Serial.print("HTTP Response code: ");
        Serial.println(httpResponseCode);
        String payload = http.getString();
        Serial.println(payload);
      } else {
        Serial.print("Error code: ");
        Serial.println(httpResponseCode);
      }
      http.end();
    } else {
      Serial.println("WiFi Disconnected");
    }
    lastTime = millis();
  }
}

If you look at the text in this post, when quoting it, you will see what I meant. |
I think I found the issue in your code: String serverName = "http://127.0.0.1:1880/update-sensor"; The ESP will try to access itself. (127.0.0.1 is localhost) |
I want to configure the ESP32 as a client and the Node-RED application running locally on my PC as the server; that's why I put the server name in the code like that. The ESP32 is connected to WiFi with IP address 192.168.1.17 |
But still, 127.0.0.1 is always (!!!) localhost. If you try to access another host, you must use the IP of that machine and also make sure the service running on that port is accessible from another host in your network. |
Many thanks :) |
I just resolved this issue. I know it is not a proper solution, but it is workable for a small project. String hostnameIP "IPAddress" //Which_You_Find_from_ https://whatismyipaddress.com/hostname-ip entering hostname (e.g. google.com/// IP = 142.250.68.78) void setup() // your other entire code.......................... IPAddress ip; Hope this is a useful solution. |
Like others, I am still seeing my ESP32 work for awhile, but then start failing with My code checks an https url once an hour. Within about a day, it almost always gets into a state where almost none of the http requests are working anymore. Some of them work, but most of them do not, all to the same server.
I'm using esp32 2.0.13 for arduino. On 3.1.0-RC1 the error is somewhat different:
which seems related to #3686 Code is:
|
@emmby how about 3.0.x versions? On 3.1 you got disconnected from the WiFi, so it is normal for the request to fail. Maybe something is keeping your ESP so busy that it can not handle the WiFi on time (thus you got BEACON_TIMEOUT) |
Hardware:
Board: ESP32 Dev Module
Core Installation version: 1.0.4
IDE name: Arduino IDE
Description:
Because issue #2778 "Getting [E][WiFiGeneric.cpp:658] hostByName(): DNS Failed randomly" is closed and this problem still exists, I decided to open a new one.
From time to time the function http.GET() immediately exits with error -1, which means "connection refused". This problem has been observed by many other programmers, especially when the internet connection was slow or the function was called very often.
I investigated this problem for a very long time. In this situation "connection refused" doesn't mean that the remote server refused our connection. The culprit is the function WiFiGenericClass::hostByName(), which resolves a host name and translates it to an IP address.
Analysis:
The tree of function calls is as follows:
After investigation, the error turns out to be caused by the WiFiGenericClass::hostByName() function:
The real problem lies in the timeout value used in the above function. This function uses the DNS functionality of the lwip library. How does it work? When the domain name is in the local cache, lwip immediately returns the resolved IP address from the cache. If not, it starts working in the background, querying external DNS servers for the IP address. The function waits only 4000[ms] for this. If the resolving procedure takes longer than 4[s], the function returns with an error, but the background process still lives. When it finishes, the callback function is called, WIFI_DNS_DONE_BIT is set and the result variable is filled with the resolved IP address. This situation is very dangerous for the whole application! There are two big problems connected with it.
The next time we use the hostByName() function, it returns immediately with an error because WIFI_DNS_DONE_BIT is already set (look at the function that waits for the status bit). The background callback remains alive, so the abnormal situation is repeated over and over again (until the DNS cache is used).
The second problem is more dangerous in my opinion. The object srv of class IPAddress in the function WiFiClient::connect() is declared locally on the stack. It is passed by reference to the callback function. When the background lookup ends, the callback writes the resolved IP address to this stack memory area, but after 4[s] this area is probably used by another function, so it may result in an application crash...
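The first of the two problems (a stale done bit making the next lookup fail at once) can be illustrated with a tiny host-side model. This is only a sketch of the reasoning, not the library code: `DnsWaitModel` and its members are hypothetical, `doneBit` merely stands in for WIFI_DNS_DONE_BIT, and clearing the bit before the lookup is shown as one possible mitigation. It does not address the second problem, the write into a dead stack variable.

```cpp
#include <cstdint>

// Simplified model of "wait for the done bit, then clear it".
struct DnsWaitModel {
    bool doneBit = false;  // stand-in for WIFI_DNS_DONE_BIT

    // A callback from an earlier, already timed-out lookup fires late.
    void lateCallbackFromOldLookup() { doneBit = true; }

    // One lookup whose own answer would arrive within the timeout.
    // Returns the resolved address, or 0 on failure.
    uint32_t lookup(uint32_t answer, bool clearStaleBitFirst) {
        if (clearStaleBitFirst) {
            doneBit = false;  // mitigation: drop leftovers from old lookups
        }
        uint32_t result = 0;
        if (!doneBit) {
            // Normal path: the wait blocks until this lookup's callback runs.
            result = answer;
            doneBit = true;
        }
        // Otherwise the stale bit ends the wait at once, before any answer.
        doneBit = false;  // the bit is cleared after waiting
        return result;
    }
};
```

In this model a lookup after a late callback returns 0 even though its own answer would have arrived in time, while the same lookup with `clearStaleBitFirst` set succeeds.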
Solution
I tried to find the timeout value of the DNS resolving procedure in the lwip library. I found that the internal timer has a default period of 1[s] and there are 4 retries. So I deduced that resolving an IP address through external DNS should have a 4 second timeout. But nothing could be more wrong.
I did a test. I manually set both DNS servers to some IPs that aren't DNS servers. I measured that the real timeout occurs after 13-14 seconds! So the timeout value used in the hostByName() function should be more than 14000[ms].
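The timing argument can be written down as a compile-time check. The sketch below is illustrative only: the constant names are hypothetical, the 7 s per-server figure is the one reported elsewhere in this thread, and 14500 ms is an assumed example of a value above the measured worst case, not the value used by any release.

```cpp
#include <cstdint>

constexpr uint32_t kPerServerTimeoutMs = 7000;  // per DNS server, as measured
constexpr uint32_t kDnsServerCount = 2;         // DNS1 + DNS2
constexpr uint32_t kWorstCaseDnsMs = kPerServerTimeoutMs * kDnsServerCount;

constexpr uint32_t kOldWaitMs = 4000;        // wait discussed in this issue
constexpr uint32_t kProposedWaitMs = 14500;  // assumed example value

// The wait is safe only if it outlasts the background lookup.
constexpr bool waitCoversLookup(uint32_t waitMs) {
    return waitMs > kWorstCaseDnsMs;
}

static_assert(!waitCoversLookup(kOldWaitMs),
              "a 4 s wait can expire while lwip is still resolving");
static_assert(waitCoversLookup(kProposedWaitMs),
              "the wait must exceed the 14 s worst case");
```

If the lwip configuration changes (more servers, different retry timing), the constants would need to be revisited, which is the fragility pointed out in the comments.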
I thought about the solution for a long time.
I suggest making three changes in the WiFiGenericClass::hostByName() function:
What do you think about this solution?
Can anyone verify my logical reasoning about it?