[BUG]WEB Server reset module randomly (Watchdog Reset with Watchdog disabled) #428

Closed
luc-github opened this issue Jun 15, 2015 · 136 comments

Comments

@luc-github
Contributor

Hi, I am using the staging module and compiling with 1.6.4 under Windows 7 64-bit.
I use an ESP8266 01 as a bridge between a 3D printer and a web browser to get printer information. For this I send a request every 2 seconds to refresh the content of the web page, and this makes the module reset randomly.

It happens even with the watchdog disabled (after anywhere between 10 min and 8 h).
I open 2 web pages at once to make it happen faster.
If no pages are open, the module does not reset for at least 48 h (I did not test longer).
So currently it could be a String issue (but I did not see any memory leak) or a web server issue, but I do not know what to test.

Is it a known issue?
Is there anything wrong in the code?
Or anything that should be added to prevent the module from resetting?

Thanks in advance

I narrowed it down to minimal code based on helloserver.ino that reproduces the problem (the IP is hardcoded to limit the use of any function).

#include <ESP8266WiFi.h>
#include <WiFiClient.h>
#include <ESP8266WebServer.h>

const char* ssid = "dlink";
const char* password = "****";//I remove my password

ESP8266WebServer server(80);

void handleRoot() {
  String message = "<HTML><BODY><H1>Page static</H1>";
  message += "<IFRAME  width=\"200\" height=\"200\"  NAME=\"frmstatus\"  ID=\"frmstatus\"  SRC=\"http://192.168.0.115/STATUS\"></IFRAME>\n<SCRIPT TYPE=\"text/javascript\">\n";
  message +="setInterval(function(){";
  message +="var ifrm=document.getElementById(\"frmstatus\");var doc=ifrm.contentDocument?ifrm.contentDocument:ifrm.contentWindow.document;";
  message +="doc.location.reload(true);";
  message +="},2000);\n";
  message +="</SCRIPT>\n</BODY></HTML>";
  server.send(200, "text/html", message);

}

void handleSubRoot() {
  static long cnta=0;
  static long cntb=0;
  cnta++;
  if(cnta>100000000)
  {
      cntb++;
      cnta=0;
  }
  String message = "<HTML>\n<BODY>\n<H1>refreshed page</H1>";
  message += String(cntb) + "/" + String(cnta);
  message +="</BODY>\n</HTML>\n";
  server.send(200, "text/html", message);

}

void setup(void){
  ESP.wdtDisable(); // the reset happens whether the watchdog is enabled or disabled
  Serial.begin(115200);
  WiFi.begin(ssid, password);
  Serial.println("");

  // Wait for connection
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println("");
  Serial.print("Connected to ");
  Serial.println(ssid);
  Serial.print("IP address: ");
  Serial.println(WiFi.localIP());


  server.on("/", handleRoot);
  server.on("/STATUS", handleSubRoot);

  server.begin();
  Serial.println("HTTP server started");
}

void loop(void){
  server.handleClient();
}

@holgerlembke
Contributor

Do you monitor your heap()?

@luc-github
Contributor Author

Not on this sketch, but on my project, yes. It was always over 20k when resetting, which is why I wrote that I did not see any memory leak.

I used system_get_free_heap_size() to check at each refresh.

@luc-github
Contributor Author

If there is another way to monitor it, I would be more than happy to check.

@holgerlembke
Contributor

Even inside those routines?

To be sure perhaps some Serial.println() into ESP8266WebServer::send and related stuff.

When I used web server first time, holding down F5 in IE killed it instantly, see #230.

@luc-github
Contributor Author

No, I checked just before server.send and put nothing inside the send function itself, because it always gave me more than 20000 when it reset, so far from out of memory.

@luc-github
Contributor Author

Using the test sketch I provided, the heap is a constant 28880 after the send function.

@luc-github
Contributor Author

Actually, memory seems to fluctuate after several minutes:

28880
28880
28880
28880
28880
28864
28880
28864
28880
28864
28880
28864
28880
28880
28880
28864
28880

then, just before the reset:

28880
28880
28880
28880
28880
28864
28880
28880
28880
28512
28904
28864
28880
28880
28496
28904

 ets Jan  8 2013,rst cause:1, boot mode:(1,7)

I do not know if this helps in some way, as it is far from out of memory.

What does "cause:1" mean?
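
A short dip is easy to miss when only the latest value is printed. Below is a minimal sketch of a low-water-mark tracker, assuming the readings come from system_get_free_heap_size(). The HeapWatch class and its method names are hypothetical, not part of the ESP8266 core, and the code is plain C++ so it can be tried on a PC:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of the ESP8266 core): remembers the lowest and
// highest free-heap readings so that short dips are not lost between prints.
class HeapWatch {
public:
    // Record one reading, e.g. the value returned by system_get_free_heap_size().
    void sample(uint32_t freeBytes) {
        if (freeBytes < low_) low_ = freeBytes;
        if (freeBytes > high_) high_ = freeBytes;
        ++count_;
    }

    // Lowest reading so far; at least one sample must have been recorded.
    uint32_t low() const { assert(count_ > 0); return low_; }

    // Highest reading so far; at least one sample must have been recorded.
    uint32_t high() const { assert(count_ > 0); return high_; }

    // Difference between the highest and the lowest reading, in bytes.
    uint32_t spread() const { return high() - low(); }

private:
    uint32_t low_ = std::numeric_limits<uint32_t>::max();
    uint32_t high_ = 0;
    uint32_t count_ = 0;
};
```

On the module, a call such as `watch.sample(system_get_free_heap_size());` (with `watch` being a global HeapWatch) could go into each handler. Fed with the readings listed above, the lowest value is 28496 and the highest is 28904, a spread of 408 bytes.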

@holgerlembke
Contributor

As far as I can see, nothing about these causes is really documented...

25k looks more than ok. I'm out of ideas on that problem. Other than as wild guesses like "check power supply", the root of all evil.

@luc-github
Contributor Author

It happens both when connected to the PC through a USB-serial adapter and when connected directly to the printer on 3.3V.
But if it were the power supply, this should also happen when the web server does not serve any page, right? That is not the case: if I do not query the web server, or only from time to time, the module does not reset.

@donaldej

You shouldn't power this module with the USB port. Its power consumption is pretty considerable when it is transmitting.

@holgerlembke
Contributor

Who knows? I didn't hook up the power usage to an oscilloscope (perhaps I should). I assume that the wifi-send-stuff needs most of the energy. And whatever might happen at random moments while two stations are talking together, a cat walking through the beam and then your stuff sucking the remaining 1 mA... But again, who knows? I attach a 1A supply, some caps and forget it.

@luc-github
Contributor Author

Ok I have plugged my ESP01 to a 2A power supply and module is still resetting after 1h of web query

So I guess issue is software not power

@sticilface
Contributor

Do you have AP running?

@luc-github
Contributor Author

no it is station mode - ESP01 is connected to an AP using DHCP

@sticilface
Contributor

I get a lot of resets, for no reason when running in AP mode. Still to get to the bottom of it...

@sticilface
Contributor

I'm actually having new issues with the web server. It just stops responding after around 2 minutes of uptime. I've debugged extensively, tried different IDEs, looked at the heap... The ESP is working fine.
Serial is outputting debug time info and MQTT commands still work just fine, but the web server is not responding. It might be related, but I'm not sure how to debug it further...

@holgerlembke
Contributor

perhaps cause relates to

enum rst_reason {
DEFAULT_RST_FLAG = 0,
WDT_RST_FLAG = 1,
EXCEPTION_RST_FLAG = 2,
SOFT_RST_FLAG = 3,
DEEP_SLEEP_AWAKE_FLAG = 4
};
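
A small sketch that turns the values of this enum into readable names, which makes log output easier to scan. The function name rstReasonName is hypothetical. Whether the number printed after "rst cause:" in the boot message follows this numbering is an assumption that this thread does not confirm:

```cpp
#include <cassert>
#include <cstring>

// Same values as the enum quoted above.
enum rst_reason {
    DEFAULT_RST_FLAG = 0,
    WDT_RST_FLAG = 1,
    EXCEPTION_RST_FLAG = 2,
    SOFT_RST_FLAG = 3,
    DEEP_SLEEP_AWAKE_FLAG = 4
};

// Hypothetical helper: readable name for a reset reason value.
// Values outside the enum map to "unknown".
const char* rstReasonName(int reason) {
    switch (reason) {
        case DEFAULT_RST_FLAG:      return "default";
        case WDT_RST_FLAG:          return "watchdog";
        case EXCEPTION_RST_FLAG:    return "exception";
        case SOFT_RST_FLAG:         return "soft reset";
        case DEEP_SLEEP_AWAKE_FLAG: return "deep sleep wake";
        default:                    return "unknown";
    }
}
```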

@luc-github
Contributor Author

If I comment out ESP.wdtDisable(); and add several ESP.wdtFeed(); calls, I get a reset after a few minutes.
No deep-sleep command is used.

28904
28880
28880
28880
28880
28864
28880
28496
28904

 ets Jan  8 2013,rst cause:4, boot mode:(1,7)

wdt reset

 ets Jan  8 2013,rst cause:4, boot mode:(1,7)

wdt reset

@luc-github
Contributor Author

If the watchdog is disabled, the reset is reported as a watchdog reset.
If the watchdog is not disabled, the reset is reported as a deep-sleep wake, although no deep sleep is used.

Since the behavior is inconsistent, it looks like a memory management issue.

@luc-github luc-github changed the title WEB Server reset module randomly [BUG]WEB Server reset module randomly (Watchdog Reset with Watchdog disabled) Jun 16, 2015
@igrr
Member

igrr commented Jun 16, 2015

Watchdog reset/disable functions don't do anything because watchdog API has not been released by Espressif. We need to replace the WDT handling with our own code and then expose that in the API but I haven't yet started doing that.

@luc-github
Contributor Author

OK, thanks. I was relying on the Readme:
_ESP.wdtEnable(), ESP.wdtDisable(), and ESP.wdtFeed() provide some control over the watchdog timer._
So it is a known issue; I will wait for the fix.

Thanks a lot for your great job

@luc-github
Contributor Author

Waiting for the WD functions implementation - is there any way to disable Watchdog manually ?
So I can verify my issue is actually a WD issue

@igrr
Member

igrr commented Jun 16, 2015

You should not need to disable watchdog unless you are running some long timing-critical operations (bitbanging serial addressable LEDs, for instance).
A watchdog reset just shows you there is a bug in the code somewhere.
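
To illustrate why a long operation that never returns control can trip a watchdog, here is a simplified model in plain C++. It is not the Espressif watchdog API: the SoftWatchdog class, the wouldReset function and the timeout values are hypothetical and chosen only for the illustration:

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of a watchdog timer; NOT the Espressif API.
class SoftWatchdog {
public:
    explicit SoftWatchdog(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {
        assert(timeoutMs > 0);
    }

    // Called whenever the code signals that it is still making progress.
    void feed(uint32_t nowMs) { lastFedMs_ = nowMs; }

    // True when more than timeoutMs has passed since the last feed.
    bool expired(uint32_t nowMs) const { return nowMs - lastFedMs_ > timeoutMs_; }

private:
    uint32_t timeoutMs_;
    uint32_t lastFedMs_ = 0;
};

// Runs `steps` work items of `stepMs` each and feeds the watchdog every
// `feedEvery` steps (0 means never). Returns true if the watchdog would fire.
bool wouldReset(SoftWatchdog wdt, uint32_t steps, uint32_t stepMs, uint32_t feedEvery) {
    uint32_t now = 0;
    wdt.feed(now);
    for (uint32_t i = 1; i <= steps; ++i) {
        now += stepMs;                        // one work item completes
        if (wdt.expired(now)) return true;    // the timer ran out during the work
        if (feedEvery != 0 && i % feedEvery == 0) wdt.feed(now);
    }
    return false;
}
```

With an arbitrary 1000 ms timeout, fifty 100 ms steps without feeding trip the model, while feeding every five steps does not.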

@luc-github
Contributor Author

Well, so you mean there is a bug in the web server code?
I just use the helloserver.ino code to reproduce my issue: refresh the page every 2 s and the module resets.

@igrr
Member

igrr commented Jun 16, 2015

Yes, that's very likely the case here.

@luc-github
Contributor Author

Anything I can do to help to find the root cause ?

@igrr
Member

igrr commented Jun 16, 2015

You can try enabling debug info by uncommenting the DEBUGV definition in debug.h (that's in the core directory) and adding Serial.setDebugOutput(true); to your setup method.

@luc-github
Contributor Author

Ok doing modifications you asked I got this:
28752
:ur 2

:ur 1

WS:dis

WS:ac

:ref 1

WS:av

:ref 2

:ur 2

:rn 420

:ref 2

:wr

:sent 173

:rcla

:abort

:ww

28368
:ur 2

:ur 1

WS:dis

WS:ac

:ref 1

WS:av

:ref 2

:ur 2

:rn 420

:ref 2

:wr

:sent 173

:rcla

:abort

:ww

28776
:ur 2

:ur 1

WS:dis

WS:ac

:ref 1

WS:av

:ref 2

:ur 2

:rn 420

:ref 2

Fatal exception (9):
epc1=0x40101752, epc2=0x00000000, epc3=0x00000000, excvaddr=0xdfac51de, depc=0x00000000

ets Jan 8 2013,rst cause:1, boot mode:(1,7)

@igrr
Member

igrr commented Jun 16, 2015

Could you please upload the HelloServer.cpp.elf file somewhere?

Alternatively, you can run
xtensa-lx106-elf-objdump -S HelloServer.cpp.elf > dump.S
and look up the region of the code around 0x40101752.

@luc-github
Contributor Author

@igrr 8 h of testing without issue. I am still running another round.

@papexus Wait for the release in staging, or challenge the simple mortal in you and replace your current staging module files with the ones from git; normally, under \hardware\esp8266com\esp8266\, the folders libraries and tools cover the current fix.

@papexus

papexus commented Sep 29, 2015

@luc-github thank you. I will try this tomorrow morning. Love you and @igrr :)

@EUA

EUA commented Sep 29, 2015

Thank YOUUUU @igrr !!! 👍

@luc-github
Contributor Author

OK, I cannot reproduce any issue with the test sketch, but I still get this one (several times) with my project; it looks close to the one I got before with the test sketch:
#428 (comment)

The same project had been running for 8 h without issue.


Exception (28):
epc1=0x4000df60 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

ctx: cont 
sp: 3ffec670 end: 3ffeca70 offset: 01a0

>>>stack>>>
3ffec810:  0000172f 00000001 3ffec8f0 402147aa  
3ffec820:  3ffec840 00000001 3ffec8f0 40214b75  
3ffec830:  40255640 40255638 3ffec860 00000090  
3ffec840:  40255640 40255638 40255670 40214baa  
3ffec850:  40255640 40255638 40255670 40204c76  
3ffec860:  3fff4cf8 0000000f 00000001 00000001  
3ffec870:  00000001 ffffffe3 00000000 3ffedbf4  
3ffec880:  00000000 068d88e9 00002200 3ffedbe8  
3ffec890:  401000b4 3ffec940 3ffec940 00000017  
3ffec8a0:  40225d52 401000a2 3ffec940 40214776  
3ffec8b0:  3fff0e88 40223d58 00000000 00000000  
3ffec8c0:  4021dcb9 3fff57c8 3ffec940 3ffea0a0  
3ffec8d0:  7400a8c0 401000a2 3ffec940 00000000  
3ffec8e0:  3ffeaa80 3fff57c8 0000000f 00000005  
3ffec8f0:  00000000 0000172f 0000172e 3fff4c98  
3ffec900:  0000000f 0000000d 3ffec940 402147da  
3ffec910:  3fff4c88 401000a2 0001c200 000022b8  
3ffec920:  3fff4b03 3ffeaa80 3ffec9e0 40214752  
3ffec930:  00000090 0000000c 3ffec980 402147aa  
3ffec940:  3fff57c8 0000000f 3ffec980 3fff43a4  
3ffec950:  3fff4b70 00000000 3fff4370 40213166  
3ffec960:  0000000f 00000000 3fff5588 4020e7be  
3ffec970:  3fff5c08 0000000f 3fff4370 4020e47d  
3ffec980:  3fff4c98 0000000f 0000000c 0000000f  
3ffec990:  0000000c 3fff4c98 0000000f 00000003  
3ffec9a0:  3fff5ce8 0000007f 00000000 3ffeb9ac  
3ffec9b0:  00000000 00000001 40201c88 0000000f  
3ffec9c0:  4020154e 00000001 00000001 3ffeca10  
3ffec9d0:  3fffdc20 000003e5 3fff4370 4020e571  
3ffec9e0:  3ffe97a0 00000000 000003e8 0001ad79  
3ffec9f0:  3fff4388 3fff5a10 00000000 3fffdc20  
3ffeca00:  40203040 3ffeb260 3ffeca94 40203051  
3ffeca10:  3ffe97a0 00000000 000003e8 40202ff4  
3ffeca20:  00000000 00000000 00000008 00000000  
3ffeca30:  00000000 00000000 00000016 40101c21  
3ffeca40:  40201ca9 00000000 00000000 3ffeca9c  
3ffeca50:  3fffdc20 00000000 3ffeca94 40201d06  
3ffeca60:  00000000 00000000 3ffeba50 40100398  
<<<stack<<<

 ets Jan  8 2013,rst cause:1, boot mode:(3,7)

load 0x4010f000, len 1264, room 16 
tail 0
chksum 0x42
csum 0x42
~ld

Any idea what leads to this kind of error, so I can narrow it down to a test sketch?
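
As a reading aid for dumps like the one above, here is a small lookup in plain C++ for the number in "Exception (N)". It covers only a few Xtensa exception causes, including the two seen in this thread (9 and 28). The function names and the 0x100 threshold are hypothetical, and the null-pointer check is only a heuristic, not a diagnosis:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Names for a few Xtensa exception causes (the N in "Exception (N)").
// Only a handful of values are listed; everything else maps to "other".
const char* exceptionName(int cause) {
    switch (cause) {
        case 0:  return "IllegalInstruction";
        case 3:  return "LoadStoreError";
        case 6:  return "IntegerDivideByZero";
        case 9:  return "LoadStoreAlignment";
        case 28: return "LoadProhibited";
        case 29: return "StoreProhibited";
        default: return "other";
    }
}

// Hypothetical heuristic: a prohibited load or store at a very low address
// often means a null pointer (or a member of a null object) was dereferenced.
// The 0x100 threshold is an arbitrary choice for this sketch.
bool looksLikeNullDeref(int cause, uint32_t excvaddr) {
    return (cause == 28 || cause == 29) && excvaddr < 0x100;
}
```

For the dump above (Exception 28 with excvaddr=0x00000000) the heuristic returns true, which would be consistent with a pointer being used after a failed allocation, although the dump alone does not prove that.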

@luc-github
Contributor Author

It looks like a memory leak: while doing nothing (the refresh is only every 3 s and I do not push any button), after about 1 minute the memory drops from 26K to 17K, then to 10K, then it crashes, all within 4 loops.

I will try to narrow down

@abhishek-dixit

Hi

It's not a memory leak. I faced the same issue but later realised that there is a bit of latency before the heap cleaning process kicks in. It starts every 30-45 seconds and the memory is reclaimed within 1 minute.

Try triggering after 15 seconds. It will give you good results.

@luc-github
Contributor Author

Well, I do not think that is the case: if it were a reclaiming issue, memory should decrease at each loop until none is left.

The program ran for 8 h without issue. Without any modification, I restarted it and the issue happened; I restarted again and it ran without issue for several minutes with a constant value over 25k from system_get_free_heap_size().
Then, without any apparent reason, the memory dropped within 10 seconds. It is random, not systematic.

@igrr
Member

igrr commented Sep 29, 2015

@luc-github just to make sure I understood correctly: heap usage is constant for several minutes, and then increases rapidly within a few seconds? Is this with the button example you posted yesterday?

@bbasil2012

@luc-github @igrr Now (latest staging version) I also get Exception (28):
Exception (28):
epc1=0x402185a3 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000004 depc=0x00000000

ctx: sys
sp: 3ffffd70 end: 3fffffb0 offset: 01a0

>>>stack>>>
3fffff10: 3fff9210 0000003c 40218bbc 3fff7a50
3fffff20: 3fff9210 3fff43b0 00000044 3fff43b0
3fffff30: 40216eb5 3fff4560 3fff7a50 00000134
3fffff40: 00000001 000000f0 00000000 3ffee784
3fffff50: 402133b1 3fff7a50 3fff43b0 3ffee784
3fffff60: 3ffed460 3ffee798 40220c2d 3fff1940
3fffff70: 3fff43b0 00000014 402198b6 3fff7a50
3fffff80: 3fff43b0 3fffdc80 3fff4418 00000001
3fffff90: 4021a1ff 3fff7a50 00000000 3fffdcc0
3fffffa0: 40000f49 3fffdab0 3fffdab0 40000f49
<<<stack<<<

@igrr
Member

igrr commented Sep 29, 2015

@bbasil2012 staging has not been updated with yesterday's fixes, so you are seeing old behaviour.

@luc-github
Contributor Author

@igrr Yes, this is correct.
No, this is with my project, so I am narrowing down the code. I found a way to reproduce the problem, but it uses my project and the steps are not straightforward: currently I let the web server run for 10 min, then open the web page in IE using the SSDP icon in the network view, and the issue happens. So I am working on a simple sketch to share.

@igrr
Member

igrr commented Sep 29, 2015

Okay, thanks for the effort!

@papexus

papexus commented Sep 29, 2015

@luc-github I tried replacing the two folders "libraries" and "tools" with the ones I downloaded from GitHub, but my sketch wouldn't even compile :( It just hangs halfway through. So I downloaded the staging board definition again and the sketch compiles, but according to the post by igrr above, the changes are not yet included. Any idea when this will happen?

@bbasil2012

@igrr I use patch from #428, but i still have the same Exception (28)

@igrr
Member

igrr commented Sep 29, 2015

@bbasil2012 could you please share the sketch you use so I can troubleshoot this issue?

@bbasil2012

@igrr My sketch is big. It includes many separate files, about 54 KB in total.
I can't attach it as a zip archive :(

@luc-github
Contributor Author

OK, I have a simple sketch that reproduces Exception (28), but I am not sure it is relevant.
I fill the page with fake content until it is almost 11K; it crashes immediately, before the page is displayed.

With 272 iterations of the loop, there is no crash.
If I reserve 11K for the String, it does not crash with 273 iterations either.

#include <ESP8266WiFi.h>
#include <ESP8266WebServer.h>
extern "C" {
#include "user_interface.h"
}

const char FILL_THE_VOID[] PROGMEM ="<H1>Fill the page with something</H1>\n";

const char* ssid = "dlink";
const char* password = "blablabla";

ESP8266WebServer server(80);

void handleRoot() {
  String IP = WiFi.localIP().toString();
  String  s;
 // s.reserve(11000);
  s = "<!DOCTYPE HTML>\r\n<html>\r\n<body>";
   s += "<IFRAME width=\"2\" height=\"2\" style=\"visibility:hidden\" ID=\"frmcmd\" NAME=\"frmcmd\" ></IFRAME>";
   s += "<IFRAME ID=\"statusfrm\" NAME=\"statusfrm\" src=\"http://"+IP+"/STATUS\"></IFRAME>";
   s += "<BUTTON TYPE=\"BUTTON\" VALUE=\"Emergency Stop\" Onclick=\"window.open('http://"+IP;
   s += "/CMD?cmd=M112','frmcmd');\">Emergency Stop</BUTTON>";
   s += "<SCRIPT TYPE=\"text/javascript\">\r\n";
   s += "setInterval(function(){";
   s += "var ifrm=document.getElementById(\"statusfrm\");var doc=ifrm.contentDocument?ifrm.contentDocument:ifrm.contentWindow.document;";
   s += "doc.location.reload(true);},3000);\r\n</SCRIPT>\r\n";
  for (int i=0;i<273;i++)
    {
    s += "<H1>Fill the page with something</H1>\n";
    }
   s += "</body></html>\n";
  server.send(200, "text/html", s);

}

void handleStatus() {

  String  s = "<!DOCTYPE HTML>\r\n<html>\r\n<body>";
   s += "Ok";
   s += "</body></html>\n";
  server.send(200, "text/html", s);
  Serial.println("refresh");
  Serial.println(system_get_free_heap_size());
}

void handleCMD() {

  Serial.println("command");

}

void setup(void){
 Serial.begin(115200);
  delay(10);
  WiFi.begin(ssid, password);

  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }

  server.on("/", handleRoot);
  server.on("/CMD", handleCMD);
  server.on("/STATUS", handleStatus);

  server.begin();
}

void loop(void){
  server.handleClient();

} 

@igrr
Member

igrr commented Sep 29, 2015

When String needs more memory, it allocates new buffer, copies its contents, and then releases the old buffer. So there is a moment when you have two buffers allocated at the same time. 2*11K = 22K which is indeed close to the total free heap available. Now it would indeed be better if the system would at least print something like "not enough heap" before crashing. But I feel this is a separate issue...
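
The arithmetic can be checked with a small model in plain C++ that only counts bytes and allocates nothing. It follows the allocate, copy, release sequence described above and assumes the buffer grows to exactly the size needed on each append, which may not match the real String implementation. GrowthModel and the two helper functions are hypothetical names; the 38-byte chunk is the length of the filler line in the sketch above:

```cpp
#include <cassert>
#include <cstddef>

// Simplified model of a string that grows by allocate-copy-release.
// It only counts bytes; no real memory is allocated.
struct GrowthModel {
    std::size_t length = 0;    // bytes of content
    std::size_t capacity = 0;  // bytes in the current buffer
    std::size_t peak = 0;      // most bytes held at any one moment

    // Grow the buffer to `bytes`, like a reserve() call would.
    void reserve(std::size_t bytes) {
        if (bytes <= capacity) return;
        // The old and the new buffer exist together until the copy is done.
        std::size_t both = capacity + bytes;
        if (both > peak) peak = both;
        capacity = bytes;
    }

    // Append `bytes` of content, growing to the exact size needed (assumption).
    void append(std::size_t bytes) {
        reserve(length + bytes);
        length += bytes;
        assert(length <= capacity);
    }
};

// Peak bytes held when `count` chunks are appended with no reserve up front.
std::size_t peakWithoutReserve(std::size_t chunk, std::size_t count) {
    GrowthModel m;
    for (std::size_t i = 0; i < count; ++i) m.append(chunk);
    return m.peak;
}

// Peak bytes held when `reserved` bytes are reserved once before appending.
std::size_t peakWithReserve(std::size_t reserved, std::size_t chunk, std::size_t count) {
    GrowthModel m;
    m.reserve(reserved);
    for (std::size_t i = 0; i < count; ++i) m.append(chunk);
    return m.peak;
}
```

For 273 chunks of 38 bytes, the model peaks at 20710 bytes without a reserve and at 11000 bytes with an 11000-byte reserve, which matches the "two buffers at once" reasoning.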

@luc-github
Contributor Author

I was not able to write a short sketch that reproduces the error I saw with my project, so I applied the workaround I found for the String size (using reserve(10000)), and I can no longer reproduce the issue, so I guess it is the same problem.

About memory checking: yes, it is another issue, but it could be the root cause of several issues. I saw several uses of new where the result is not checked and the pointer is used right after.

int8_t WiFiServer::_accept(tcp_pcb* apcb, int8_t err)
{
    DEBUGV("WS:ac\r\n");
    ClientContext* client = new ClientContext(apcb, &WiFiServer::_s_discard, this);
    _unclaimed = slist_append_tail(_unclaimed, client);
    tcp_accepted(_pcb);
    // printf("WiFiServer::_accept\r\n");
    return ERR_OK;
}

I would add some check like this one, but I do not know if return code is really used

int8_t WiFiServer::_accept(tcp_pcb* apcb, int8_t err)
{
    DEBUGV("WS:ac\r\n");
    ClientContext* client = new ClientContext(apcb, &WiFiServer::_s_discard, this);
    if (!client) return ERR_MEM;
    _unclaimed = slist_append_tail(_unclaimed, client);
    tcp_accepted(_pcb);
    // printf("WiFiServer::_accept\r\n");
    return ERR_OK;
}
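
One caveat on the check above: in standard C++ a plain new throws std::bad_alloc instead of returning a null pointer, so the test only helps with new (std::nothrow) or with a toolchain where new returns null on failure. Which of these applies to this core is not stated in the thread, so treat it as an assumption. A minimal sketch of the pattern in plain C++, with hypothetical names (Connection, acceptConnection, kErrOk, kErrMem) standing in for the real types and lwIP error codes:

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-ins for the error codes used above.
constexpr int kErrOk = 0;
constexpr int kErrMem = -1;

// Hypothetical stand-in for the per-client context object.
struct Connection {
    int id;
};

// Allocate, check, and only then hand the pointer out.
// `simulateFailure` lets the out-of-memory path be exercised on a PC.
int acceptConnection(int id, Connection*& out, bool simulateFailure = false) {
    Connection* c = simulateFailure ? nullptr : new (std::nothrow) Connection{id};
    if (c == nullptr) {
        out = nullptr;
        return kErrMem;  // report the failure instead of using a null pointer
    }
    out = c;
    return kErrOk;
}
```

The caller owns the returned object and is responsible for deleting it.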

@luc-github
Contributor Author

@igrr So I think the issue can be closed, as Exception (28) is another issue, not related to the web server itself.
What do you think?

@igrr
Member

igrr commented Sep 29, 2015

The one I have fixed yesterday wasn't related to web server either. It was just easier to reproduce it with webserver because it happens to use Strings a lot, hence more chances hitting heap corruption issue.
I think out-of-heap condition checking deserves a separate issue.

@luc-github
Contributor Author

agreed - thanks a lot for your great work 👍
So I close issue

@tim-eastwood

hi guys. Is this the same issue? http://i.imgur.com/gCp89u2.png This is my code: https://github.com/psYbR/dual-esp8266-controller/blob/master/esp8266_controller.ino

Am totally new to microcontrollers in general and learning as I go, just wanted to ask before I attempt to update my libraries with the fix from this thread. Cheers

@luc-github
Contributor Author

I can reproduce your issue with your code with latest staging which has the fix - so issue is there

@tim-eastwood

thanks @luc-github. Any idea how I can work around this issue for now? ie an alternate method? I need good stability for my project or it won't really work.

@luc-github
Contributor Author

Same here. It is beyond my skills; we need to wait for @igrr's feedback.
I think you should open a new issue, as this one is closed, so it is easier to follow. Provide as much information as possible: staging version, type of hardware you use, how often the issue occurs, etc.
FYI: I also get this kind of error with a sample in #866, but it may not turn out to be the same issue, so it is better to open a new one.
