Possible bug on WiFi code, related to power usage #5825
Comments
Hi, I'm also a user of this core, and I also see this difference in power with and without the delay(1). I don't think this is a problem: when you use delay, the CPU sleeps (not the WiFi), so power usage drops a lot. If you use WiFi sleep you will see a decrease in power usage too, so I use both approaches, WiFi modem sleep and delay(x), to lower power usage considerably. If you want, you can check the code we use in the Tasmota software (https://github.com/arendst/Sonoff-Tasmota), which uses dynamic sleep for the CPU (Andre Thomas' idea - @andrethomas).
Hi, ascillato. That's the part I can't get my head around: if the CPU sleeps for 1 ms and works for 62 ms every cycle (in the example sketch), we should see a proportional decrease of 1/63 in power usage every cycle, right? I left it measuring overnight and got 28 mA, basically half the power for 16 ms of sleep and 984 ms of work every second. This is not adding up. I will definitely check the Tasmota code, and also test CPU power usage with WiFi completely off, using the #2111 recommendations, just to be sure.
Just don't confuse CPU load with the indicated loadavg, which represents how much time the main loop spends in callbacks measured against the target loop setting (the dynamic sleep value).
Here's a crude example of how a dynamic target loop delay may be used to emulate timed function callbacks
|
@andrethomas please don't use that approach, it doesn't work correctly on millis() rollover.
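For readers unfamiliar with the rollover concern, here is a generic illustration. It is not the code that was posted in the thread. It assumes the timestamps come from a 32-bit unsigned millisecond counter such as Arduino's `millis()`; the function names are hypothetical.

```cpp
#include <cstdint>

// Rollover-safe: unsigned subtraction wraps modulo 2^32, so the difference
// is the true elapsed time even after `now` has wrapped past zero.
bool intervalElapsed(std::uint32_t now, std::uint32_t start,
                     std::uint32_t intervalMs) {
    return static_cast<std::uint32_t>(now - start) >= intervalMs;
}

// Fragile pattern, shown for contrast: comparing against a precomputed
// deadline misbehaves when start + intervalMs overflows.
bool deadlinePassedNaive(std::uint32_t now, std::uint32_t start,
                         std::uint32_t intervalMs) {
    return now >= start + intervalMs;
}
```

Near the wrap point the naive version can report that the interval has passed after only a few milliseconds, while the subtraction form keeps giving the right answer.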
@JonasGMorsch what happens if instead of delay(1) you
|
@devyte I did say "crude" ;)
All results have virtually the same CPU work. Looking inside: /cores/esp8266/core_esp8266_wiring.cpp
It looks like it has something to do with these os_timer functions. I will try commenting them out one at a time and see what happens.
@ascillato, @devyte
And the current drops to 10 mA or lower! But if I comment out the delay(1), you guessed it: 50 mA. I think the WiFi library and os_timer are related in some way; some WiFi functions only complete after some os_timer interactions. If it is something inside the SDK itself then I'm really going to need help, but even if we can't do anything there, just putting the os_timer calls in the needed places will save a lot of power in a lot of projects.
The other interesting thing to note is that without adding delay() somewhere, not only does the current demand increase but so does the heat generated by the ESP82xx. You do not easily notice this on a shielded NodeMCU, which has good thermal dissipation and essentially uses the PC board as a heat sink. On a Wemos D1 mini v1, however, which does not have enough thermal conductivity to get rid of the heat generated by the ESP82xx, the chip usually gets too hot and goes into some mode where it starts dropping WiFi connections, which is eventually followed by a reset.
@andrethomas, you are right! I didn't notice it before; the difference in temperature is so great that after 5 minutes of running I can actually tell with just my finger, especially between 10 and 50 mA. I'm now trying to figure out how many delay(1) calls per millisecond I need to keep this under control, and I hope that will give me some hint of where the problem is, since messing with the original delay basically breaks everything.
@JonasGMorsch I've been called worse ;) I don't know the origin of the problem. What I can tell you is that if you run fairly complex firmware such as Sonoff-Tasmota, you need to give at least a few milliseconds per main loop to keep the WiFi etc. alive on boards with bad thermal dissipation. As a default, we currently use 50 ms as the main loop target delay, but this is largely informed by our fastest callbacks being 50 ms apart rather than by a scientific or mathematical equation.

We don't use the PolledTimeout class suggested by @devyte because we still actively support the 2.3.0 core, and the class was only introduced after 2.3.0. Users get mixed results with different WiFi configurations across all three cores we support (2.3.0, 2.4.2 and 2.5.0), and some cores seem to work better for some WiFi configurations such as mesh, so supporting all three adds a bit of flexibility for the user base. The method we use seems to work reliably enough, so it is practical and easier to support in its current form.

It is somewhat alarming when you first stumble upon the heat dissipation requirements of the ESP82xx, but it's clear that there is a relation between the amount of actual processing and the amount of time given to yield (delay(x)), which is necessary to keep the Arduino Core and perhaps the Tensilica processor happy from a WiFi connectivity perspective. The source code for the underlying SDK is not publicly available, so a lot of guesswork is involved, and we cannot change the fundamental behaviour of the SDK, so who knows what it's doing while we yield (or delay).

I don't expect this to be a linear relationship, because that would imply that the Tensilica processor requires 0 mA when doing nothing at all, which we know is only possible to a degree with deep sleep, using a timer or external interrupt to wake the processor.
My best guess is that if you do not use main loop delays, the main loop will simply run as fast as it can and effectively load the Tensilica processor, resulting in an increased demand for current and an increased amount of heat which needs to be dissipated. This is most likely by design, which is why the ESP82xx chips have a nice thermal pad underneath, which should be connected to a ground plane in a way that ensures good heat dissipation.

Further observation suggests that the above-mentioned symptoms exist in all the cores I have tested (2.3.0, 2.4.2 and 2.5.0), but they do appear to be a little more pronounced on the 2.5.0 core release. My best guess is that it's not because of a bug in the Arduino Core but rather that the additional code (the 2.5.0 core based binaries are larger) consumes additional CPU cycles, taking away some of the time you'd normally spend in yield (or yielding for a delay()'d period of time). So there probably needs to be some compensation for this, but without definitive insight into the underlying SDK there's no way to know for sure. I would say it is just as likely that it is neither the Arduino Core nor the SDK but rather just the way the Tensilica was designed to operate.

I hope my observations can help you, albeit they are not very scientific due to a lack of access to information on the inner workings of the SDK itself... so for now we just work around it as much as possible.
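The "main loop target delay" idea described in this comment can be sketched as follows. This is a hedged illustration only, not Tasmota's actual implementation; `loopDelayMs` and its parameters are hypothetical names, and the timestamps are assumed to come from a 32-bit millisecond counter such as `millis()`.

```cpp
#include <cstdint>

// Hypothetical helper: how long to delay so that one loop pass takes about
// targetMs in total. Always returns at least minDelayMs so the core gets
// some time to yield, even when the callbacks overran the target.
std::uint32_t loopDelayMs(std::uint32_t loopStartMs, std::uint32_t nowMs,
                          std::uint32_t targetMs, std::uint32_t minDelayMs) {
    const std::uint32_t spentMs = nowMs - loopStartMs;  // wrap-safe subtraction
    if (spentMs >= targetMs) {
        return minDelayMs;
    }
    const std::uint32_t remainingMs = targetMs - spentMs;
    return remainingMs > minDelayMs ? remainingMs : minDelayMs;
}
```

In an Arduino sketch, one would record `millis()` at the top of `loop()` and end the pass with something like `delay(loopDelayMs(start, millis(), 50, 1))`, using the 50 ms default mentioned above.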
@andrethomas Thanks for the explanation. I can hardly wait to do some more tests tomorrow :)
OK, so I've done a lot of tests and I still don't know what's happening exactly, but at least I have some better idea of the symptoms I am seeing. For ESPeasy I added a number of settings to toggle and see what's happening (related PR):
The last option, which I've labelled "Eco mode", is perhaps the most on-topic one for this issue. So this is great, and imagine the power savings when 100'000'000 of these devices are running in Eco mode... As soon as the node is running in this so-called "Eco mode", it starts missing packets.

A node which is running in low-power mode according to the power meter:

The same node while using 0.45 Watt:

This is regardless of switching the WiFi radio to 'always on' (calling

Also good to notice is that switching on the 'Eco mode' does not yield a lower power consumption immediately. It may sometimes take 10 minutes or more before it becomes noticeable, and it may temporarily seem to be turned off if you look at the power consumption. When running a ping from an external host to the ESP, the power consumption will remain at 0.45 Watt, and thus the reported nodes in the ESPeasy network all have an "Age" of 0 or 1 minutes (discovery packets are sent every minute).

N.B. The delay spent in the scheduler for 'idle loops' is still like normal, regardless of the power used. So it is something in the core libraries (or SDK) which is changing something to have this effect.

To summarize:
I hope this may help in finding the root cause of all these WiFi issues which lead to hardware watchdog resets and maybe also some resets due to "Exception".
Interesting... question - which LwIP variant are you using?
@andrethomas ESP82xx Core 2.6.0-dev, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.1.2 PUYA support
It is on a node without a Puya chip, so that shouldn't make a difference :)
I'm not sure if that's low memory or high memory - what MTU is used?
It's low memory "no feature" (mss 536) (mtu is always 1460)
Ah, I meant MSS - could this be the cause of some of the packets being dropped, as observed by @TD-er?
I don't think so. Packets are not dropped with a small MSS. Data is sliced into smaller packets/payloads (less memory but more packets -> more "ack" latencies -> less bandwidth).
The data packets which are missed by my nodes are quite small (about 41 bytes of payload).
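The point about slicing can be made concrete with a ceiling division. This is a minimal sketch; `segmentCount` is a hypothetical helper written for illustration, not an lwIP function, and it ignores headers and everything else a real TCP stack does.

```cpp
#include <cstddef>

// Number of segments needed to carry payloadBytes when each segment holds
// at most mssBytes of payload (ceiling division).
std::size_t segmentCount(std::size_t payloadBytes, std::size_t mssBytes) {
    if (payloadBytes == 0 || mssBytes == 0) {
        return 0;
    }
    return (payloadBytes + mssBytes - 1) / mssBytes;
}
```

With an MSS of 536, a 41-byte payload still fits in a single segment, so slicing does not come into play for packets of that size; only larger transfers are split into more segments.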
Closing based on previous comment. If you think the problem is still valid in latest git, please open a new issue and fill out the required info. |
Basic Infos
Platform
Settings in IDE
Problem Description
Hi Everyone,
I think I found a bug in the WiFi code related to power usage.
After a few hours of testing multiple NodeMCUs, I noticed that every sketch that does not use WiFi draws around 50 mA.
But to my surprise, one of my MQTT sketches with WIFI_NONE_SLEEP hovers between 20 and 40 mA, while the others showed the same 50 mA behavior.
The only difference was a delay(1); in the main loop, and yes, that was it.
At first I thought it had something to do with the delay lowering the loop count, but then I came up with this test code:
MCVE Sketch
With or without delay(1); the loop count is around 16/s, which is pretty low, and yet the same problem appears: 20-40 mA with the delay, 50 mA without it.
delayMicroseconds and yield make no difference in this case.
If someone has a good ammeter (mine isn't) and some free time, I would appreciate some help.
Also, if I'm doing something wrong, let me know.
Debug Messages