Skip to content

why does small malloc fail when freeheap =76k+? #5346

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
philbowles opened this issue Jul 1, 2021 · 14 comments
Closed

why does small malloc fail when freeheap =76k+? #5346

philbowles opened this issue Jul 1, 2021 · 14 comments
Assignees
Labels

Comments

@philbowles
Copy link

Hardware:

Core Installation version: 1.6.0
Computer OS: ?Windows 10? ?Mac OSX? ?Ubuntu?

image

Description:

I have an app specifically designed to find the "safe" lower heap limit which will allow any app to continue to run. Above 80k , it works exactly as designed, forcing the "heap grabbing" routine to back off once the lower limit is exceeded, and allowing it to continue as the heap is freed and returns back above a (higher) safe value.

The UI

image

The TL;DR is that Q is std::queue of asynchronous SSE writes which update the UI in "real time" (or at least as fast as the TCP TX rate will allow - currently can't seem to get past 20/s) once the TX rate is exceeded they get Q'd and the matching TCP ACK then pops them off the Q, freeing some heap, allowing more sends...rinse and repeat. The "Visualiser rate" is the rate per second that the Visualiser "LED" is flashed. Each "flash" is an SSE event and there is a rotary encoder hooked up to the Visualiser rate so I can just turn up / down the rate at which the Q fills / drains, and so by adjusting that and the cut-in and cut-out values I can force or avoid a low heap crash. The "Go" button (and the center button of the encoder on GPIO32) stop and start the visualiser so I can gauge the rate of Q fill/drain. The aim of the app is to find the exact lowest-safe-heap breaking point so that "production" code can be tuned to avoid it by judicious choice of of cut-out and cut-in values.

It all works wonderfully until it gets below about 80k, then I'm getting some strange results I don't understand which seems "plain wrong"

Crash Log

On a run with a lower bound "Heap Cut-out" of 50000 the following crash occured:
H= free heap, MB=Max free heap block

00:21:53.684 -> AARD:1: <---- AK 39 bytes Q=1334 H=77268 MB=67952 LOK=0
00:21:53.730 -> AARD:1: <---- AK 39 bytes Q=1337 H=77016 MB=67952 LOK=0
00:21:53.777 -> AARD:1: <---- AK 39 bytes Q=1340 H=76760 MB=67952 LOK=0
00:21:53.871 -> AARD:1: <---- AK 39 bytes Q=1346 H=76224 MB=67952 LOK=0
00:21:53.916 -> abort() was called at PC 0x40169a3f on core 1
00:21:53.916 -> 
00:21:53.916 -> ELF file SHA256: 0000000000000000
00:21:53.916 -> 
00:21:53.916 -> Backtrace: 0x400887c4:0x3ffb1d90 0x40088a41:0x3ffb1db0 0x40169a3f:0x3ffb1dd0 0x40169a86:0x3ffb1df0 0x4016915f:0x3ffb1e10 0x40169496:0x3ffb1e30 0x401691fd:0x3ffb1e50 0x400e5733:0x3ffb1e70 0x400e7946:0x3ffb1eb0 0x400db636:0x3ffb1ed0 0x400e01be:0x3ffb1ef0 0x400e0ee7:0x3ffb1f10 0x400e0f8a:0x3ffb1f30 0x400e0fa6:0x3ffb1f50 0x400e0ffe:0x3ffb1f90 0x400f0eed:0x3ffb1fb0 0x40089a52:0x3ffb1fd0
00:21:53.963 -> 

The failing line is a malloc(1436) call in WiFiUDP, but previous examples have been in my own code, ArduionOTA, MDNS...however in every instance the abort is called on a malloc requesting some small value well below what appears to be available.

Stack Trace


Decoding stack results
0x400887c4: invoke_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c line 156
0x40088a41: abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c line 171
0x40169a3f: __cxxabiv1::__terminate(void (*)()) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_terminate.cc line 47
0x40169a86: std::terminate() at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_terminate.cc line 57
0x4016915f: __cxxabiv1::__cxa_throw(void*, std::type_info*, void (*)(void*)) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_throw.cc line 87
0x40169496: operator new(unsigned int) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/new_op.cc line 54
0x401691fd: operator new[](unsigned int) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/new_opv.cc line 32
0x400e5733: WiFiUDP::parsePacket() at C:\Users\phil\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.6\libraries\WiFi\src\WiFiUdp.cpp line 210
0x400e7946: ArduinoOTAClass::handle() at C:\Users\phil\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.6\libraries\ArduinoOTA\src\ArduinoOTA.cpp line 379
0x400db636: std::_Function_handler   >::_M_invoke(const std::_Any_data &) at C:\Users\phil\Documents\Arduino\libraries\h4plugins\src\H4P_WiFi.cpp line 189
0x400e01be: std::function ::operator()() const at c:\users\phil\appdata\local\arduino15\packages\esp32\tools\xtensa-esp32-elf-gcc\1.22.0-97-gc752ad5-5.2.0\xtensa-esp32-elf\include\c++\5.2.0/functional line 2271
0x400e0ee7: task::operator()() at C:\Users\phil\Documents\Arduino\libraries\H4\src\H4.cpp line 154
0x400e0f8a: H4::next() at C:\Users\phil\Documents\Arduino\libraries\H4\src\H4.cpp line 270
0x400e0fa6: H4::loop() at C:\Users\phil\Documents\Arduino\libraries\H4\src\H4.cpp line 358
0x400e0ffe: loop() at C:\Users\phil\Documents\Arduino\libraries\H4\src\H4.cpp line 315
0x400f0eed: loopTask(void*) at C:\Users\phil\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.6\cores\esp32\main.cpp line 23
0x40089a52: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c line 143

Questions

  1. Why does malloc(1436) fail when ESP.getFreeHeap() shows 76224 and ESP.getMaxAllocHeap() shows 67952? Most of the fails in my own code are of only 30 or 40 bytes, but still the failing line is always a malloc while apparently huge amounts of free heap are available
  2. I can hear someone saying "heap corruption rather than exhaustion" but why then when the lower limit is in the 80s or 90s does it run all day without missing a beat? What possible things could be happening in the background that only screw-up below 77k ? Why 77k?
  3. If the ESP heap calls are not the best predictor of impending heap failure, what other metric is? What is the safest and most reliable way of knowing when you are running dangerously low on memory?
@igrr
Copy link
Member

igrr commented Jul 2, 2021

Just to make sure the heap is not too fragmented, can you try calling heap_caps_get_largest_free_block(MALLOC_CAP_DEFAULT) and printing its output as well?

@philbowles
Copy link
Author

I had tried that before, tried that it: gave exactly the same value as ESP.maxHeapBlock. but I guess I didn't get as far with the code at that time.

In the early stages, I see it is indeed the same for a long time, but starts to frgament a little around FH=~185k. AT that point it gets worse and worse fast, but interestingly, while ESP.maxHeapBlock also starts fragmenting at the same point, after a while the two values start to diverge, with ESP.maxHeapBlock showing a much higher figure:

12:32:05.708 -> AARD:1: <---- AK 39 bytes Q=324 H=188008 MB=113792 LOK=0 hcglfb=113792
12:32:05.800 -> AARD:1: <---- AK 39 bytes Q=328 H=187652 MB=113792 LOK=0 hcglfb=113792
12:32:05.846 -> AARD:1: <---- AK 39 bytes Q=332 H=187352 MB=113792 LOK=0 hcglfb=113792
12:32:05.893 -> AARD:1: <---- AK 39 bytes Q=336 H=187016 MB=113792 LOK=0 hcglfb=113792
12:32:05.986 -> AARD:1: <---- AK 39 bytes Q=343 H=186448 MB=113792 LOK=0 hcglfb=113792
12:32:06.081 -> AARD:1: <---- AK 39 bytes Q=349 H=185448 MB=113792 LOK=0 hcglfb=113792
12:32:06.128 -> AARD:1: <---- AK 39 bytes Q=353 H=185132 MB=113688 LOK=0 hcglfb=113688
12:32:06.175 -> AARD:1: <---- AK 39 bytes Q=357 H=184800 MB=113332 LOK=0 hcglfb=113332
12:32:06.269 -> AARD:1: <---- AK 39 bytes Q=361 H=184472 MB=112988 LOK=0 hcglfb=112988
12:32:06.316 -> AARD:1: <---- AK 39 bytes Q=365 H=183620 MB=112152 LOK=0 hcglfb=112152
12:32:06.362 -> AARD:1: <---- AK 39 bytes Q=369 H=183284 MB=112152 LOK=0 hcglfb=112152
12:32:06.457 -> AARD:1: <---- AK 39 bytes Q=373 H=182908 MB=112152 LOK=0 hcglfb=112152
12:32:06.503 -> AARD:1: <---- AK 39 bytes Q=377 H=182572 MB=112152 LOK=0 hcglfb=112152
12:32:06.549 -> AARD:1: <---- AK 39 bytes Q=381 H=182248 MB=112152 LOK=0 hcglfb=112152
12:32:06.643 -> AARD:1: <---- AK 39 bytes Q=385 H=181428 MB=109808 LOK=0 hcglfb=109808
12:32:06.690 -> AARD:1: <---- AK 39 bytes Q=389 H=181084 MB=109480 LOK=0 hcglfb=109480
12:32:06.736 -> AARD:1: <---- AK 39 bytes Q=393 H=180756 MB=109092 LOK=0 hcglfb=109092
12:32:06.829 -> AARD:1: <---- AK 39 bytes Q=397 H=180432 MB=108772 LOK=0 hcglfb=108772
12:32:06.875 -> AARD:1: <---- AK 39 bytes Q=402 H=180012 MB=108772 LOK=0 hcglfb=108772
12:32:06.922 -> AARD:1: <---- AK 39 bytes Q=405 H=178924 MB=108772 LOK=0 hcglfb=108772
12:32:07.016 -> AARD:1: <---- AK 39 bytes Q=413 H=176900 MB=103856 LOK=0 hcglfb=103856
12:32:07.063 -> AARD:1: <---- AK 39 bytes Q=417 H=176608 MB=103856 LOK=0 hcglfb=103856
12:32:07.110 -> AARD:1: <---- AK 39 bytes Q=421 H=175936 MB=103856 LOK=0 hcglfb=103856
12:32:07.203 -> AARD:1: <---- AK 39 bytes Q=426 H=175864 MB=103316 LOK=0 hcglfb=103316
12:32:07.250 -> AARD:1: <---- AK 39 bytes Q=430 H=175512 MB=103316 LOK=0 hcglfb=103316
12:32:07.297 -> AARD:1: <---- AK 39 bytes Q=434 H=175204 MB=103316 LOK=0 hcglfb=103316
12:32:07.390 -> AARD:1: <---- AK 39 bytes Q=438 H=174872 MB=103084 LOK=0 hcglfb=103084
12:32:07.438 -> AARD:1: <---- AK 39 bytes Q=442 H=174508 MB=102940 LOK=0 hcglfb=102940
12:32:07.484 -> AARD:1: <---- AK 39 bytes Q=446 H=173660 MB=101884 LOK=0 hcglfb=101884
12:32:07.577 -> AARD:1: <---- AK 39 bytes Q=450 H=173092 MB=101884 LOK=0 hcglfb=101884
12:32:07.622 -> AARD:1: <---- AK 39 bytes Q=454 H=173032 MB=101884 LOK=0 hcglfb=101884
12:32:07.669 -> AARD:1: <---- AK 39 bytes Q=458 H=172692 MB=101884 LOK=0 hcglfb=101884
12:32:07.762 -> AARD:1: <---- AK 39 bytes Q=462 H=172320 MB=101884 LOK=0 hcglfb=101884
12:32:07.810 -> AARD:1: <---- AK 39 bytes Q=466 H=171464 MB=99640 LOK=0 hcglfb=99640
12:32:07.856 -> AARD:1: <---- AK 39 bytes Q=470 H=171164 MB=99640 LOK=0 hcglfb=99640
12:32:07.948 -> AARD:1: <---- AK 39 bytes Q=474 H=170796 MB=99640 LOK=0 hcglfb=99640
12:32:07.995 -> AARD:1: <---- AK 39 bytes Q=482 H=170136 MB=97936 LOK=0 hcglfb=97936
12:32:08.041 -> AARD:1: <---- AK 39 bytes Q=485 H=169580 MB=97660 LOK=0 hcglfb=97660

...

12:32:12.307 -> AARD:1: <---- AK 39 bytes Q=767 H=140904 MB=69500 LOK=0 hcglfb=69500
12:32:12.399 -> AARD:1: <---- AK 39 bytes Q=770 H=140164 MB=67952 LOK=0 hcglfb=67344
12:32:12.445 -> AARD:1: <---- AK 39 bytes Q=774 H=139796 MB=67952 LOK=0 hcglfb=67344
12:32:12.493 -> AARD:1: <---- AK 39 bytes Q=777 H=139232 MB=67952 LOK=0 hcglfb=67344
12:32:12.587 -> AARD:1: <---- AK 39 bytes Q=781 H=139240 MB=67952 LOK=0 hcglfb=67344
12:32:12.633 -> AARD:1: <---- AK 39 bytes Q=784 H=138716 MB=67952 LOK=0 hcglfb=67296
12:32:12.680 -> AARD:1: <---- AK 39 bytes Q=788 H=138128 MB=67952 LOK=0 hcglfb=63460
12:32:12.773 -> AARD:1: <---- AK 39 bytes Q=792 H=137812 MB=67952 LOK=0 hcglfb=63460
12:32:12.820 -> AARD:1: <---- AK 39 bytes Q=796 H=137456 MB=67952 LOK=0 hcglfb=63460
12:32:12.913 -> AARD:1: <---- AK 39 bytes Q=800 H=137116 MB=67952 LOK=0 hcglfb=63460
12:32:12.960 -> AARD:1: <---- AK 39 bytes Q=803 H=136892 MB=67952 LOK=0 hcglfb=63460
12:32:13.007 -> AARD:1: <---- AK 39 bytes Q=807 H=136028 MB=67952 LOK=0 hcglfb=62676
12:32:13.053 -> AARD:1: <---- AK 39 bytes Q=813 H=135524 MB=67952 LOK=0 hcglfb=62272
12:32:13.147 -> AARD:1: <---- AK 39 bytes Q=817 H=135228 MB=67952 LOK=0 hcglfb=61892
12:32:13.194 -> AARD:1: <---- AK 39 bytes Q=821 H=134904 MB=67952 LOK=0 hcglfb=61528
12:32:13.241 -> AARD:1: <---- AK 39 bytes Q=824 H=130976 MB=67952 LOK=0 hcglfb=54548
12:32:13.335 -> AARD:1: <---- AK 39 bytes Q=828 H=130100 MB=67952 LOK=0 hcglfb=54548
12:32:13.382 -> AARD:1: <---- AK 39 bytes Q=831 H=129892 MB=67952 LOK=0 hcglfb=54548
12:32:13.429 -> AARD:1: <---- AK 39 bytes Q=835 H=130072 MB=67952 LOK=0 hcglfb=54548
12:32:13.523 -> AARD:1: <---- AK 39 bytes Q=838 H=129512 MB=67952 LOK=0 hcglfb=54548
12:32:13.569 -> AARD:1: <---- AK 39 bytes Q=842 H=129468 MB=67952 LOK=0 hcglfb=54548
12:32:13.615 -> AARD:1: <---- AK 39 bytes Q=846 H=129172 MB=67952 LOK=0 hcglfb=54548
12:32:13.709 -> AARD:1: <---- AK 39 bytes Q=850 H=128316 MB=67952 LOK=0 hcglfb=54548
12:32:13.756 -> AARD:1: <---- AK 39 bytes Q=853 H=127732 MB=67952 LOK=0 hcglfb=54548
12:32:13.803 -> AARD:1: <---- AK 39 bytes Q=857 H=127712 MB=67952 LOK=0 hcglfb=54548
12:32:13.898 -> AARD:1: <---- AK 39 bytes Q=861 H=127412 MB=67952 LOK=0 hcglfb=53952
12:32:13.945 -> AARD:1: <---- AK 39 bytes Q=864 H=127164 MB=67952 LOK=0 hcglfb=53928
12:32:14.039 -> AARD:1: <---- AK 39 bytes Q=868 H=126332 MB=67952 LOK=0 hcglfb=53116
12:32:14.087 -> AARD:1: <---- AK 39 bytes Q=871 H=125812 MB=67952 LOK=0 hcglfb=53116
12:32:14.134 -> AARD:1: <---- AK 39 bytes Q=878 H=125472 MB=67952 LOK=0 hcglfb=53116

and a few seconds later when the limit of 78k FH is passed, there is > 10-fold discrepancy:

 12:32:21.761 -> AARD:1: <---- AK 40 bytes Q=1283 H=83224 MB=67952 LOK=0 hcglfb=8256
12:32:21.809 -> AARD:1: <---- AK 40 bytes Q=1286 H=82988 MB=67952 LOK=0 hcglfb=8256
12:32:21.904 -> AARD:1: <---- AK 40 bytes Q=1289 H=82740 MB=67952 LOK=0 hcglfb=8256
12:32:21.951 -> AARD:1: <---- AK 40 bytes Q=1293 H=81892 MB=67952 LOK=0 hcglfb=6680
12:32:22.044 -> AARD:1: <---- AK 40 bytes Q=1296 H=81652 MB=67952 LOK=0 hcglfb=6624
12:32:22.091 -> AARD:1: <---- AK 39 bytes Q=1299 H=81068 MB=67952 LOK=0 hcglfb=6116
12:32:22.184 -> AARD:1: <---- AK 33 bytes Q=1303 H=81076 MB=67952 LOK=0 hcglfb=6392
12:32:22.184 -> SLOG: wifi BACKOFF 
12:32:22.231 -> SLOG: wifi WARN UI BACKOFF FH=77236 MB=67952
12:32:22.277 -> AARD:1: <---- AK 40 bytes Q=1303 H=81064 MB=67952 LOK=1 hcglfb=6392
12:32:22.324 -> AARD:1: <---- AK 40 bytes Q=1302 H=81152 MB=67952 LOK=1 hcglfb=6392
12:32:22.371 -> AARD:1: <---- AK 40 bytes Q=1301 H=81236 MB=67952 LOK=1 hcglfb=6392
12:32:22.465 -> AARD:1: <---- AK 40 bytes Q=1300 H=81316 MB=67952 LOK=1 hcglfb=6392
12:32:22.512 -> AARD:1: <---- AK 40 bytes Q=1299 H=81412 MB=67952 LOK=1 hcglfb=6392
12:32:22.558 -> AARD:1: <---- AK 40 bytes Q=1298 H=81496 MB=67952 LOK=1 hcglfb=6392
12:32:22.652 -> AARD:1: <---- AK 40 bytes Q=1297 H=81576 MB=67952 LOK=1 hcglfb=6392

SO I guess the new questions are:

  1. why do the values start to diverge @ 69500?
  2. why does ESP.maxHeapBlock "stick" at 67952? If they are meant to be measuring the same physical resource, then it looks like ESP.maxHeapBlock is broken.

Next I tried to force the crash again: by dropping the limit to 50000. And now I see why the crash is occuring: I won't even bother to do the dump because we both know where it will be!

13:00:41.529 -> AARD:1: <---- AK 41 bytes Q=1328 H=72984 MB=67952 LOK=0 hcglfb=1536
13:00:41.576 -> AARD:1: <---- AK 41 bytes Q=1332 H=72992 MB=67952 LOK=0 hcglfb=1536
13:00:41.623 -> AARD:1: <---- AK 41 bytes Q=1335 H=72724 MB=67952 LOK=0 hcglfb=1536
13:00:41.857 -> abort() was called at PC 0x40169a33 on core 1
13:00:41.857 -> 
13:00:41.857 -> ELF file SHA256: 0000000000000000
13:00:41.857 -> 
13:00:41.857 -> Backtrace: 0x400887c4:0x3ffb1950 0x40088a41:0x3ffb1970 0x40169a33:0x3ffb1990 0x40169a7a:0x3ffb19b0 0x401693a5:0x3ffb19d0 0x4016947c:0x3ffb19f0 0x4016a06d:0x3ffb1a10 0x4016a18a:0x3ffb1a30 0x4016a4a5:0x3ffb1a70 0x4016a51e:0x3ffb1a90 0x400e3ce7:0x3ffb1ab0 0x400e3f3a:0x3ffb1b10 0x400e3f5c:0x3ffb1b40 0x400e3ae2:0x3ffb1b60 0x400e800f:0x3ffb1ba0 0x400e8375:0x3ffb1bc0 0x400dbc6b:0x3ffb1c30 0x400dd83b:0x3ffb1c50 0x400ddf2d:0x3ffb1cb0 0x400defbe:0x3ffb1d20 0x400d162a:0x3ffb1d60 0x400d4a51:0x3ffb1d80 0x400d4b1f:0x3ffb1db0 0x400d8df5:0x3ffb1df0 0x400d91f9:0x3ffb1e50 0x400d922b:0x3ffb1e70 0x400d1729:0x3ffb1eb0 0x400e017e:0x3ffb1ef0 0x400e0ea7:0x3ffb1f10 0x400e0f4a:0x3ffb1f30 0x400e0f66:0x3ffb1f50 0x400e0fbe:0x3ffb1f90 0x400f0ee1:0x3ffb1fb0 0x40089a52:0x3ffb1fd0
13:00:41.950 -> 

Ok so this explains the FH ~78k issue: its actually the fragmentation reaching crazy levels and other tasks needing that memory. This more than ever convinces me that ESP.maxHeapBlock is broken. I will run it again and see if ESP.getFreeHeap differs from heap_caps_get_free_size

@philbowles
Copy link
Author

heap_caps_get_free_size(MALLOC_CAP_DEFAULT is showing a huge difference - is that the right call? If not what is the heap_caps_* equivalent of ESP.getFreeHeap?

13:31:39.719 -> AARD:1: <- AK 30b Q=0 FH=222892 hcgfs=155536 MB=113792 hcglfb=113792
13:31:40.660 -> AARD:1: <- AK 30b Q=1 FH=222836 hcgfs=155480 MB=113792 hcglfb=113792
13:31:40.706 -> AARD:1: <- AK 42b Q=0 FH=222928 hcgfs=155572 MB=113792 hcglfb=113792
13:31:41.684 -> AARD:1: <- AK 38b Q=1 FH=222852 hcgfs=155496 MB=113792 hcglfb=113792
13:31:41.732 -> AARD:1: <- AK 30b Q=0 FH=222924 hcgfs=155568 MB=113792 hcglfb=113792

@philbowles
Copy link
Author

Trusting ESP.getFreeHeap seems to be the problem: again there is this 10-fold discrepancy at the point of crash:

13:35:19.217 -> AARD:1: <- AK 39b Q=1352 FH=75216 hcgfs=7860 MB=67356 hcglfb=1536
13:35:19.309 -> AARD:1: <- AK 39b Q=1355 FH=74964 hcgfs=7608 MB=67356 hcglfb=1536
13:35:19.356 -> abort() was called at PC 0x40169ab3 on core 1
13:35:19.356 -> 
13:35:19.356 -> ELF file SHA256: 0000000000000000
13:35:19.356 -> 
13:35:19.356 -> Backtrace: 0x400887ec:0x3ffb1950 0x40088a69:0x3ffb1970 0x40169ab3:0x3ffb1990 0x40169afa:0x3ffb19b0 0x40169425:0x3ffb19d0 0x401694fc:0x3ffb19f0 0x4016a0ed:0x3ffb1a10 0x4016a20a:0x3ffb1a30 0x4016a525:0x3ffb1a70 0x4016a59e:0x3ffb1a90 0x400e3cef:0x3ffb1ab0 0x400e3f42:0x3ffb1b10 0x400e3f64:0x3ffb1b40 0x400e3aea:0x3ffb1b60 0x400e8017:0x3ffb1ba0 0x400e837d:0x3ffb1bc0 0x400dbc6b:0x3ffb1c30 0x400dd83b:0x3ffb1c50 0x400ddf2d:0x3ffb1cb0 0x400defbe:0x3ffb1d20 0x400d162a:0x3ffb1d60 0x400d4a51:0x3ffb1d80 0x400d4b1f:0x3ffb1db0 0x400d8df5:0x3ffb1df0 0x400d91f9:0x3ffb1e50 0x400d922b:0x3ffb1e70 0x400d1729:0x3ffb1eb0 0x400e017e:0x3ffb1ef0 0x400e0ea7:0x3ffb1f10 0x400e0f4a:0x3ffb1f30 0x400e0f66:0x3ffb1f50 0x400e0fbe:0x3ffb1f90 0x400f0ee9:0x3ffb1fb0 0x40089a7a:0x3ffb1fd0
13:35:19.450 ->

@Pablo2048
Copy link

ESP.getFreeHeap equivalent is

return heap_caps_get_largest_free_block(MALLOC_CAP_INTERNAL);
, so there is MALLOC_CAP_DEFAULT vs MALLOC_CAP_INTERNAL . Unfortunately this became deeply to the SDK...

@philbowles
Copy link
Author

philbowles commented Jul 2, 2021

ESP.getFreeHeap equivalent is

return heap_caps_get_largest_free_block(MALLOC_CAP_INTERNAL);

, so there is MALLOC_CAP_DEFAULT vs MALLOC_CAP_INTERNAL . Unfortunately this became deeply to the SDK...

So whats the difference between MALLOC_CAP_DEFAULT vs MALLOC_CAP_INTERNAL ? The real question is: what is the correct call to ACCURATELY predict when the heap is dangerously low? Obvioulsy if I'm making a big allocation I would check heap_caps_get_largest_free_block(MALLOC_CAP_DEFAULT) first but in a multi-tasking environment, its the other background tasks I have to think of too in the general picture and not starve them - I need a figure for the overall free heap. Obviously I'm going to add e.g. 10% safety margin to what that figure is at run-time, but I need to know a reliable and accurate way to get it!

Already obvious that ESP.* calls aren't the answer!

@igrr
Copy link
Member

igrr commented Jul 2, 2021

MALLOC_CAP_INTERNAL describes any internal memory — for example, 32-bit aligned IRAM region is also included. Since you never allocate from that region, it is mostly "free" and this explains why the value gets "stuck".

MALLOC_CAP_DEFAULT is the same memory type as will be allocated by malloc. When PSRAM is not enabled, it will be MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT — i.e. internal memory accessible at 1-byte granularity with no alignment restrictions.

If PSRAM is enabled, and malloc is configured to also use PSRAM, then MALLOC_CAP_DEFAULT will also include external memory into the calculation.

So MALLOC_CAP_DEFAULT can return either smaller or larger value than MALLOC_CAP_INTERNAL.

If you want to see the value that is related to the heap that malloc and free functions use, then you should use MALLOC_CAP_DEFAULT.

@philbowles
Copy link
Author

@igrr thanks for that clear explanatin so are you saying I should replace my calls to ESP.getFreeHeap() with heap_caps_free_size(MALLOC_CAP_DEFAULT) ? Does this also explaing the max heap block discrepancy between ESP. and heap_caps ? Is this just a documentation isuue?

My ultimate goal is to get a single figure figure below which I should back off any more of my own heap allocation to let the other tasks "breathe", and prevent a crash.

I can live with "if MBLIMIT < maxblocks OR FSLIMIT < freespace then backoff"... I just need to know the "best" way if the ESP. functions can't give me a good answer, whihc at the moment it seems they can't

@igrr
Copy link
Member

igrr commented Jul 2, 2021

@me-no-dev do you recall why ESP.getFreeHeap and ESP.maxHeapBlock used MALLOC_CAP_INTERNAL instead of MALLOC_CAP_DEFAULT? It seems that showing the amount of heap accessible by malloc (which is MALLOC_CAP_DEFAULT) would cause less confusion.

@Pablo2048
Copy link

Well, it seems like MALLOC_CAP_DEFAULT show all the available memory, including PSRAM on Wrover modules - ESP.getFreeHeap shows 172696 bytes, heap_caps_free_size(MALLOC_CAP_DEFAULT) shows 4356824 bytes.

@me-no-dev
Copy link
Member

@igrr what I recall is that prior to 2.0.0, PSRAM was not mapped to malloc and that we wanted FreeHeap to show the internal memory, while FreePSRAM to show the external. It probably makes sense to be changed now that all is available through malloc

@igrr
Copy link
Member

igrr commented Jul 5, 2021

Okay, if you decide to keep MALLOC_CAP_INTERNAL, please add | MALLOC_CAP_8BIT since otherwise the 32-bit-only IRAM region that isn't available through malloc is included (on ESP32).

rtrbt added a commit to Tinkerforge/esp32-brick that referenced this issue Jul 19, 2021
ESP.getFreeHeap() includes memory that can only be accessed with
32 bit alignment. However malloc etc. will never allocate from this
memory.

espressif/arduino-esp32#5346 (comment)
@stale
Copy link

stale bot commented Sep 6, 2021

[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale Issue is stale stage (outdated/stuck) label Sep 6, 2021
@igrr igrr added Type: Bug 🐛 All bugs and removed Status: Stale Issue is stale stage (outdated/stuck) labels Sep 7, 2021
@SuGlider SuGlider self-assigned this Feb 10, 2022
@SuGlider
Copy link
Collaborator

SuGlider commented Mar 3, 2022

Just to close this issue:

  • From Arduino Core Version 2.0.0 forward, malloc() also maps PSRAM as part of "allocatable" memory.
  • In PR Allows spiram malloc with wifi dynamic buffers - better free heap #5791, merged in Arduino Core 2.0.1, regular SRAM memory for data is better managed and there is more free available heap for applications.
  • ESP32 chip has IRAM and DRAM. Only DRAM is available for malloc(). Each one is reported separately or together depending on the parameters used with ESP.getFreeHeap IDF call, as explained above.

In case of further questions, please reopen this issue.

@SuGlider SuGlider closed this as completed Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants