CORRUPT HEAP assert failed: multi_heap_free when connecting to BLE Server #6961

Closed
1 task done
savejeff opened this issue Jul 9, 2022 · 8 comments
Labels
Status: Community help needed, Type: Question

Comments

@savejeff

savejeff commented Jul 9, 2022

Board

ESP32-S3

Device Description

DevKitC-1, soldered and on a breadboard, using multiple sensors over I2C and SPI, running on both cores with multiple tasks

Core0: running Task0 (higher prio) and Task2 (lower prio)
Core1: running Task1 (higher prio) and Task3 (lower prio)

Hardware Configuration

SARA R422M8N over Serial Port
SD Card over SPI
IMU over I2C
etc ...

Version

v2.0.3

IDE Name

PlatformIO

Operating System

Windows 10

Flash frequency

240MHz

PSRAM enabled

yes

Upload speed

115200

Description

Since I started using the ESP32-S3 with both cores on core version 2.0.0, I get heap corruption errors on BLE connect.

The code runs fine until I connect from my smartphone to the BLE server running on the ESP32-S3.

This is what I'm getting:
[3 screenshots attached]

Debug Message

CORRUPT HEAP: Bad tail at 0x3fcb728c. Expected 0xbaad5678 got 0xbaad5600

assert failed: multi_heap_free multi_heap_poisoning.c:253 (head != NULL)


Decoding stack results
0x403776b2: panic_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/panic.c line 402
0x4037eea1: esp_system_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/esp_system.c line 128
0x40384cbd: __assert_func at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/assert.c line 85
0x40384917: multi_heap_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c line 245
0x40377b49: heap_caps_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c line 340
0x40384ced: free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/heap.c line 39
0x4205193d: String::invalidate() at C:/Users/Save_/.platformio/packages/framework-arduinoespressif32/cores/esp32/WString.h line 329
0x4205194d: String::~String() at C:/Users/Save_/.platformio/packages/framework-arduinoespressif32/cores/esp32/WString.cpp line 141

It seems to always happen when a String is used. Could this be a malloc/free that is not thread-safe when a String object is created or destroyed? I had a similar issue on the RP2040, where malloc, free, and some variants in newlib were not wrapped and thus crashed when both cores tried to print a float.
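
For context on the thread-safety question: ESP-IDF documents its heap functions as thread-safe, so concurrent malloc/free from two tasks should not by itself corrupt the heap. A single String object that several tasks modify at the same time is a different matter, because the class does no locking of its own. Below is a minimal sketch of guarding such a shared buffer. It uses std::string and std::mutex so that it also compiles on a PC; SharedLog is a hypothetical name and not something taken from the project in this issue.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical helper: a text buffer that several tasks may append to.
// Every access takes the same mutex, so the underlying string is never
// reallocated by one task while another task is reading or writing it.
class SharedLog {
public:
    // Appends one line; the caller passes the line without a newline.
    void append(const std::string& line) {
        assert(line.find('\n') == std::string::npos);
        std::lock_guard<std::mutex> lock(mutex_);
        text_ += line;
        text_ += '\n';
    }

    // Returns a copy, so the caller never holds a reference into the
    // shared buffer after the lock is released.
    std::string snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return text_;
    }

private:
    mutable std::mutex mutex_;
    std::string text_;
};
```

On the ESP32 a FreeRTOS mutex created with xSemaphoreCreateMutex would be the more usual choice; the locking pattern stays the same.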

My Question

How can I interpret this error? What could cause heap corruption?
What do multi_heap_free and multi_heap_poisoning mean?
What would be the best way to debug this?
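
As a rough illustration of what heap poisoning means: with poisoning enabled, the allocator brackets each allocation with known marker words (canaries) and checks them when the block is freed; multi_heap_free is the internal free routine in which that check failed here. The sketch below imitates the idea on a PC. The canary values are the ones quoted in this thread. GuardedBlock and tail_after_off_by_one are hypothetical names, the layout is simplified compared to the real allocator, and the resulting value assumes a little-endian CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint32_t kHeadCanary = 0xABBA1234;  // marker before the user data
constexpr uint32_t kTailCanary = 0xBAAD5678;  // marker after the user data

// Simplified stand-in for a poisoned allocation: [head][user bytes][tail].
class GuardedBlock {
public:
    explicit GuardedBlock(std::size_t size)
        : size_(size), raw_(size + 2 * sizeof(uint32_t), 0) {
        std::memcpy(raw_.data(), &kHeadCanary, sizeof kHeadCanary);
        std::memcpy(raw_.data() + sizeof(uint32_t) + size_, &kTailCanary,
                    sizeof kTailCanary);
    }
    uint8_t* data() { return raw_.data() + sizeof(uint32_t); }
    uint32_t head() const { return read(0); }
    uint32_t tail() const { return read(sizeof(uint32_t) + size_); }
    // The check a poisoning allocator performs when the block is freed.
    bool intact() const {
        return head() == kHeadCanary && tail() == kTailCanary;
    }

private:
    uint32_t read(std::size_t offset) const {
        uint32_t value;
        std::memcpy(&value, raw_.data() + offset, sizeof value);
        return value;
    }
    std::size_t size_;
    std::vector<uint8_t> raw_;
};

// Copies an 8-character string plus its NUL terminator (9 bytes) into an
// 8-byte block. The extra byte lands on the first byte of the tail canary.
uint32_t tail_after_off_by_one() {
    GuardedBlock block(8);
    assert(block.intact());
    const char text[] = "12345678";
    std::memcpy(block.data(), text, sizeof text);  // one byte too many
    return block.tail();
}
```

On a little-endian machine tail_after_off_by_one() yields 0xBAAD5600, the same pattern as in the debug message above.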

I have checked:
There is enough free RAM. I use about 50% of the main RAM and around 20% of the PSRAM.

I have checked existing issues, online documentation and the Troubleshooting Guide

  • I confirm I have checked existing issues, online documentation and Troubleshooting guide.
savejeff added the Status: Awaiting triage label on Jul 9, 2022
SuGlider added the Type: Question and Status: Community help needed labels and removed the Status: Awaiting triage label on Jul 10, 2022
@SuGlider
Collaborator

SuGlider commented Jul 10, 2022

@savejeff -
Could you please provide the smallest possible Arduino sketch that reproduces this issue?
That's the best way to help the community to help you.

This error may be due to a low stack size in your tasks.
It is hard to tell at this moment because there is not enough information.

@savejeff
Author

savejeff commented Jul 11, 2022

I know that it is hard to analyse this problem.
Currently, the crash occurs only very infrequently and only when I'm in the field where mobile reception is bad (I'm using a cellular connection over serial).
The project is pretty big and it only crashes when most of the code is included.
Thus I don't see a way to extract a minimal sketch that reproduces the problem.

I have checked the total heap usage (around 50%).
I have given all tasks a sufficient amount of memory, I think. Is there a way to check the stack usage of each task?

Might this be a stack size problem in the Bluetooth task? Is there a way to check this?

Is there a way to dump the state of all tasks at the time of the crash?

FYI, these are the current stack size settings for the tasks. I'll try different settings.

#define TASK_STACK_SIZE_TASK0 (1 << 13)
#define TASK_STACK_SIZE_TASK1 (1 << 13)
#define TASK_STACK_SIZE_TASK2 (1 << 13)
#define TASK_STACK_SIZE_TASK3 (1 << 13)

@SuGlider
Collaborator

> Might this be a stack size problem in the Bluetooth task? Is there a way to check this?

> Is there a way to dump the state of all tasks at the time of the crash?

> FYI, these are the current stack size settings for the tasks. I'll try different settings.

A few links that may help you in getting information about stack size versus stack consumption:

https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/freertos.html#_CPPv49vTaskListPc
It lists all tasks and their stack high-water marks, which lets you monitor stack consumption.

https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/freertos.html#_CPPv427uxTaskGetStackHighWaterMark12TaskHandle_t
uxTaskGetStackHighWaterMark returns the minimum amount of stack that has remained free since the task started, provided the stack has not overflowed.
The optimal stack size leaves this value as close to 0 as possible while still keeping some margin.

https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/freertos.html#_CPPv412vTaskGetInfo12TaskHandle_tP12TaskStatus_t10BaseType_t10eTaskState
This function will return information about free stack space in the task.

https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/mem_alloc.html
There are different ways to allocate from the heap and different memory regions on the ESP32.
Check that you are using the right method to verify the available DRAM in the application heap.
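
A minimal sketch of the high-water-mark check described above, under a few assumptions: the ESP-IDF documentation states that uxTaskGetStackHighWaterMark reports bytes (vanilla FreeRTOS reports words), the stack size given at task creation is also in bytes, and a 512-byte margin counts as "too close". StackReport, make_stack_report, log_stack_usage, and kMinHeadroomBytes are hypothetical names. The arithmetic compiles on a PC; the part that calls FreeRTOS is only built for the ESP32.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

struct StackReport {
    uint32_t size_bytes;       // stack size requested at task creation
    uint32_t min_free_bytes;   // high-water mark: least free stack ever seen
    uint32_t peak_used_bytes;  // size minus high-water mark
    bool low_headroom;         // true if the margin is below the threshold
};

// Assumed safety margin for this sketch, not a recommended value.
constexpr uint32_t kMinHeadroomBytes = 512;

StackReport make_stack_report(uint32_t size_bytes,
                              uint32_t high_water_mark_bytes) {
    assert(high_water_mark_bytes <= size_bytes);
    StackReport r;
    r.size_bytes = size_bytes;
    r.min_free_bytes = high_water_mark_bytes;
    r.peak_used_bytes = size_bytes - high_water_mark_bytes;
    r.low_headroom = high_water_mark_bytes < kMinHeadroomBytes;
    return r;
}

#if defined(ESP_PLATFORM)
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

// Prints the stack usage of one task. Passing NULL as the handle queries
// the task that calls this function.
void log_stack_usage(const char* label, TaskHandle_t task,
                     uint32_t size_bytes) {
    const StackReport r = make_stack_report(
        size_bytes, static_cast<uint32_t>(uxTaskGetStackHighWaterMark(task)));
    std::printf("%s: peak %u of %u bytes used, %u never used%s\n", label,
                static_cast<unsigned>(r.peak_used_bytes),
                static_cast<unsigned>(r.size_bytes),
                static_cast<unsigned>(r.min_free_bytes),
                r.low_headroom ? " (LOW HEADROOM)" : "");
}
#endif
```

Calling log_stack_usage("Task0", NULL, TASK_STACK_SIZE_TASK0) every so often from inside each task would show whether the (1 << 13) = 8192 byte stacks come close to their limit. It needs only the task's own handle, so it does not depend on vTaskList being available.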

@savejeff
Author

savejeff commented Jul 12, 2022

Hi SuGlider,

Thx for the links to analyze the stack.

About the first link: I already tried vTaskList, but I get a linker error (undefined reference to vTaskList) even when setting configUSE_TRACE_FACILITY=1 and configUSE_STATS_FORMATTING_FUNCTIONS=1.

I have set both flags via build flags. Might it be that vTaskList is not included in the precompiled part?

@Ouss4
Contributor

Ouss4 commented Jul 13, 2022

To answer some of your questions:

When using BLE, there is a memory region reserved for its operation. This region should not be part of the heap; otherwise the heap will be corrupted when BLE uses its memory.
For now I can't see anything that could cause a corruption, but looking at the errors you have, the canary values are supposed to be 0xbaad5678 and 0xABBA1234 but are 0xbaad5600 and 0x00001234.

Also, in one of your screenshots there seems to be an allocation at 0x3fffd2bc. This looks incorrect, as this address space is reserved.
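
One way to read those canary values is to look at which bytes changed. Here is a small sketch that compares the expected and observed canaries byte by byte. byte_at and clobbered_bytes are hypothetical helper names, and offset 0 corresponds to the lowest address only on a little-endian CPU such as the ESP32-S3.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns byte `offset` of a 32-bit word, where offset 0 is the least
// significant byte (the lowest address on a little-endian CPU).
uint8_t byte_at(uint32_t value, int offset) {
    assert(offset >= 0 && offset < 4);
    return static_cast<uint8_t>(value >> (8 * offset));
}

// Lists the byte offsets at which the observed canary differs from the
// expected one.
std::vector<int> clobbered_bytes(uint32_t expected, uint32_t got) {
    std::vector<int> offsets;
    for (int i = 0; i < 4; ++i) {
        if (byte_at(expected, i) != byte_at(got, i)) {
            offsets.push_back(i);
        }
    }
    return offsets;
}
```

For the tail canary (0xbaad5678 vs 0xbaad5600) this reports offset 0 only. Assuming the tail canary sits directly behind the user data, a single zero byte landed right after the buffer, which is the pattern a string terminator written one byte too far would leave. For the head canary (0xABBA1234 vs 0x00001234) it reports offsets 2 and 3. This narrows the search but does not prove the cause.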

@savejeff
Author

savejeff commented Jul 14, 2022

Thx @Ouss4

Thanks for the clarification and insight.
My current guess is as suggested earlier: one of the tasks runs out of stack. I'll try to monitor the stack using the functions provided by SuGlider.
I also found that the code runs more stably if I add more delays/yields in the cellular task. That task might have taken up too much CPU time or something like that.

What I can rule out with high certainty is something like writing out of bounds on arrays in my code. I also use malloc very carefully.

I do a lot of printing with float and String. So with an unfortunate combination of multiple tasks trying to print strings and floats into a format string, the stack usage might increase too much.

@VojtechBartoska
Contributor

@savejeff do you still need help?

@savejeff
Author

Ah, sorry.
No, I'm no longer getting heap corruption errors. I changed some code to increase yield/sleep calls, which seemed to fix the problem.

Sadly, vTaskList does not work even with the right flags enabled. It seems it's not included in the precompiled code. It would be helpful to have more insight into the tasks' status.
