Skip to content

Using xSemaphoreTake() in task running httpUpdate.update() breaks httpUpdate.update() #7967

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
1 task done
papakpmartin opened this issue Mar 16, 2023 · 7 comments
Closed
1 task done
Labels
Status: Awaiting triage Issue is waiting for triage

Comments

@papakpmartin
Copy link

papakpmartin commented Mar 16, 2023

Board

ESP32-S2

Device Description

Example code can be found here.

If xSemaphoreTake() is used, the actual file download of httpUpdate.update() slows down, acts erratically, and very often doesn't complete, allowing enough time for the watchdog to kick in and reboot the device. See debug below for what it looks like when it fails.

In the example repo noted above, I have these three vars in place:

#define MUTEX_PERMIT_TO_EXIST 1
#define MUTEX_USE_OUTSIDE_OF_COMMS 1
// #define MUTEX_USE_INSIDE_OF_COMMS 1 // enabling this will make OTA updates fail most of the time

Comment out #define MUTEX_USE_INSIDE_OF_COMMS 1 and the update works perfectly and reliably. Uncommented and the failures occur.

“Interesting” part of the code is below. No other mutex of any kind is used anywhere else in the example project.

Hardware Configuration

Two I2C busses are setup but not used. (I wanted to leave that in the example code since my real code uses them, in case I2C is a factor.)

Version

v2.0.7

IDE Name

VS Code + Platformio

Operating System

macOS 12.5.1

Flash frequency

40MHz

PSRAM enabled

no

Upload speed

460800

Description

I would expect to be able to xSemaphoreTake() without any adverse effect on httpUpdate.begin().

Sketch

// full minimal sketch is in repo noted above. This appears to be the key portion of code.
#ifdef MUTEX_USE
        xSemaphoreTake(mutex_i2c, portMAX_DELAY);
#endif

        Comms::instance().ota_perform_update();

#ifdef MUTEX_USE
        xSemaphoreGive(mutex_i2c);
#endif

Debug Message

...
[ 65921][D][HTTPUpdate.cpp:252] handleUpdate():  - current version: 1

[ 66023][D][HTTPUpdate.cpp:306] handleUpdate(): runUpdate flash...

[ 66023][D][Updater.cpp:133] begin(): OTA Partition: app1
OTA: HTTP update process at 0 of 615840 bytes...
OTA: HTTP update process at 0 of 615840 bytes...
OTA: HTTP update process at 4096 of 615840 bytes...
OTA: HTTP update process at 8192 of 615840 bytes...
OTA: HTTP update process at 12288 of 615840 bytes...
OTA: HTTP update process at 16384 of 615840 bytes...
OTA: HTTP update process at 20480 of 615840 bytes...
OTA: HTTP update process at 24576 of 615840 bytes...
OTA: HTTP update process at 28672 of 615840 bytes...
OTA: HTTP update process at 32768 of 615840 bytes...
OTA: HTTP update process at 36864 of 615840 bytes...
OTA: HTTP update process at 40960 of 615840 bytes...
OTA: HTTP update process at 45056 of 615840 bytes...
OTA: HTTP update process at 49152 of 615840 bytes...
OTA: HTTP update process at 53248 of 615840 bytes...
OTA: HTTP update process at 57344 of 615840 bytes...
OTA: HTTP update process at 61440 of 615840 bytes...
OTA: HTTP update process at 65536 of 615840 bytes...

abort() was called at PC 0x400908e3 on core 0
=> 0x400908e3: task_wdt_isr at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/task_wdt.c:176
      (inlined by) task_wdt_isr at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/task_wdt.c:136


Backtrace: 0x40026792:0x3ffc27b0 0x4002c12d:0x3ffc27d0 0x4003079d:0x3ffc27f0 0x400908e3:0x3ffc2870 0x40027c55:0x3ffc2890 0x4009a931:0x3ffe82a0 0x400838e9:0x3ffe82c0 0x400839fb:0x3ffe82f0 0x40083af6:0x3ffe8320 0x40083ba0:0x3ffe8340 0x400eb882:0x3ffe8380 0x400888e9:0x3ffe83b0 0x40088911:0x3ffe83d0 0x400eb7a1:0x3ffe83f0 0x400e2bd2:0x3ffe8410 0x40087182:0x3ffe8430 0x40087938:0x3ffe84a0 0x40087aeb:0x3ffe8540 0x400821ca:0x3ffe8640 0x400822c4:0x3ffe86a0
=> 0x40026792: panic_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/panic.c:408
=> 0x4002c12d: esp_system_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/esp_system.c:137
=> 0x4003079d: abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/abort.c:46
=> 0x400908e3: task_wdt_isr at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/task_wdt.c:176
      (inlined by) task_wdt_isr at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/task_wdt.c:136
=> 0x40027c55: _xt_lowint1 at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port/xtensa/xtensa_vectors.S:1114
=> 0x4009a931: lwip_ioctl at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/api/sockets.c:3936
=> 0x400838e9: WiFiClientRxBuffer::r_available() at /Users/ken.martin/.platformio/packages/framework-arduinoespressif32/libraries/WiFi/src/WiFiClient.cpp:56
=> 0x400839fb: WiFiClientRxBuffer::fillBuffer() at /Users/ken.martin/.platformio/packages/framework-arduinoespressif32/libraries/WiFi/src/WiFiClient.cpp:81 (discriminator 4)
=> 0x40083af6: WiFiClientRxBuffer::read(unsigned char*, unsigned int) at /Users/ken.martin/.platformio/packages/framework-arduinoespressif32/libraries/WiFi/src/WiFiClient.cpp:117 (discriminator 3)
=> 0x40083ba0: WiFiClient::read(unsigned char*, unsigned int) at /Users/ken.martin/.platformio/packages/framework-arduinoespressif32/libraries/WiFi/src/WiFiClient.cpp:493
=> 0x400eb882: WiFiClient::read() at /Users/ken.martin/.platformio/packages/framework-arduinoespressif32/libraries/WiFi/src/WiFiClient.cpp:400
=> 0x400888e9: Stream::timedRead() at /Users/ken.martin/.platformio/packages/framework-arduinoespressif32/cores/esp32/Stream.cpp:36
=> 0x40088911: Stream::readBytes(char*, unsigned int) at /Users/ken.martin/.platformio/packages/framework-arduinoespressif32/cores/esp32/Stream.cpp:285
=> 0x400eb7a1: Stream::readBytes(unsigned char*, unsigned int) at /Users/ken.martin/.platformio/packages/framework-arduinoespressif32/cores/esp32/Stream.h:103
=> 0x400e2bd2: UpdateClass::writeStream(Stream&) at /Users/ken.martin/.platformio/packages/framework-arduinoespressif32/libraries/Update/src/Updater.cpp:371
=> 0x40087182: HTTPUpdate::runUpdate(Stream&, unsigned int, String, int) at /Users/ken.martin/.platformio/packages/framework-arduinoespressif32/libraries/HTTPUpdate/src/HTTPUpdate.cpp:426
=> 0x40087938: HTTPUpdate::handleUpdate(HTTPClient&, String const&, bool) at /Users/ken.martin/.platformio/packages/framework-arduinoespressif32/libraries/HTTPUpdate/src/HTTPUpdate.cpp:341
=> 0x40087aeb: HTTPUpdate::update(WiFiClient&, String const&, String const&) at /Users/ken.martin/.platformio/packages/framework-arduinoespressif32/libraries/HTTPUpdate/src/HTTPUpdate.cpp:58
=> 0x400821ca: Comms::ota_perform_update() at src/Comms.cpp:139
=> 0x400822c4: task__handleComms(void*) at src/main.cpp:47




ELF file SHA256: 0000000000000000

Rebooting...
ESP-ROM:esp32s2-rc4-20191025
Build:Oct 25 2019
rst:0x3 (RTC_SW_SYS_RST),boot:0xa (SPI_FAST_FLASH_BOOT)
Saved PC:0x400261e5
=> 0x400261e5: esp_restart_noos_dig at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/esp_system.c:53 (discriminator 1)
...

Other Steps to Reproduce

No response

I have checked existing issues, online documentation and the Troubleshooting Guide

  • I confirm I have checked existing issues, online documentation and Troubleshooting guide.
@papakpmartin papakpmartin added the Status: Awaiting triage Issue is waiting for triage label Mar 16, 2023
@papakpmartin papakpmartin changed the title Using xSemaphoreTake breaks httpUpdate.update() Using xSemaphoreTake() breaks httpUpdate.update() Mar 16, 2023
@papakpmartin
Copy link
Author

I just made and tested a new branch with an even simpler version of the problem. In this branch:

  • I'm not using MQTT at all
  • I'm not initializing any I2C

The problem still exists exactly as described above... permitting xSemaphoreTake() to be used means the OTA update will nearly always fail. Commenting out #define MUTEX_USE 1 means the OTA update will work normally.

@papakpmartin papakpmartin changed the title Using xSemaphoreTake() breaks httpUpdate.update() Using xSemaphoreTake() in task running httpUpdate.update() breaks httpUpdate.update() Mar 16, 2023
@papakpmartin
Copy link
Author

I've further simplified the example and just merged it into master on my example repo: https://github.com/papakpmartin/ota-troubleshooting-freertos

What I think I'm seeing...

In a project that will use httpUpdate().begin():

  • It is OK to xSemaphoreCreateMutex()
  • It is OK to xSemaphoreTake() and xSemaphoreGive() in a task that does not contain httpUpdate().begin()
  • It is NOT OK to xSemaphoreTake() and xSemaphoreGive() in a task that does contain httpUpdate().begin()

@papakpmartin
Copy link
Author

FWIW, I did see that @Jason2866 has sometimes recommended a special platform build that Tasmota uses that resolves some networking issues, but using that seemed to make no difference to what I'm observing here.

@mrengineer7777
Copy link
Collaborator

@papakpmartin

Semaphores are pretty basic - they are native to FreeRTOS. I'm certain that the httpUpdate library has no conflicts with Semaphores.

I've looked at your demo project a couple times. Observations:

While clever, you are doing things pretty differently than the httpUpdate example.
task__handleComms starts with a 30 second delay.
You are using the same semaphore mutex_i2c in both of your tasks. Since the threads are completely different, this may be a mistake. xSemaphoreTake(mutex_i2c, portMAX_DELAY);
About 6 months ago I had trouble with the HTTPOTA library, so I wrote my own OTA class. I can share my code if you are interested.

@papakpmartin
Copy link
Author

Thanks for taking a look at this, @mrengineer7777! I am pretty new to using semaphores (and FreeRTOS), but the intention of starting with the delay was to try to ensure it didn't fire right away. Otherwise, most of the time I have the delay at the end.

For mutex_i2c, I'm meaning to use that to protect the I2C busses from being used because I found that in my case if an OTA update reboot happens at just the “right” moment, the I2C bus can be left in a funny state. So the intention is that the update process grabs and owns mutex_i2c, thereby preventing that.

So maybe I should ask:

  • Is there a better practice for making sure a task doesn't run right away?
  • Am I doing something wrong by trying to use the semaphore that way?
  • Is one of those two things the reason I am unable to use xSemaphoreTake() and xSemaphoreGive() in a task that contains httpUpdate().begin()?

@mrengineer7777
Copy link
Collaborator

mrengineer7777 commented Mar 21, 2023

* Is there a better practice for making sure a task doesn't run right away?

In that case I would use delay() in the beginning of the task as you have done.

* Am I doing something wrong by trying to use the semaphore that way?

Using the semaphore from multiple tasks is almost guaranteed to cause issues. I would set a flag or call a function to stop I2C communication during the OTA update. Also pause any other services that might interfere with the OTA process.
Bonus points if you can write a reboot function that cleanly shuts down services before the reboot. Example: turning off WiFi using WiFi.mode(WIFI_MODE_NULL) allows the module to reconnect more quickly after a reboot.

* Is one of those two things the reason I am unable to use `xSemaphoreTake()` and `xSemaphoreGive()` in a task that contains `httpUpdate().begin()`?

Yes

if an OTA update reboot happens at just the “right” moment, the I2C bus can be left in a funny state.

There are low level ways to clear an I2C bus on startup. The basic concept is to set the I2C clock pin as open drain or strong output. Toggle it X times to get any slaves to release their clock-stretching hold. Optionally monitor I2C data pin (input) to see when it stays high. If I recall correctly it takes 8-16 clock cycles to clear an I2C bus. Keep in mind minimum pulse width on the clock toggling.

@papakpmartin
Copy link
Author

Wow... this is extremely helpful! I did end up "getting around this" by using a flag... it sounds like "getting around this" was probably just "doing it right". :)

Thanks so much for this and your time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Awaiting triage Issue is waiting for triage
Projects
None yet
Development

No branches or pull requests

2 participants