Skip to content

Fix semaphores in IDF & std::string assert #2728

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 11, 2019
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions libraries/BLE/src/FreeRTOS.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -62,22 +62,21 @@ uint32_t FreeRTOS::getTimeSinceStart() {
uint32_t FreeRTOS::Semaphore::wait(std::string owner) {
log_v(">> wait: Semaphore waiting: %s for %s", toString().c_str(), owner.c_str());

m_owner = owner;
Copy link
Contributor

@projectgus projectgus May 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting the owner before we take the semaphore seems like maybe it can lead to incorrect behaviour. Think about this sequence of events:

  • Task A calls s->wait("task_a"). m_owner is set to "task_a", and then waiting for the semaphore succeeds because noone was holding it.
  • Task B calls s->wait("task_b"). m_owner is set to "task_b" but then Task B is blocked in xSemaphoreWait() because Task A is still holding the semaphore.

Now we have a situation where Task A is holding the semaphore, but the owner field indicates Task B is holding it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, hadn't really thought about it as the way the BLE library currently works (on the client side anyway) there is multiple semaphores used and only one task actually takes each of them. Also, as this was a mutex before, the semaphore is usually taken already before we call the wait function to block the task and wait for the callback to give the semaphore back, at which time it sets m_owner to "<N/A>" anyway.

The reason for this change though is I've encountered many instances of on assert error when giving the semaphore from the callback as both cores sometimes try to write to m_owner at the same time, one in the give() on core 0 and this one after we've unblocked in wait() on core 1.

Anyway, maybe a better solution would be to pin the callback task to core 1, or maybe just remove the setting of m_owner in wait() all together? I'm open to suggestions :).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of this m_owner anyway? Logging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes as far as I can tell it’s only used for logging, but I suppose someone could call the toString() method on the semaphore to check ownership.

I’m tempted just to remove it entirely for my use, but others might find it useful.

Copy link
Contributor

@projectgus projectgus May 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@h2zero

Right, sorry I should have read the full code - I missed that wait() isn't "take", it's "take then give". Not quite what I expected...

The reason for this change though is I've encountered many instances of on assert error when giving the semaphore from the callback as both cores sometimes try to write to m_owner at the same time, one in the give() on core 0 and this one after we've unblocked in wait() on core 1.

If this is the problem then I think the best solution is to change all the m_owner assignments so they only happen when holding the semaphore, so the semaphore protects against any race condition. I think this means:

  • In ::give(), move setting of m_owner up to before the semaphore is given so it's assigned while the caller still holds the semaphore.
  • In ::wait(), don't set m_owner at all. As soon as the caller in wait() gets the semaphore it gives it back, and giving a semaphore never blocks, so if you did set m_owner then it gets immediately un-set afterwards which seems like a waste of CPU cycles.

Copy link
Contributor Author

@h2zero h2zero May 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@projectgus

Yeah that’s along the line of my thinking as well. Sorry I changed the PR yesterday assuming you wanted me to revert the m_owner move and I didn’t see your reply here first. I’ll make the changes as you’ve mentioned and fix the PR again lol.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@projectgus

Made the changes, tested with my test code that would cause the assert within seconds and has been running over an hour now. Logging works fine as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! :)


if (m_usePthreads) {
pthread_mutex_lock(&m_pthread_mutex);
} else {
xSemaphoreTake(m_semaphore, portMAX_DELAY);
}

m_owner = owner;

if (m_usePthreads) {
pthread_mutex_unlock(&m_pthread_mutex);
} else {
xSemaphoreGive(m_semaphore);
}

log_v("<< wait: Semaphore released: %s", toString().c_str());
m_owner = std::string("<N/A>");
return m_value;
} // wait

Expand All @@ -87,7 +86,8 @@ FreeRTOS::Semaphore::Semaphore(std::string name) {
if (m_usePthreads) {
pthread_mutex_init(&m_pthread_mutex, nullptr);
} else {
m_semaphore = xSemaphoreCreateMutex();
m_semaphore = xSemaphoreCreateBinary();
xSemaphoreGive(m_semaphore);
}

m_name = name;
Expand Down