Skip to content

Commit e065261

Browse files
authored
Fix deadlock in ExecutorLibdispatch destructor (#1835)
There is a deadlock that is triggered while running integration tests in XCode 10 (previous versions of XCode seem to work fine). During integration test case shutdown, two executors used in Firestore, the worker executor and the user executor, happen to be destroyed at the same time on each other's dispatch queues (worker executor is being destroyed on the main queue, which is also the user queue, and user executor is being destroyed on the worker queue). Each calls `RunSynchronized` in its destructor, with the result that they deadlock. The fix uses `std::atomic` as the synchronization mechanism instead. Tested -- ran integration tests under TSan in XCode 10 with no issues.
1 parent a40a43c commit e065261

File tree

2 files changed

+46
-9
lines changed

2 files changed

+46
-9
lines changed

Firestore/core/src/firebase/firestore/util/executor_libdispatch.mm

Lines changed: 13 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,8 @@
1616

1717
#include "Firestore/core/src/firebase/firestore/util/executor_libdispatch.h"
1818

19+
#include <atomic>
20+
1921
#include "Firestore/core/src/firebase/firestore/util/hard_assert.h"
2022

2123
namespace firebase {
@@ -140,10 +142,9 @@ void MarkDone() {
140142

141143
// True if the operation has either been run or canceled.
142144
//
143-
// Note on thread-safety: because the precondition is that all member
144-
// functions of this class are executed on the dispatch queue, no
145-
// synchronization is required for `done_`.
146-
bool done_ = false;
145+
// Note on thread-safety: this variable is accessed both from the dispatch
146+
// queue and in the destructor, which may run on any queue.
147+
std::atomic<bool> done_;
147148
};
148149

149150
TimeSlot::TimeSlot(ExecutorLibdispatch* const executor,
@@ -154,6 +155,9 @@ void MarkDone() {
154155
std::chrono::steady_clock::now()) +
155156
delay},
156157
tagged_{std::move(operation)} {
158+
// Only assignment of std::atomic is atomic; initialization in its constructor
159+
// isn't
160+
done_ = false;
157161
}
158162

159163
Executor::TaggedOperation TimeSlot::Unschedule() {
@@ -199,11 +203,11 @@ void MarkDone() {
199203
// the queue is serial, by the time libdispatch gets to the newly-enqueued
200204
// work, the pending operations that might have been in progress would have
201205
// already finished.
202-
RunSynchronized(this, [this] {
203-
for (auto slot : schedule_) {
204-
slot->MarkDone();
205-
}
206-
});
206+
// Note: this is thread-safe, because the underlying variable `done_` is
207+
// atomic. `RunSynchronized` may result in a deadlock.
208+
for (auto slot : schedule_) {
209+
slot->MarkDone();
210+
}
207211
}
208212

209213
bool ExecutorLibdispatch::IsCurrentExecutor() const {

Firestore/core/test/firebase/firestore/util/executor_libdispatch_test.mm

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,8 @@
3333
dispatch_queue_create("ExecutorLibdispatchTests", DISPATCH_QUEUE_SERIAL));
3434
}
3535

36+
namespace chr = std::chrono;
37+
3638
} // namespace
3739

3840
INSTANTIATE_TEST_CASE_P(ExecutorTestLibdispatch,
@@ -68,6 +70,37 @@
6870
EXPECT_TRUE(WaitForTestToFinish());
6971
}
7072

73+
TEST_F(ExecutorLibdispatchOnlyTests, ScheduledOperationOutlivesExecutor) {
74+
namespace chr = std::chrono;
75+
const auto far_away = chr::milliseconds(100);
76+
executor->Schedule(far_away, {Executor::Tag{1}, [] {}});
77+
executor.reset();
78+
// Try to wait until libdispatch invokes the scheduled operation. This is
79+
// flaky but unlikely to not work in practice. The test is successful if
80+
// there is no crash/data race under TSan.
81+
std::this_thread::sleep_for(chr::milliseconds(500));
82+
}
83+
84+
TEST_F(ExecutorLibdispatchOnlyTests,
85+
ScheduledOperationOutlivesExecutor_DestroyedOnOwnQueue) {
86+
const auto far_away = chr::milliseconds(100);
87+
executor->Schedule(far_away, {Executor::Tag{1}, [] {}});
88+
89+
// Invoke destructor on the executor's own queue to make sure there is no
90+
// deadlock.
91+
std::function<void()> reset = [this] { executor.reset(); };
92+
auto queue =
93+
static_cast<ExecutorLibdispatch*>(executor.get())->dispatch_queue();
94+
dispatch_sync_f(queue, &reset, [](void* const raw_reset) {
95+
const auto unwrap = static_cast<std::function<void()>*>(raw_reset);
96+
(*unwrap)();
97+
});
98+
// Try to wait until libdispatch invokes the scheduled operation. This is
99+
// flaky but unlikely to not work in practice. The test is successful if
100+
// there is no crash/data race under TSan.
101+
std::this_thread::sleep_for(chr::milliseconds(500));
102+
}
103+
71104
} // namespace internal
72105
} // namespace util
73106
} // namespace firestore

0 commit comments

Comments
 (0)