10 January 2022 by Phillip Johnston • Last updated 12 September 2022
I recently received a question regarding our dispatch queue implementation:
Quote
I had a question concerning the locking of the mutex in the destructor to set the boolean quit_ variable. The way I see it, the mutex protects the queue access and does not really refer to the quit_ variable. When the variable has been set, the condition_variable is used to broadcast that change.
Question: Why is the mutex locked in this case? Is it to force a CPU cache synchronization?
This question is referring to the following snippet of code:
```cpp
dispatch_queue::~dispatch_queue()
{
	printf("Destructor: Destroying dispatch threads...");

	// Signal to dispatch threads that it's time to wrap up
	std::unique_lock<std::mutex> lock(lock_);
	quit_ = true;
	cv_.notify_all();
	lock.unlock();

	// Wait for threads to finish before we exit
	for(size_t i = 0; i < threads_.size(); i++)
	{
		if(threads_[i].joinable())
		{
			printf("Destructor: Joining thread %zu until completion", i);
			threads_[i].join();
		}
	}
}
```
This signals the condition_variable in the worker threads, as shown here:
```cpp
void dispatch_queue::dispatch_thread_handler(void)
{
	std::unique_lock<std::mutex> lock(lock_);

	do {
		// Wait until we have data or a quit signal
		cv_.wait(lock, [this] { return (q_.size() || quit_); });

		// After wait, we own the lock
		if(!quit_ && q_.size())
		{
			auto op = std::move(q_.front());
			q_.pop();

			// Unlock now that we're done messing with the queue
			lock.unlock();

			op();

			lock.lock();
		}
	} while(!quit_);
}
```
I vaguely recalled that I needed to lock around the quit_ variable to avoid a race condition during the destruction process, where my destructor would never complete because some of the worker threads would never wake up, see the quit_ setting, and join(). Eventually, I found the original source of the change (hidden away in an archived project in a commit made years ago) and confirmed this memory. Unfortunately, at that time I was much less diligent about documenting the rationales and references that led to my conclusions, so I did some digging into why that lock was necessary.
Note
Links and quotes from my research can be found in the References section below.
Fundamentally, you must both modify and test the shared variable (the one referenced by the condition_variable predicate) using the same mutex in order to ensure that waiting threads are blocked and will see the condition_variable signal. As cppreference clearly lays out:
Quote
The thread that intends to modify the shared variable has to:
- acquire a std::mutex (typically via lock_guard)
- perform the modification while the lock is held
- execute notify_one or notify_all on the std::condition_variable (the lock does not need to be held for notification)
Even if the shared variable is atomic, it must be modified under the mutex in order to correctly publish the modification to the waiting thread.
Based on our conversation, related questions on Stack Overflow, and discussions in the C++ Core Guidelines GitHub repository, it seems that failing to lock around shared variables is a common condition_variable mistake.
Our conversation took another useful turn when exploring this question:
Should you unlock the mutex before or after you invoke notify_one/notify_all on the condition variable?
In the original dispatch queue code, I manually unlocked the mutex before invoking notify, a pattern I picked up from the cppreference example code:
```cpp
// Manual unlocking is done before notifying, to avoid waking up
// the waiting thread only to block again (see notify_one for details)
lk.unlock();
cv.notify_one();
```
However, it seems that the C++ Core Guidelines discussions on GitHub prefer this general guideline:
Quote
When notifying a condition_variable, default to doing it before you unlock the mutex instead of after. Rationale: It is never wrong to unlock after the notify, and it is sometimes wrong to unlock before the notify. And there is no performance penalty in unlocking after, as good implementations take this coding pattern into account.
While I agree that unlocking after notifying the condition_variable is a sensible default guideline, I do find "there is no performance penalty" to be an overly strong statement. It seems that some pthreads implementations recognize the wakeup-and-block case and avoid it, which is what motivated that claim. However, if you are implementing a condition_variable on top of an RTOS, be aware that you may be invoking a wakeup-and-block scenario. As with all optimizations, however, you should confirm that this is a) actually occurring and b) causing a problem in your system.
Why Do You Still Need a Lock if the Variable is Atomic?
Even if you have an atomic variable, you still need the mutex. Without it, you can still miss the notification when a wakeup interleaves with an update to the atomic variable. The problematic scenario is the following interleaving of steps, which can happen with or without an atomic:
- Worker thread is awake for some reason, acquires the mutex, and checks the predicate; at the time it checks, quit_ is false and the queue is empty. It must wait on the CV again, but it has not yet resumed waiting.
- Main thread sets quit_ to true.
- Main thread issues the notification, which the awake worker thread does not receive because it is not yet waiting on the CV.
- Worker thread waits on the CV but never wakes up, because the notification was already sent and the main thread is blocked waiting to join.
In other words, the mutex provides synchronization for the condition_variable, not synchronization for quit_.
Now, you can use an atomic without a lock to set the quit_ variable for the threads, but you would need a different wakeup strategy, i.e., quit_ would no longer be part of the predicate in the condition_variable wait(). You would need some other logic to wake up the thread in this situation, which then checks the quit_ flag.
References
- Implementing an Asynchronous Dispatch Queue
- std::condition_variable – cppreference.com
The condition_variable class is a synchronization primitive that can be used to block a thread, or multiple threads at the same time, until another thread both modifies a shared variable (the condition) and notifies the condition_variable.
The thread that intends to modify the shared variable has to:
- acquire a std::mutex (typically via lock_guard)
- perform the modification while the lock is held
- execute notify_one or notify_all on the std::condition_variable (the lock does not need to be held for notification)
Even if the shared variable is atomic, it must be modified under the mutex in order to correctly publish the modification to the waiting thread.
- CP guidelines for condition variables · Issue #554 · isocpp/CppCoreGuidelines
Condition variables are not semaphores. Notifications will be missed if they are sent when no other thread is blocked waiting on the condition variable, so you must not just rely on notify_one() or notify_all() to signal another thread; you must always have some predicate that is tested, e.g. a boolean flag (protected by the same mutex that is used when waiting on the condition variable). With a predicate that tests some condition, even if the notification is sent before the other thread waits on the condition variable, it won't be "missed" because the predicate will be true and the wait will return immediately.
unlocking the mutex before notifying is an optimisation, and not essential. I intentionally didn't do that, to keep the example simple. There could be a second guideline about that point, but it's not related to the "always use a predicate" rule. I would object strongly to complicating the example by doing that.
Failing to use a mutex associated with a condition variable is one of the most common mistakes
- Renaming CP.25 raii_thread, and fixing CP.26 detached_thread · Issue #925 · isocpp/CppCoreGuidelines
When notifying a condition_variable, default to doing it before you unlock the mutex instead of after. Rationale: It is never wrong to unlock after the notify, and it is sometimes wrong to unlock before the notify. And there is no performance penalty in unlocking after, as good implementations take this coding pattern into account. [PJ: There may not be a performance penalty on pthread, but there may be on in a custom implementation for an RTOS – but that is RTOS dependent.]
- std::condition_variable::notify_one – cppreference.com
The effects of notify_one()/notify_all() and each of the three atomic parts of wait()/wait_for()/wait_until() (unlock+wait, wakeup, and lock) take place in a single total order that can be viewed as the modification order of an atomic variable: the order is specific to this individual condition variable. This makes it impossible for notify_one() to, for example, be delayed and unblock a thread that started waiting just after the call to notify_one() was made.
The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s); in fact doing so is a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock. However, some implementations (in particular many implementations of pthreads) recognize this situation and avoid this "hurry up and wait" scenario by transferring the waiting thread from the condition variable's queue directly to the queue of the mutex within the notify call, without waking it up.
Notifying while under the lock may nevertheless be necessary when precise scheduling of events is required, e.g. if the waiting thread would exit the program if the condition is satisfied, causing destruction of the notifying thread’s condition variable. A spurious wakeup after mutex unlock but before notify would result in notify called on a destroyed object.
- CP: std::condition_variable(), unlock mutex before notfifying waiting thread(s)? · Issue #1272 · isocpp/CppCoreGuidelines
- c++ – std::conditional_variable::notify_all does not wake up all the threads – Stack Overflow
This is the normal pattern for using condition variables correctly – you need to both test and modify the condition you want to wait on within the same mutex.
- c++ – Sync is unreliable using std::atomic and std::condition_variable – Stack Overflow
This works very well if threads enter the fence over a period of time. However, if they try to do it almost simultaneously, it seems to sometimes happen that between the atomic decrementation (1) and starting the wait on the conditional var (3), the thread yields CPU time and another thread decrements the counter to zero (1) and fires the cond. var (2). This results in the previous thread waiting forever in (3), because it starts waiting on it after it has already been notified.
- c++ – Shared atomic variable is not properly published if it is not modified under mutex – Stack Overflow
- c++ – Does notify happen-before wakeup of wait? – Stack Overflow