Saturday, January 13, 2018

Destruction of condition variable randomly loses notification

Leave a Comment

Given a condition_variable as a member of a class, my understanding is that:

  1. The condition variable is destroyed after the class destructor completes.
  2. Destruction of a condition variable does not need to wait for notifications to have been received.

In light of these expectations, my question is: why does the example code below randomly fail to notify a waiting thread?

#include <mutex> #include <condition_variable> #define NOTIFY_IN_DESTRUCTOR   struct notify_on_delete {     std::condition_variable cv;      ~notify_on_delete() { #ifdef NOTIFY_IN_DESTRUCTOR         cv.notify_all(); #endif     } };  int main () {     for (int trial = 0; trial < 10000; ++trial) {         notify_on_delete* nod = new notify_on_delete();         std::mutex flag;         bool kill = false;          std::thread run([nod, &flag, &kill] () {             std::unique_lock<std::mutex> lock(flag);             kill = true;             nod->cv.wait(lock);         });          while(true) {             std::unique_lock<std::mutex> lock(flag);             if (!kill) continue; #ifdef NOTIFY_IN_DESTRUCTOR             delete nod; #else             nod->cv.notify_all(); #endif             break;         }         run.join(); #ifndef NOTIFY_IN_DESTRUCTOR         delete nod; #endif     }     return 0; } 

In the code above, if NOTIFY_IN_DESTRUCTOR is not defined then the test will run to completion reliably. However, when NOTIFY_IN_DESTRUCTOR is defined the test will randomly hang (usually after a few thousand trials).

I am compiling using Apple Clang: Apple LLVM version 9.0.0 (clang-900.0.39.2) Target: x86_64-apple-darwin17.3.0 Thread model: posix C++14 specified, compiled with DEBUG flags set.

EDIT:

To clarify: this question is about the semantics of the specified behavior of instances of condition_variable. The second point above appears to be reenforced in the following quote:

Blockquote Requires: There shall be no thread blocked on *this. [ Note: That is, all threads shall have been notified; they may subsequently block on the lock specified in the wait. This relaxes the usual rules, which would have required all wait calls to happen before destruction. Only the notification to unblock the wait needs to happen before destruction. The user should take care to ensure that no threads wait on *this once the destructor has been started, especially when the waiting threads are calling the wait functions in a loop or using the overloads of wait, wait_­for, or wait_­until that take a predicate. — end note ]

The core semantic question seems to be what "blocked on" means. My present interpretation of the quote above would be that after the line

cv.notify_all(); // defined NOTIFY_IN_DESTRUCTOR 

in ~notify_on_delete() the thread test is not "blocked on" nod - which is to say that I presently understand that after this call "the notification to unblock the wait" has occurred, so according to the quote the requirement has been met to proceed with the destruction of the condition_variable instance.

Can someone provide a clarification of "blocked on" or "notification to unblock" to the effect that in the code above, the call to notify_all() does not satisfy the requirements of ~condition_variable()?

2 Answers

Answers 1

When NOTIFY_IN_DESTRUCTOR is defined:
Calling notify_one()/notify_all() doesn't mean that the waiting thread is immediately woken up and the current thread will wait for the other thread. It just means that if the waiting thread wakes up at some point after the current thread has called notify, it should proceed. So in essence, you might be deleting the condition variable before the waiting thread wakes up (depending on how the threads are scheduled).

The explanation for why it hangs, even if the condition variable is deleted while the other thread is waiting on it lies on the fact the wait/notify operations are implemented using queues associated with the condition variables. These queues hold the threads waiting on the condition variables. Freeing the condition variable would mean getting rid of these thread queues.

Answers 2

I am pretty sure your vendors implementation is broken. Your program looks almost OK from the perspective of obeying the contract with the cv/mutex classes. I couldn’t 100% verify, I am behind one version.

Almost OK, because as Marek R points out, you are relying on referencing a class after its destruction has begun; not the cv/mutex class, your notify_on_delete class. The conflict is a bit academic. I doubt clang would depend upon nod remaining valid after control had transferred to nod->cv.wait(); but the real customer of most compiler vendors are benchmarks, not programmers.

As as general note, multi-threaded programming is difficult, and having now peaked at the c++ threading model, it might be best to give it a decade or two to settle down. It’s contracts are astonishing. When I first looked at your program, I thought ‘duh, there is no way you can destroy a cv that can be accessed because RAII’. Silly me.

Pthreads is another awful API for threading. At least it doesn’t attempt over-reach, and is mature enough that robust test suites keep vendors in line.

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment