Search This Blog

Saturday, June 11, 2005

Conditions - Description

Conditions are a lot like events - they indicate that some condition is or isn't true, and they may be waited on by threads that have nothing to do until the condition becomes true. However, unlike an event, they are integrally paired with a mutex that protects some data. A thread waits on the condition from inside the mutex, and when it gets woken, it will still be inside the mutex. Yet the mutex does not remain locked while the thread sleeps, so other threads can enter the mutex, modify the data, and signal that the condition has become true.

So, why do we implement both? Events are a Windows thing, as Windows doesn't support conditions. But POSIX doesn't support events - it only supports conditions. This seemed like a curious design decision to me when I first was learning about the POSIX API (as I've always programmed on Windows, and events were all that I knew), but I better understand the reasons, now.

While there are times where a simple "fire and forget" event may be useful, usually an event is associated with the state of some data. For example, one of my programs was an audio-streaming library. This system used three worker threads for its tasks (the rationale for the exact number is rather lengthy, so I won't cover it here) - a streaming (decoding) thread, a prebuffering thread, and a prefetching thread. The job of the prebuffering thread is to prebuffer new streams before they begin playing; the rest of the time, it sleeps, while it waits for new requests. When a request arrives, the condition gets flagged, and the thread goes to work. When this happens, the thread must lock the mutex protecting the list of streams needing prebuffering (as it's accessed by more than 1 thread), dequeue one stream, leave the mutex, then do its prebuffering outside the mutex. When it's done prebuffering, it adds the stream to the playing streams list (inside the mutex, of course), tells the OS that the stream has data to play, and checks if there are any more streams needing prebuffering; if no more streams need prebuffering, the thread goes back to waiting on the condition.

The point is that signaling of the condition will always be followed by the locking of the mutex. This is where condition variables come in. By combining the locking/unlocking with the wait on the condition, this leaves the OS free to do any desired optimizations, which could further improve performance. One example of a potential optimization would be a direct context switch of the CPU from the thread signaling the condition to the thread waiting on the condition, with 0 latency. From what I've heard, Windows internally supports something like this, called event pairs (my information is second-hand, as I know very little about how the Windows kernel works internally, apart from what I've heard from more knowledgeable friends and my Inside Windows NT book). In any case, the goal of LibQ is to provide a platform-independent set of interfaces, which take advantage of features that are optimized on each platform. Because the condition/mutex model is so common, that means we have to support it.

As a footnote, I should mention a couple of sticky points about conditions. Unlike events, conditions do not retain their value. If a thread signals a condition, absolutely nothing happens if there are no threads waiting on that condition (unlike for events, where the event will remain set until reset). The reason for this is that threads are expected to check the data associated with a condition to determine whether the condition is already true (since they'll already be inside the mutex). Thus, the data itself indicates whether the condition is true or false.

Also, according to the POSIX standard, conditions are not guaranteed to be watertight. That is, a thread waiting on a condition may get woken when the condition is not actually true. This can happen for several reasons; for example, a thread may be waiting on the mutex before a condition becomes true, and so will get the mutex before the woken thread gets to it, changing the value of the data so that the condition is no longer true. Beware of this - when Wait returns, always check that the condition is indeed true; if not, call Wait again.

No comments: