'Re: [Boost-users] thread_group::interrupt_all is not reliable'

List:       boost-users
Subject:    Re: [Boost-users] thread_group::interrupt_all is not reliable
From:       Stonewall Ballard <sb.list () sb ! org>
Date:       2009-11-30 18:36:02
Message-ID: 16D169DE-EE3F-4B1F-A304-BDBA359A8451 () sb ! org

I think I found the cause of this problem. It seems that the caller of interrupt_all
should be holding the mutex associated with the condition on which the threads are
waiting.

This gave me the clue to try that:
<http://www.opengroup.org/onlinepubs/009695399/functions/pthread_cond_broadcast.html>
> The pthread_cond_broadcast() or pthread_cond_signal() functions may be called by a
> thread whether or not it currently owns the mutex that threads calling
> pthread_cond_wait() or pthread_cond_timedwait() have associated with the condition
> variable during their waits; however, if predictable scheduling behavior is
> required, then that mutex shall be locked by the thread calling
> pthread_cond_broadcast() or pthread_cond_signal().

thread::interrupt() calls pthread_cond_broadcast in pthread/thread.cpp.

Although "predictable scheduling" doesn't sound like it should cover a failure to
wake up, taking the mutex around the call to thread_group::interrupt_all() appears to
be 100% reliable in my testing.

I can patch my app to do that, but I don't think there's a general solution. The
documentation should include a note that thread::interrupt() isn't reliable unless
the caller is holding the mutex associated with the condition variable on which the
interrupted thread is waiting.
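
For anyone who wants to see the locking pattern in isolation, here is a minimal
sketch. It is an analogy only: it uses the C++ standard library rather than
Boost, and a plain stop flag rather than Boost's interruption mechanism. The
names WaitState, worker, and run_and_stop_workers are hypothetical and are not
taken from the test project. The point it illustrates is that the waking thread
takes the same mutex the waiters use before it broadcasts.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the state the worker threads wait on.
struct WaitState {
    std::mutex mutex;
    std::condition_variable cond;
    bool stop = false;       // hypothetical shutdown flag (not Boost interruption)
    std::size_t woken = 0;   // number of workers that woke up and exited
};

// Each worker blocks on the condition until the stop flag is set.
void worker(WaitState& s) {
    std::unique_lock<std::mutex> lock(s.mutex);
    s.cond.wait(lock, [&s] { return s.stop; });
    ++s.woken;  // still holding the mutex here
}

// Starts `count` workers, wakes them all, and returns how many woke up.
std::size_t run_and_stop_workers(std::size_t count) {
    WaitState s;
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < count; ++i) {
        workers.emplace_back(worker, std::ref(s));
    }
    {
        // Hold the waiters' mutex while changing the flag and broadcasting,
        // as the POSIX text recommends for predictable behavior.
        std::lock_guard<std::mutex> lock(s.mutex);
        s.stop = true;
        s.cond.notify_all();
    }
    for (auto& t : workers) {
        t.join();
    }
    return s.woken;  // safe to read: all workers have been joined
}
```

Calling run_and_stop_workers(16) mirrors the 16-worker case. In the Boost
version, the corresponding change would be to lock the queue's mutex in the
main thread around the interrupt_all call, which is the patch described above.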

Of course, this could be a bug in the OS X pthreads implementation as well.

 - Stoney

> I've discovered that under circumstances apparently related to timing 
> and load, sending interrupt_all to a thread_group when all the threads 
> are waiting on a boost::condition_variable leaves one thread waiting 
> about 1/3 of the time. This is with boost 1_40_0 running on Mac OS X 
> 10.6.2, with 32-bit boost libraries. Boost uses the posix thread 
> system here. 
> I boiled my app down to some test code that runs as a command-line 
> app. It's a bit longer than I'd like, but this configuration seems to 
> be necessary to invoke the problem. The test uses a queue to pass 
> "tasks" from the main thread to worker threads, and another queue to 
> pass "results" back to the main thread. The problem is most apparent 
> when all the tasks are finished and the queue empties, so that all the 
> worker threads are waiting on the input queue when the main thread 
> sends interrupt_all. 
> 
> I've looked at the waiting thread in a debugger when this happens, and 
> found that it has been interrupted, but is still waiting on the 
> condition. It looks like it just got missed by the interrupt_all. This 
> is more likely to happen when there are a lot of worker threads (16, 
> or one per core in my testing). 
> 
> The test code is parked at <http://sb.org/ThreadTest.zip>, 20KB. It's 
> an XCode 3.2 project, but the five source files could be readily 
> compiled and run in any Unix environment. 
> 
> I don't see any errors in the code that could cause these failures. 
> There is a work-around, which is to interrupt the waiting thread 
> again. This required a modified version of thread_group so I could do 
> a timed_join_all on it. 
> 
> I welcome any suggestions about what could be wrong here, or ways to 
> simplify the test to make it more suitable for a bug report. 
> 
> - Stoney 
-- 
Stonewall Ballard 
stoney@sb.org           http://stoney.sb.org/
