List:       boost-users
Subject:    Re: [Boost-users] Thread local storage
From:       Oliver Abert <abert () uni-koblenz ! de>
Date:       2009-03-30 14:12:11
Message-ID: 4A03F6F8-66F8-43F1-9A9D-B5B32D70CB70 () uni-koblenz ! de

On 30.03.2009, at 14:08, Anthony Williams wrote:

> Oliver Abert <abert@uni-koblenz.de> writes:
>
>>> Thanks for alerting me to this thread Peter.
>>>
>>> Oliver Abert <abert@uni-koblenz.de> writes:
>>>
>>>> On 29.03.2009, at 19:36, Peter Dimov wrote:
>>>>
>>>>> Oliver Abert:
>>>>>> Hi Everyone,
>>>>>>
>>>>>> I am using Boost Threads (1.38) as threading library and I also use
>>>>>> the thread_specific_ptr to store a minor amount of data per thread
>>>>>> (I think currently it is like 5 different pointer values per
>>>>>> thread). Technically everything works out fine, but I am having a
>>>>>> performance problem on Mac OS X. On Linux the performance is 10
>>>>>> times faster than on Mac OS. If I use pthreads on Mac OS I have
>>>>>> identical performance to the Linux version. Both versions are
>>>>>> running on the same machine using 8 threads each.
>>>>>
>>>>> What does your profiler say?
>>>>
>>>> About 80% of the time is spent in __spin_lock, which in turn was
>>>> called by pthread_once. If I use only one thread (instead of 8) the
>>>> percentage goes down to 2.5%, which is still a bit much for my
>>>> taste.
>>>
>>> pthread_once is called by the thread_specific_ptr code to ensure that
>>> the TLS key it uses has been allocated and is valid. It's a real pain
>>> if that is too slow.
>>
>> Yes, I understand that so far, but there seems to be some more
>> serious problem. Is it possible that there is some unintended mutex
>> lock? It seems like exactly that is happening. Maybe it is
>> related to the static variables, which might get mutexed
>> automatically? I heard there is a bug with the Apple gcc 4.0.1
>> regarding statics, but this morning I also tried the Intel 11.0
>> compiler with the same disappointing results. What makes me wonder
>> is that the same code runs just fine on Linux.
>>
>> Some more background information: the problem is definitely caused by
>> calls to get() of the thread_specific_ptr. I am using it in a relatively
>> hot section of my code. Profiling is not so helpful, because there are
>> a bunch of unknown libraries in between my call and the pthread_once
>> call (and yes, I also used a debug build of Boost). I have no clue
>> what is happening in between.
>
> Could you show the code that accesses the thread_specific_ptr?

Okay, the call is simply:

HierarchyTraverser *ht = RenderThread::hierarchyTraverser();

(there is nothing Boost-related before or after that call), while that
function is:

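inline HierarchyTraverser* RenderThread::hierarchyTraverser()
{
#ifdef BOOST
	// Boost path: mHierarchyTraverser is a boost::thread_specific_ptr.
	return reinterpret_cast<HierarchyTraverser*>(mHierarchyTraverser.get());
#else
	// pthreads path: mHierarchyTraverser is presumably a pthread_key_t here.
	return reinterpret_cast<HierarchyTraverser*>(pthread_getspecific(mHierarchyTraverser));
#endif
}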

and mHierarchyTraverser is of type

static boost::thread_specific_ptr<unsigned long int> mHierarchyTraverser;
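
For reference, here is a minimal sketch of the access pattern under
discussion, written against plain pthreads rather than Boost. It assumes,
as Anthony describes, that every lookup first passes through pthread_once
to make sure the TLS key exists; it is not the actual Boost implementation.
The names checked_get, set_for_this_thread and traverse are hypothetical,
and HierarchyTraverser is only a stand-in struct. The last function shows
one possible workaround: fetch the pointer once and reuse it inside the hot
loop.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical stand-in for the real HierarchyTraverser class.
struct HierarchyTraverser {
    int visited = 0;
};

namespace sketch {

pthread_key_t  g_key;                       // TLS key shared by all threads
pthread_once_t g_once = PTHREAD_ONCE_INIT;  // guards one-time key creation

// Runs exactly once, no matter how many threads reach pthread_once.
void create_key() {
    int rc = pthread_key_create(&g_key, nullptr);  // no destructor: caller owns the object
    assert(rc == 0);
    (void)rc;
}

// Assumed shape of a "checked" lookup: every call goes through
// pthread_once before reading the thread-specific slot.
HierarchyTraverser* checked_get() {
    pthread_once(&g_once, create_key);
    return static_cast<HierarchyTraverser*>(pthread_getspecific(g_key));
}

// Stores the calling thread's object in its slot.
void set_for_this_thread(HierarchyTraverser* ht) {
    pthread_once(&g_once, create_key);
    int rc = pthread_setspecific(g_key, ht);
    assert(rc == 0);
    (void)rc;
}

// Possible workaround: one checked lookup per call, then the hot loop
// works on the cached pointer. Returns -1 if no object was installed.
int traverse(int iterations) {
    HierarchyTraverser* ht = checked_get();
    if (ht == nullptr) {
        return -1;
    }
    for (int i = 0; i < iterations; ++i) {
        ++ht->visited;
    }
    return ht->visited;
}

}  // namespace sketch
```

With this shape the pthread_once cost is paid once per traverse() call
rather than once per loop iteration. Whether that removes the contention
I see on Mac OS X is untested.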

Hope that helps, but as you can see, it's basically pretty unspectacular.

Oliver

>
> Anthony
> -- 
> Author of C++ Concurrency in Action | http://www.manning.com/williams
> just::thread C++0x thread library   | http://www.stdthread.co.uk
> Just Software Solutions Ltd         | http://www.justsoftwaresolutions.co.uk
> 15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976
>
> _______________________________________________
> Boost-users mailing list
> Boost-users@lists.boost.org
> http://lists.boost.org/mailman/listinfo.cgi/boost-users






