
List:       doxygen-users
Subject:    [Doxygen-users] RES: doxygen build failure for Power8 because double free or corruption
From:       "Jorge Ramos (IMAGO)" <jramos () imago ! com ! br>
Date:       2015-11-03 2:48:46
Message-ID: CP1PR80MB0968FD9D6722EC129E58941AEE2B0 () CP1PR80MB0968 ! lamprd80 ! prod ! outlook ! com

I don't know the details but here are some thoughts to help:

Beware of m_queue.dequeue(): according to the Qt docs, it is not thread-safe.

About "Since m_bufferNotEmpty has its own mutex internally, it should only allow one
thread to be awakened": that may not be true, because the threads are woken by the
wake or wakeAll methods. So if two wake calls arrive at the same time, two threads
will be ready to run. The locker(&m_mutex) mutex should prevent that situation,
however.

I suggest using a different mutex to control the thread safety of the
DotRunnerQueue::dequeue function and of the bufferNotEmpty condition.
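
For reference, the wait-loop pattern under discussion can be sketched with the C++
standard library instead of the qtools classes. This is only an illustration under
stated assumptions: BlockingQueue and drainWithWorkers are made-up names, not doxygen
code, and std::condition_variable stands in for QWaitCondition. In this sketch one
mutex guards both the emptiness check and the pop, and wait() re-acquires that mutex
before returning, so threads woken together still take items one at a time.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for DotRunnerQueue, built on the C++ standard library
// rather than the qtools classes used by doxygen.
class BlockingQueue {
public:
    void enqueue(int item) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(item);
        }
        // Waking every waiter is acceptable here because each one re-checks
        // the queue in the loop inside dequeue().
        m_notEmpty.notify_all();
    }

    int dequeue() {
        std::unique_lock<std::mutex> lock(m_mutex);
        // wait() releases m_mutex while sleeping and re-acquires it before
        // returning, so the check and the pop both run with the mutex held.
        while (m_queue.empty()) {
            m_notEmpty.wait(lock);
        }
        int item = m_queue.front();
        m_queue.pop();
        return item;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_notEmpty;
    std::queue<int> m_queue;
};

// Hypothetical self-check: several consumers drain the queue and count how
// many times each item was handed out.
std::vector<int> drainWithWorkers(int items, int workers) {
    BlockingQueue queue;
    std::vector<int> seen(items, 0);
    std::mutex seenMutex;
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            for (;;) {
                int item = queue.dequeue();
                if (item < 0) return;  // negative value is the stop marker
                std::lock_guard<std::mutex> lock(seenMutex);
                ++seen[item];
            }
        });
    }
    for (int i = 0; i < items; ++i) queue.enqueue(i);
    for (int w = 0; w < workers; ++w) queue.enqueue(-1);  // one stop marker per worker
    for (auto &t : pool) t.join();
    return seen;
}
```

Calling drainWithWorkers(1000, 4) should report every item as handed out exactly once.
If a count other than 1 ever appeared, the queue itself would be at fault rather than
the code around it.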

Regards,
Jorge Ramos

-----Original message-----
From: Normand [mailto:normand@linux.vnet.ibm.com]
Sent: Monday, 2 November 2015 15:15
To: Adrian M Negreanu <groleo@gmail.com>; Dimitri van Heesch <doxygen@gmail.com>
Cc: doxygen-users@lists.sourceforge.net
Subject: Re: [Doxygen-users] doxygen build failure for Power8 because double free or corruption



On 20/03/2015 23:24, Adrian M Negreanu wrote:
> Given that it's a power8 CPU (SMT), can this be triggered by an 
> instruction reordering somewhere in qwaitcondition_unix.cpp ? Maybe -O0 can help ?

I did a trial with -O0 for everything (not only qwaitcondition_unix.cpp) and it still
failed.

> 
> Also, valgrind not only slows the execution, it also modifies the process
> instructions.
> 
> On Thu, Mar 19, 2015 at 11:04 PM, Dimitri van Heesch <doxygen@gmail.com> wrote:
> Hi Normand,
> 
> The issue seems to be in this piece of code:
> 
> DotRunner *DotRunnerQueue::dequeue()
> {
>   QMutexLocker locker(&m_mutex);
>   while (m_queue.isEmpty())
>   {
>     // wait until something is added to the queue
>     m_bufferNotEmpty.wait(&m_mutex);
>   }
>   DotRunner *result = m_queue.dequeue();
>   return result;
> }
> 
> It is one of the few areas that are executed by multiple threads,
> but it is protected by a mutex (under the hood, QMutex and QWaitCondition map to
> pthread calls). Since m_bufferNotEmpty has its own mutex internally, it should only
> allow one thread to be awakened. What you are seeing, it seems, is two threads doing
> a dequeue() simultaneously. It would be nice if you could help me debug this
> issue.

I do not know how to debug this; adding printf does not help me isolate the problem.
Any suggestions?
===
     // wait until something is added to the queue
     m_bufferNotEmpty.wait(&m_mutex);
   }
+  pthread_t id = pthread_self();
+  printf("%08x: %p: DotRunnerQueue::dequeue, m_queue %p\n",id, this, m_queue);
   DotRunner *result = m_queue.dequeue();
   return result;
 }
===
...
ad0df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
ac0df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
Running dot for graph 309/1586
Running dot for graph 310/1586
ac8df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
Running dot for graph 311/1586
ab0df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
ab8df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
ad0df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
Running dot for graph 312/1586
Running dot for graph 313/1586
Running dot for graph 314/1586
ac0df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
Running dot for graph 315/1586
ab0df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
ac0df190: 0x1002f8d0460: DotRunnerQueue::dequeue, m_queue 0x1002f8d0468
ac8df190: 0x1002f8d0460: DotRunner
===
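
Interleaved printf output like the above is hard to read as proof of overlap. One
alternative, sketched here under stated assumptions, is to count how many threads
are inside the guarded region at the same moment. OverlapDetector and
maxOverlapUnderMutex are made-up names, and std::mutex stands in for m_mutex; this
is not doxygen code. In the real dequeue(), a recorded maximum above 1 would suggest
that the mutex is not providing exclusion on that platform.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical debugging aid: records the largest number of threads that were
// ever inside a guarded region at the same moment.
class OverlapDetector {
public:
    void enter() {
        int now = m_inside.fetch_add(1) + 1;
        int prev = m_maxSeen.load();
        // Raise the recorded maximum if this entry set a new record.
        while (now > prev && !m_maxSeen.compare_exchange_weak(prev, now)) {
        }
    }
    void leave() { m_inside.fetch_sub(1); }
    int maxSeen() const { return m_maxSeen.load(); }

private:
    std::atomic<int> m_inside{0};
    std::atomic<int> m_maxSeen{0};
};

// Hypothetical harness: each worker enters the region `rounds` times while
// holding `mutex`, which stands in for m_mutex in DotRunnerQueue::dequeue.
int maxOverlapUnderMutex(int workers, int rounds) {
    OverlapDetector detector;
    std::mutex mutex;
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            for (int r = 0; r < rounds; ++r) {
                std::lock_guard<std::mutex> lock(mutex);
                detector.enter();  // would sit right after the wait loop
                detector.leave();  // would sit right after m_queue.dequeue()
            }
        });
    }
    for (auto &t : pool) t.join();
    return detector.maxSeen();
}
```

With a working mutex, maxOverlapUnderMutex(4, 10000) returns 1. The counter adds far
less timing disturbance than printf, which may matter for a failure that disappears
under gdb or valgrind.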

> 
> A workaround is to set DOT_NUM_THREADS to 1.

That workaround is what is implemented today in openSUSE Tumbleweed for ppc64le.

> 
> Regards,
> Dimitri
> 
> > On 18 Mar 2015, at 12:45, Normand <normand@linux.vnet.ibm.com> wrote:
> > 
> > On 11/03/2015 14:16, Normand wrote:
> > > Hi there
> > > 
> > > While building doxygen for openSUSE on a Power8 guest I hit a failure, as detailed
> > > in (2). The related backtrace, extracted from the core file, is appended below in (1).
> > > 
> > > 
> > > === (1)
> > > Core was generated by `./bin/doxygen '.
> > > Program terminated with signal SIGABRT, Aborted.
> > > #0  0x00003fffa5acd194 in __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:55
> > > 55      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> > > Missing separate debuginfos, use: zypper install libgcc_s1-debuginfo-4.8.3+r218481-2.1.ppc64le libstdc++6-debuginfo-4.8.3+r218481-2.1.ppc64le
> > > (gdb) bt
> > > #0  0x00003fffa5acd194 in __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:55
> > > #1  0x00003fffa5acf184 in __GI_abort () at abort.c:78
> > > #2  0x00003fffa5b136c4 in __libc_message (do_abort=<optimized out>, fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:175
> > > #3  0x00003fffa5b1ba84 in malloc_printerr (action=<optimized out>, str=0x3fffa5c06b50 "double free or corruption (fasttop)", ptr=<optimized out>) at malloc.c:4960
> > > #4  0x00003fffa5b1cadc in _int_free (av=<optimized out>, p=<optimized out>, have_lock=<optimized out>) at malloc.c:3831
> > > #5  0x00003fffa5dece10 in operator delete(void*) () from /usr/lib64/libstdc++.so.6
> > > #6  0x00000000106620e4 in QGList::takeFirst (this=<optimized out>) at qglist.cpp:628
> > > #7  0x000000001053ba84 in dequeue (this=<optimized out>) at ../qtools/qqueue.h:59
> > > #8  DotRunnerQueue::dequeue (this=0x1001910fcc0) at dot.cpp:1170
> > > #9  0x000000001053bb18 in DotWorkerThread::run (this=0x10019112a50) at dot.cpp:1191
> > > #10 0x00000000106a0a44 in QThreadPrivate::start (arg=0x10019112a50) at qthread_unix.cpp:87
> > > #11 0x00003fffa5ee9454 in start_thread (arg=0x3fffa38bf180) at pthread_create.c:335
> > > #12 0x00003fffa5b9e0c4 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96
> > > ===
> > > 
> > > (2) https://bugzilla.suse.com/show_bug.cgi?id=921577
> > > 
> > 
> > 
> > 
> > I was able to recreate the problem with doxygen last git commit 1c8bbb6
> > *   1c8bbb6 (HEAD, origin/master, origin/HEAD, master) Merge pull request #314
> > 
> > The associated backtrace (from the core file) differs from the one above only in
> > some line numbers, but it still points to the same call sequence:
> > from DotWorkerThread::run
> > to delete in QCollection::Item QGList::takeFirst
> > ===
> > #0  0x00003fff8433d194 in raise () from /lib64/libc.so.6
> > Missing separate debuginfos, use: zypper install glibc-debuginfo-2.21-3.3.ppc64le libgcc_s1-debuginfo-4.8.3+r218481-4.3.ppc64le libstdc++6-debuginfo-4.8.3+r218481-4.3.ppc64le
> > (gdb) bt
> > #0  0x00003fff8433d194 in raise () from /lib64/libc.so.6
> > #1  0x00003fff8433f184 in abort () from /lib64/libc.so.6
> > #2  0x00003fff843836c4 in __libc_message () from /lib64/libc.so.6
> > #3  0x00003fff8438ba84 in malloc_printerr () from /lib64/libc.so.6
> > #4  0x00003fff8438cadc in _int_free () from /lib64/libc.so.6
> > #5  0x00003fff8465ce10 in operator delete(void*) () from /usr/lib64/libstdc++.so.6
> > #6  0x000000001066c5e4 in QGList::takeFirst (this=<optimized out>) at qglist.cpp:628
> > #7  0x0000000010544e04 in dequeue (this=<optimized out>) at ../qtools/qqueue.h:59
> > #8  DotRunnerQueue::dequeue (this=0x1001707f7b0) at dot.cpp:1181
> > #9  0x0000000010544e98 in DotWorkerThread::run (this=0x1001707efd0) at dot.cpp:1202
> > #10 0x00000000106aaf44 in QThreadPrivate::start (arg=0x1001707efd0) at qthread_unix.cpp:87
> > #11 0x00003fff84759454 in start_thread () from /lib64/libpthread.so.0
> > #12 0x00003fff8440e0c4 in clone () from /lib64/libc.so.6
> > ===
> > 
> > The occurrence is timing-dependent, and there is no failure when doxygen is started
> > under gdb or valgrind, so I do not know how to continue the investigation.
> > 
> > Any suggestions are welcome.
> > 
> > ---
> > Michel Normand
> > 


--
Michel Normand


------------------------------------------------------------------------------
_______________________________________________
Doxygen-users mailing list
Doxygen-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/doxygen-users
