
List:       glibc-help
Subject:    Need help to interpret pthread_mutex_t structure as printed by gdb
From:       doark () mail ! com (David Niklas)
Date:       2016-09-06 4:14:00
Message-ID: 20160903122157.21fbd5f9 () ulgy_thing

On Wed, 17 Aug 2016 16:49:52, Olle Blomgren <ollemblomgren@gmail.com> wrote:

Forgive the top post.
I was wondering whether this problem was ever solved (it's been about
two weeks). It interested me, though I'm just as baffled as anyone
else.
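
In case it helps whoever picks this up, here is how I read the
structure printed further down: on glibc/NPTL for x86_64,
__data.__lock is the futex word (0 = unlocked, 1 = locked with no
waiters, 2 = locked with waiters), so a printout of all zeros means
the mutex looks completely free even while a thread sits in
__lll_lock_wait on it. A minimal sketch of my own (it only assumes
the __data fields glibc exposes in its headers) to watch those
fields change:

  /* Dump the NPTL-internal fields of a pthread_mutex_t.
     Assumes glibc on Linux x86_64.  Build: gcc -pthread dump.c */
  #include <pthread.h>
  #include <stdio.h>

  static void dump_mutex(const pthread_mutex_t *m)
  {
      printf("__lock=%d __count=%u __owner=%d __nusers=%u __kind=%d\n",
             m->__data.__lock, m->__data.__count, m->__data.__owner,
             m->__data.__nusers, m->__data.__kind);
  }

  int main(void)
  {
      pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
      dump_mutex(&m);   /* all zeros: unlocked, default kind */
      pthread_mutex_lock(&m);
      dump_mutex(&m);   /* __lock=1, __owner=locker's TID, __nusers=1 */
      pthread_mutex_unlock(&m);
      return 0;
  }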

Thanks,
David

> Source code and instructions have been sent. Thanks!
> 
> On Wed, Aug 17, 2016 at 1:16 AM, Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
> > Sure, I can take a look! Just send the instruction on how to setup the
> > environment.
> > 
> > On 16/08/2016 19:53, Olle Blomgren wrote:  
> > > I have a test case that fails 9 out of 10 times on my machine.
> > > Should I send the code and instructions to you at this mail
> > > address? The code will be GPLed before release, but I don't want
> > > it floating around on the Internet just yet. I'd like to send it
> > > outside this conversation, if that's OK.
> > > 
> > > On Tue, Aug 16, 2016 at 10:44 PM, Adhemerval Zanella
> > > <adhemerval.zanella@linaro.org> wrote:  
> > > > For thread race condition checks I usually run ThreadSanitizer
> > > > (both gcc and clang support it).
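> > > > 
> > > > As an illustration, a hypothetical racy snippet (not from this
> > > > report) that tsan flags right away when built with
> > > > gcc -fsanitize=thread -g:
> > > > 
> > > >   #include <pthread.h>
> > > >   #include <stdio.h>
> > > > 
> > > >   static int counter;            /* unsynchronized shared state */
> > > > 
> > > >   static void *bump(void *arg)
> > > >   {
> > > >       for (int i = 0; i < 100000; i++)
> > > >           counter++;             /* racy read-modify-write */
> > > >       return NULL;
> > > >   }
> > > > 
> > > >   int main(void)
> > > >   {
> > > >       pthread_t t1, t2;
> > > >       pthread_create(&t1, NULL, bump, NULL);
> > > >       pthread_create(&t2, NULL, bump, NULL);
> > > >       pthread_join(t1, NULL);
> > > >       pthread_join(t2, NULL);
> > > >       printf("counter=%d\n", counter);
> > > >       return 0;
> > > >   }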
> > > > 
> > > > But again, bug reports for runtime issues in default usage modes
> > > > are usually debugged with a *reproducible* testcase. It can be
> > > > either a snippet that stresses the issue or a more complex
> > > > scenario describing how to trigger it.
> > > > 
> > > > What is somewhat frustrating is a report with partial debug
> > > > information, where one must constantly assume or guess at the
> > > > original environment.
> > > > 
> > > > So if you could provide a way to reproduce this issue, it would
> > > > be *really* helpful for checking what is actually happening.
> > > > 
> > > > On 16/08/2016 16:53, Olle Blomgren wrote:  
> > > > > I have now run the valgrind tools memcheck, helgrind, and drd
> > > > > on both client and server. The latter two are intended for MT
> > > > > programs using a pthread implementation. When I run the tools
> > > > > on the client side, the hang never occurs. When I run them on
> > > > > the server, the hang occurs as usual. Although the hang occurs,
> > > > > there is nothing in the logs that suggests an error that would
> > > > > cause it.
> > > > > 
> > > > > When I use the helgrind tool it is possible to attach a GDB
> > > > > client to the valgrind GDB server, query all the known mutexes,
> > > > > and ask whether any are held. This query (monitor info locks)
> > > > > reports 0 locked mutexes, even though the client is hanging on
> > > > > a mutex it believes is locked at the moment I issue the query.
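> > > > > 
> > > > > For reference, this is roughly how I attach (the server command
> > > > > line is my own):
> > > > > 
> > > > >   $ valgrind --tool=helgrind --vgdb=yes --vgdb-error=0 ./server
> > > > >   $ gdb ./server              (in a second terminal)
> > > > >   (gdb) target remote | vgdb
> > > > >   (gdb) monitor info locks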
> > > > > 
> > > > > I have also run memtest86+ from the Linux GRUB menu to ensure
> > > > > my RAM is physically OK, and it is.
> > > > > 
> > > > > Do you have any debug tools or any other hints on what I can
> > > > > try to crack this on my own? If not, I would really appreciate
> > > > > having my code built and run on an x86_64 machine by someone
> > > > > more knowledgeable than myself.
> > > > > 
> > > > > 
> > > > > On Mon, Aug 15, 2016 at 11:45 PM, Adhemerval Zanella
> > > > > <adhemerval.zanella@linaro.org> wrote:  
> > > > > > 
> > > > > > 
> > > > > > On 15/08/2016 18:14, Olle Blomgren wrote:  
> > > > > > > > Could you provide the glibc version, a complete backtrace of
> > > > > > > > both processes and if possible a testcase?  
> > > > > > > 
> > > > > > > I use glibc-2.23.
> > > > > > > I have downloaded the source and built it with debug symbols.
> > > > > > > Full backtraces follow below.
> > > > > > > Client backtrace + printout of the offending mutex. (This is
> > > > > > > the one that hangs.)
> > > > > > > -------------------------------------------------------------------------------------------------------------------
> > > > > > > Thread 3 (Thread 0x7ff8d1fcc700 (LWP 14005)):
> > > > > > > #0  0x00007ff8d231b82d in poll () at ../sysdeps/unix/syscall-template.S:84
> > > > > > > #1  0x00007ff8d2b39434 in _daemon_monitor (attr=0x7ff8d2da8300 <_mon_thr_attr>) at mam.c:1346
> > > > > > > #2  0x00007ff8d2db03e4 in start_thread (arg=0x7ff8d1fcc700) at pthread_create.c:333
> > > > > > > #3  0x00007ff8d23248ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> > > > > > > 
> > > > > > > Thread 2 (Thread 0x7ff8d15cb700 (LWP 14006)):
> > > > > > > #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> > > > > > > #1  0x00007ff8d2db2a75 in __GI___pthread_mutex_lock (mutex=0x7ff8d16ac810) at ../nptl/pthread_mutex_lock.c:80
> > > > > > > #2  0x00007ff8d2b1f962 in lch_lock (name=919552, type=LCH_EXCL, tx=0x7ff8d3446080, ctx=0x7ff8c4001110) at lch.c:124
> > > > > > > #3  0x00007ff8d2b40c1c in _get_page_get_entry (entry=0x7ff8d15cab80, nbr_bytes=8, ctx=0x7ff8c4001110) at pge.c:1194
> > > > > > > #4  0x00007ff8d2b41b36 in pge_alloc_entry (entry=0x7ff8d15cab80, nbr_bytes=8, page_sz=1 '\001', ctx=0x7ff8c4001110) at pge.c:1416
> > > > > > > #5  0x00007ff8d2b5d0bf in api_int_setup_entry (entry_arr=0x7ff8d15cacb8, page_ptr=0x7ff8d15cac88, page_ref=0x7ff8d15cac80, lr=0x7ff8d15cad50, meta=0xf68d50, tx=0x7ff8d3446080, nbr_entries=1, sct=0, ctx=0x7ff8c4001110) at api_int.c:47
> > > > > > > #6  0x00007ff8d2b5b410 in hope_new (tx=0x7ff8c4000cd0, elm_type=0x40b1c9 "NoColl", nv_list=0x40adf9 "tjing") at hope.c:2495
> > > > > > > #7  0x0000000000405d6d in _new1 (attr=0x0) at api_tests.c:1579
> > > > > > > #8  0x00007ff8d2db03e4 in start_thread (arg=0x7ff8d15cb700) at pthread_create.c:333
> > > > > > > #9  0x00007ff8d23248ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> > > > > > > 
> > > > > > > Thread 1 (Thread 0x7ff8d34e0700 (LWP 14004)):
> > > > > > > #0  0x00007ff8d2db164d in pthread_join (threadid=140706641131264, thread_return=0x0) at pthread_join.c:90
> > > > > > > #1  0x0000000000406252 in _scan_1new40000_1read_1lockmod5remove_3thr (tc=30, print_desc_only=0) at api_tests.c:1682
> > > > > > > #2  0x000000000040a1b5 in _all_test_cases (tc=30, nbr_updates=0, start_value=0, name=0x0, ref=18446744073709551615, print_desc_only=0) at api_tests.c:2981
> > > > > > > #3  0x000000000040a784 in main (argc=4, argv=0x7fffed178988) at api_tests.c:3084
> > > > > > > 
> > > > > > > (gdb) t 2
> > > > > > > [Switching to thread 2 (Thread 0x7ff8d15cb700 (LWP 14006))]
> > > > > > > #1  0x00007ff8d2db2a75 in __GI___pthread_mutex_lock (mutex=0x7ff8d16ac810) at ../nptl/pthread_mutex_lock.c:80
> > > > > > > 80          LLL_MUTEX_LOCK (mutex);
> > > > > > > (gdb) f 1
> > > > > > > #1  0x00007ff8d2db2a75 in __GI___pthread_mutex_lock (mutex=0x7ff8d16ac810) at ../nptl/pthread_mutex_lock.c:80
> > > > > > > 80          LLL_MUTEX_LOCK (mutex);
> > > > > > > (gdb) p *mutex
> > > > > > > $2 = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
> > > > > > > 
> > > > > > > 
> > > > > > > Server backtrace (all threads are just waiting around).
> > > > > > > ------------------------------------------------------------------------
> > > > > > > Thread 6 (Thread 0x7f0686b45700 (LWP 14094)):
> > > > > > > #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> > > > > > > #1  0x00007f0687838d8b in migrator (attr=0x7f0687c4f758) at hsh_int.c:207
> > > > > > > #2  0x00007f06875fb3e4 in start_thread (arg=0x7f0686b45700) at pthread_create.c:333
> > > > > > > #3  0x00007f0686e2d8ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> > > > > > > 
> > > > > > > Thread 5 (Thread 0x7f0684b41700 (LWP 14098)):
> > > > > > > #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
> > > > > > > #1  0x00007f0687896555 in _do_checkpoint (attr=0x0) at ckp.c:1411
> > > > > > > #2  0x00007f06875fb3e4 in start_thread (arg=0x7f0684b41700) at pthread_create.c:333
> > > > > > > #3  0x00007f0686e2d8ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> > > > > > > 
> > > > > > > Thread 4 (Thread 0x7f0685342700 (LWP 14097)):
> > > > > > > #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> > > > > > > #1  0x00007f0687838d8b in migrator (attr=0x7f0687c51280) at hsh_int.c:207
> > > > > > > #2  0x00007f06875fb3e4 in start_thread (arg=0x7f0685342700) at pthread_create.c:333
> > > > > > > #3  0x00007f0686e2d8ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> > > > > > > 
> > > > > > > Thread 3 (Thread 0x7f0685b43700 (LWP 14096)):
> > > > > > > #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> > > > > > > #1  0x00007f06878541bc in _rm_log (attr=0x0) at tlg.c:1527
> > > > > > > #2  0x00007f06875fb3e4 in start_thread (arg=0x7f0685b43700) at pthread_create.c:333
> > > > > > > #3  0x00007f0686e2d8ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> > > > > > > 
> > > > > > > Thread 2 (Thread 0x7f0686344700 (LWP 14095)):
> > > > > > > #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> > > > > > > #1  0x00007f0687853110 in flush_log_buffer (attr=0x0) at tlg.c:1385
> > > > > > > #2  0x00007f06875fb3e4 in start_thread (arg=0x7f0686344700) at pthread_create.c:333
> > > > > > > #3  0x00007f0686e2d8ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> > > > > > > 
> > > > > > > Thread 1 (Thread 0x7f0687cef700 (LWP 14093)):
> > > > > > > #0  0x00007f0686e264f3 in select () at ../sysdeps/unix/syscall-template.S:84
> > > > > > > #1  0x00007f068785dcf0 in _daemon_server_loop (attr=0x7ffff2d255d0) at mam.c:730
> > > > > > > #2  0x00007f068785e1bc in mam_daemon_server_run (ctx=0x7ffff2d255d0) at mam.c:860
> > > > > > > #3  0x00007f0687870053 in dae_init (instance=0x212d180 "/tmp/func_api", force=1, ctx=0x7ffff2d255d0) at dae.c:297
> > > > > > > #4  0x00007f068785bf4a in _do_start_daemon (ctx=0x7ffff2d255d0) at clu.c:320
> > > > > > > #5  0x00007f068785c388 in clu_execute (ctx=0x7ffff2d255d0) at clu.c:449
> > > > > > > #6  0x0000000000400994 in main (argc=6, argv=0x7ffff2d25b58) at db.c:21
> > > > > > > 
> > > > > > > I can see that it is always server thread #5 that
> > > > > > > interferes with the hanging client thread, in the following
> > > > > > > way:
> > > > > > > Server thread acquires mutex X.
> > > > > > > Client thread tries to acquire mutex X (it sits waiting for
> > > > > > > the server thread to release mutex X).
> > > > > > > Server thread releases mutex X and continues its work...
> > > > > > > Client thread never wakes up. It continues to hang on mutex X.
> > > > > > > 
> > > > > > > I have written a test program (essentially the sketch after
> > > > > > > this list) to try to provoke the same access pattern, but I
> > > > > > > cannot make the same thing happen. I can see that the same
> > > > > > > access pattern:
> > > > > > > 1) server acquires mutex X
> > > > > > > 2) client tries to acquire mutex X
> > > > > > > 3) server releases mutex X
> > > > > > > 4) client acquires mutex X
> > > > > > > completes successfully numerous times before the client
> > > > > > > hangs as shown above.
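> > > > > > > 
> > > > > > > The test program is essentially this sketch (simplified;
> > > > > > > the real code uses our own shared-memory setup, and error
> > > > > > > handling/cleanup is omitted):
> > > > > > > 
> > > > > > >   /* Two processes hammering one PTHREAD_PROCESS_SHARED
> > > > > > >      mutex in a shared mapping.  Build: gcc -pthread t.c */
> > > > > > >   #include <pthread.h>
> > > > > > >   #include <stdio.h>
> > > > > > >   #include <sys/mman.h>
> > > > > > >   #include <sys/wait.h>
> > > > > > >   #include <unistd.h>
> > > > > > > 
> > > > > > >   int main(void)
> > > > > > >   {
> > > > > > >       /* Anonymous shared mapping, visible to parent and child. */
> > > > > > >       pthread_mutex_t *m = mmap(NULL, sizeof *m,
> > > > > > >                                 PROT_READ | PROT_WRITE,
> > > > > > >                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> > > > > > >       pthread_mutexattr_t a;
> > > > > > >       pthread_mutexattr_init(&a);
> > > > > > >       pthread_mutexattr_setpshared(&a, PTHREAD_PROCESS_SHARED);
> > > > > > >       pthread_mutex_init(m, &a);
> > > > > > > 
> > > > > > >       pid_t pid = fork();
> > > > > > >       for (int i = 0; i < 1000000; i++) {
> > > > > > >           pthread_mutex_lock(m);    /* both sides contend here */
> > > > > > >           pthread_mutex_unlock(m);
> > > > > > >       }
> > > > > > >       if (pid == 0)
> > > > > > >           return 0;                 /* child done */
> > > > > > >       waitpid(pid, NULL, 0);        /* a hang here would match
> > > > > > >                                        the lost-wakeup symptom */
> > > > > > >       puts("completed");
> > > > > > >       return 0;
> > > > > > >   }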
> > > > > > > 
> > > > > > > Can you give me hints on how to continue, or do you want me
> > > > > > > to send my code to you?
> > > > > > > As seen above, the hardware is x86_64.
> > > > > > > 
> > > > > > > Brs
> > > > > > > /Olle  
> > > > > > 
> > > > > > I think either a testcase or a way to reproduce this issue
> > > > > > would be the way to proceed, since the current log is still
> > > > > > inconclusive. I checked the sourceware bug database for issues
> > > > > > fixed since 2.19, but you seem to have already tested with a
> > > > > > more recent version.
> > > > > > 
> > > > > > I strongly recommend that you first try a memory checker
> > > > > > (valgrind, ASan) and/or a thread checker (TSan) to see if you
> > > > > > are doing something wrong.

