'Same thread is reported as [New thread xyz] endlessly'


List:       gdb
Subject:    Same thread is reported as [New thread xyz] endlessly
From:       Raphael Zulliger <zulliger () indel ! ch>
Date:       2014-12-04 7:27:58
Message-ID: 54800CFE.8070407 () indel ! ch

Hi

With my slightly patched GDB, debugging an extended-remote target, I 
encountered issues showing a kind of "phantom thread": A thread that's 
permanently reported to be 'created' again and again (gdb reports [New 
Thread xyz], xyz is always the same). This happened although the remote 
target definitely did not send notifications related to that "phantom 
thread". This renders GDB useless, as I can't, for example, expand the 
callstack in Eclipse/CDT anymore.

I think I found the bug - or at least a way to work around the issue. I 
thought I'd send it to this list, although the information I can give you 
is quite limited... My hope is that it may be obvious to one of you, and 
that you can come up with a fix even with this little information.

The following happens in the case of the "phantom thread":
   'struct thread_info *add_thread_silent (ptid_t ptid)'
is called with ptid of the phantom thread. Then,
   'find_thread_ptid (ptid);'
returns a non-NULL pointer. Because
   ptid_equal (inferior_ptid, ptid)
is false we call
   'delete_thread(ptid)'
The problem with that seems to be that
   'tp->refcount > 0'
is true - and therefore
   'static void delete_thread_1 (ptid_t ptid, int silent)'
will not actually delete the thread right away, but only marks it as
   'tp->state = THREAD_EXITED'.
After that, we call
   'tp = new_thread (ptid);'
although the *thread has not been deleted yet*. This then creates yet 
another thread entry with that ptid...
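
To make the sequence above easier to follow, here is a minimal stand-alone 
model of it. This is not GDB source: every toy_* name is made up for 
illustration, ptids are plain ints, the list order is an arbitrary choice, 
and the branch for the current thread (inferior_ptid) is left out. It only 
shows that "mark as exited instead of deleting" followed by an unconditional 
"new" leaves two entries with the same ptid.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model, NOT GDB code.  All names are hypothetical.  */
enum toy_state { TOY_STOPPED, TOY_EXITED };

struct toy_thread
{
  int ptid;                 /* stand-in for ptid_t */
  int refcount;             /* stand-in for tp->refcount */
  enum toy_state state;
  struct toy_thread *next;
};

static struct toy_thread *toy_list;
static int toy_current = 1;  /* stand-in for inferior_ptid */

static struct toy_thread *
toy_find (int ptid)
{
  struct toy_thread *tp;

  for (tp = toy_list; tp != NULL; tp = tp->next)
    if (tp->ptid == ptid)
      return tp;
  return NULL;
}

static struct toy_thread *
toy_new (int ptid)
{
  struct toy_thread *tp = calloc (1, sizeof *tp);

  assert (tp != NULL);
  tp->ptid = ptid;
  tp->state = TOY_STOPPED;
  tp->next = toy_list;       /* list order is an assumption of the toy */
  toy_list = tp;
  return tp;
}

/* Mirrors the check quoted from delete_thread_1: a referenced (or
   current) thread is only marked exited, not unlinked.  */
static void
toy_delete (int ptid)
{
  struct toy_thread **link, *tp;

  for (link = &toy_list; (tp = *link) != NULL; link = &tp->next)
    if (tp->ptid == ptid)
      break;
  if (tp == NULL)
    return;
  if (tp->refcount > 0 || tp->ptid == toy_current)
    {
      tp->state = TOY_EXITED;
      return;
    }
  *link = tp->next;
  free (tp);
}

/* Mirrors the non-current path of add_thread_silent as described.  */
static struct toy_thread *
toy_add_silent (int ptid)
{
  if (toy_find (ptid) != NULL && ptid != toy_current)
    toy_delete (ptid);       /* may only mark the entry as exited */
  return toy_new (ptid);     /* ...but a new entry is added regardless */
}

/* Returns how many entries share ptid 42 after re-adding it while
   something still holds a reference to the old entry.  */
int
toy_demo (void)
{
  struct toy_thread *tp;
  int n = 0;

  toy_add_silent (42)->refcount = 1;
  toy_add_silent (42);
  for (tp = toy_list; tp != NULL; tp = tp->next)
    if (tp->ptid == 42)
      n++;
  return n;
}
```

Calling toy_demo () from a small main () yields 2 in this model: the old 
entry (marked exited, still referenced) and the freshly created one.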

I think there's a flaw in that mechanism:
   'struct thread_info *add_thread_silent (ptid_t ptid)'
only checks for
   'ptid_equal (inferior_ptid, ptid)'
while
   'static void delete_thread_1 (ptid_t ptid, int silent)'
checks for
   'tp->refcount > 0 || ptid_equal (tp->ptid, inferior_ptid)'
Shouldn't those checks be 'in-sync'?

What seems to help is the following: in 'struct thread_info 
*add_thread_silent (ptid_t ptid)', add an additional 'else if', as shown here:

struct thread_info *
add_thread_silent (ptid_t ptid)
{
   struct thread_info *tp;

   tp = find_thread_ptid (ptid);
   if (tp)
     /* Found an old thread with the same id.  It has to be dead,
        otherwise we wouldn't be adding a new thread with the same id.
        The OS is reusing this id --- delete it, and recreate a new
        one.  */
     {
       /* In addition to deleting the thread, if this is the current
          thread, then we need to take care that delete_thread doesn't
          really delete the thread if it is inferior_ptid.  Create a
          new template thread in the list with an invalid ptid, switch
          to it, delete the original thread, reset the new thread's
          ptid, and switch to it.  */

       if (ptid_equal (inferior_ptid, ptid))
         {
           ...
         }
       else if (tp->refcount > 0)
         {
           /* Added: the thread is still referenced, so delete_thread
              would only mark it THREAD_EXITED instead of freeing it.
              Reuse the existing entry rather than creating a second
              one with the same ptid.  */
           tp->state = THREAD_STOPPED;
           observer_notify_new_thread (tp);
           /* All done.  */
           return tp;
         }
       else
         /* Just go ahead and delete it.  */
         delete_thread (ptid);
     }

...
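
As a rough illustration of what the extra 'else if' changes, here is a 
small stand-alone model. Again, this is not GDB source: the toy_* names are 
hypothetical, ptids are plain ints, the inferior_ptid branch is omitted, and 
the observer notification is not modelled. It only shows that reviving a 
still-referenced entry keeps a single entry per ptid.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model, NOT GDB code.  All names are hypothetical.  */
enum toy_state { TOY_STOPPED, TOY_EXITED };

struct toy_thread
{
  int ptid;                 /* stand-in for ptid_t */
  int refcount;             /* stand-in for tp->refcount */
  enum toy_state state;
  struct toy_thread *next;
};

static struct toy_thread *toy_list;

static struct toy_thread *
toy_find (int ptid)
{
  struct toy_thread *tp;

  for (tp = toy_list; tp != NULL; tp = tp->next)
    if (tp->ptid == ptid)
      return tp;
  return NULL;
}

static struct toy_thread *
toy_new (int ptid)
{
  struct toy_thread *tp = calloc (1, sizeof *tp);

  assert (tp != NULL);
  tp->ptid = ptid;
  tp->state = TOY_STOPPED;
  tp->next = toy_list;
  toy_list = tp;
  return tp;
}

/* A referenced thread is only marked exited, never unlinked.  */
static void
toy_delete (int ptid)
{
  struct toy_thread **link, *tp;

  for (link = &toy_list; (tp = *link) != NULL; link = &tp->next)
    if (tp->ptid == ptid)
      break;
  if (tp == NULL)
    return;
  if (tp->refcount > 0)
    {
      tp->state = TOY_EXITED;
      return;
    }
  *link = tp->next;
  free (tp);
}

/* Models the proposed extra branch: a still-referenced entry is
   revived and returned instead of being "deleted" and duplicated.  */
static struct toy_thread *
toy_add_silent_patched (int ptid)
{
  struct toy_thread *tp = toy_find (ptid);

  if (tp != NULL)
    {
      if (tp->refcount > 0)
        {
          tp->state = TOY_STOPPED;
          return tp;
        }
      toy_delete (ptid);     /* unreferenced: really removed */
    }
  return toy_new (ptid);
}

/* Returns the number of entries with ptid 42 after the old entry was
   marked exited while referenced and the ptid was then added again.  */
int
toy_demo_patched (void)
{
  struct toy_thread *tp;
  int n = 0;

  toy_add_silent_patched (42)->refcount = 1;
  toy_delete (42);           /* only marks the entry as exited */
  toy_add_silent_patched (42);
  for (tp = toy_list; tp != NULL; tp = tp->next)
    if (tp->ptid == 42)
      n++;
  return n;
}
```

In this model toy_demo_patched () returns 1, and the single remaining entry 
is back in the stopped state.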


Btw: The call stack at the time GDB entered that newly added 'else if' 
was as follows:

Thread [1] 18196 [core: 3] (Suspended : Signal : 0:Signal 0)
     add_thread_silent() at thread.c:261 0x5dd485
     add_thread_with_info() at thread.c:292 0x5dd5e8
     add_thread() at thread.c:306 0x5dd67d
     remote_add_thread() at remote.c:1,524 0x4a94d2
     remote_notice_new_inferior() at remote.c:1,547 0x4a959d
     process_stop_reply() at remote.c:5,837 0x4b2168
     remote_wait_ns() at remote.c:5,887 0x4b22dc
     remote_wait() at remote.c:6,062 0x4b281a
     target_wait() at target.c:2,660 0x611963
     fetch_inferior_event() at infrun.c:2,821 0x5cac1c
     fetch_inferior_event_wrapper() at inf-loop.c:149 0x5ee1f6
     catch_errors() at exceptions.c:546 0x5e1504
     inferior_event_handler() at inf-loop.c:53 0x5edf4b
     remote_async_inferior_event_handler() at remote.c:11,737 0x4bd768
     invoke_async_event_handler() at event-loop.c:1,073 0x5ec774
     process_event() at event-loop.c:342 0x5eb291
     gdb_do_one_event() at event-loop.c:394 0x5eb333
     start_event_loop() at event-loop.c:431 0x5eb3a8
     mi_command_loop() at mi-interp.c:354 0x4eb99e
     mi2_command_loop() at mi-interp.c:334 0x4eb94f
     current_interp_command_loop() at interps.c:326 0x5e306c
     captured_command_loop() at main.c:260 0x5e42ac
     catch_errors() at exceptions.c:546 0x5e1504
     captured_main() at main.c:1,055 0x5e572a
     catch_errors() at exceptions.c:546 0x5e1504
     gdb_main() at main.c:1,064 0x5e5760
     main() at gdb.c:34 0x457a2b


Note that this issue is highly timing-dependent. Right now, I have a 
situation in which it is quite easy to reproduce - but usually it is not.

GDB (as said: slightly patched):

GNU gdb (GDB) 7.6.50.20130604-cvs
...
This GDB was configured as "--host=x86_64-unknown-linux-gnu 
--target=powerpc-indel-eabi".

Note that I can't easily check this issue against 'master' as debugging 
my remote target doesn't work out of the box with it.


Raphael
