[prev in list] [next in list] [prev in thread] [next in thread] 

List:       gdb-cvs
Subject:    [binutils-gdb] (Windows) fix thr != nullptr assert failure in delete_thread_1
From:       Joel Brobecker <brobecke () sourceware ! org>
Date:       2019-04-30 21:01:09
Message-ID: 20190430210109.85297.qmail () sourceware ! org
[Download RAW message or body]

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=8ed5b76ea2de370265382dab1d538a919e2603ab

commit 8ed5b76ea2de370265382dab1d538a919e2603ab
Author: Joel Brobecker <brobecker@adacore.com>
Date:   Tue Apr 30 15:59:17 2019 -0500

    (Windows) fix thr != nullptr assert failure in delete_thread_1
    
    We have observed that GDB would randomly trip the following
    assertion failure when debugging on Windows. When allowing
    the program to run until the inferior exits, we occasionally see:
    
         (gdb) cont
         Continuing.
         [Thread 48192.0xd100 exited with code 1]
         [Thread 48192.0x10ad8 exited with code 1]
         [Thread 48192.0x36e28 exited with code 0]
         [Thread 48192.0x52be4 exited with code 0]
         [Thread 48192.0x5aa40 exited with code 0]
         ../../src/gdb/thread.c:453: internal-error: void delete_thread_1(thread_inf
    o*, bool): Assertion `thr != nullptr' failed.
    
    Running the same scenario with some additional traces enabled...
    
        (gdb) set verbose
        (gdb) set debugevents
    
    ... allows us to understand what the issue is. To understand, we need
    to first look at the events received when starting the program, and
    in particular which threads got created how. First, we get a
    CREATE_PROCESS_DEBUG_EVENT for tid=0x442a8:
    
        gdb: kernel event for pid=317536 tid=0x442a8 code=CREATE_PROCESS_DEBUG_EVENT)
    
    Shortly after, we get some CREATE_THREAD_DEBUG_EVENT events,
    one of them being for tid=0x4010c:
    
        gdb: kernel event for pid=317536 tid=0x4010c code=CREATE_THREAD_DEBUG_EVENT)
    Fast forward a bit of debugging, and we do a "cont" as above,
    at which point the programs reaches the end, and the system reports
    "exit" events. The first interesting one is the following:
    
        gdb: kernel event for pid=317536 tid=0x442a8 code=EXIT_THREAD_DEBUG_EVENT)
    
    This is reporting a thread-exit event for a thread whose tid
    is the TID of what we call the "main thread". That's the thread
    that was created when we received the CREATE_PROCESS_DEBUG_EVENT
    notification, and whose TID is actually stored in a global variable
    named main_thread_id. This is not something we expected, as
    the assumption we made was that the main thread would exit last,
    and we would be notified of it via an EXIT_PROCESS_DEBUG_EVENT.
    But apparently, this is not always true, at least on Windows Server
    2012 and 2016 where this issue has been observed happening randomly.
    
    The consequence of the above notification is that we call
    windows_delete_thread for that thread, which removes it from
    our list of known threads.
    
    And a little bit later, then we then get the EXIT_PROCESS_DEBUG_EVENT,
    and we can see that the associated tid is not the main_thread_id,
    but rather the tid of one of the threads that was created during
    the lifetime of the program, in this case tid=0x4010c:
    
        gdb: kernel event for pid=317536 tid=0x4010c code=EXIT_PROCESS_DEBUG_EVENT)
    
    And the debug trace printed right after shows why we're crashing:
    
        [Deleting Thread 317536.0x442a8]
    
    We are trying to delete the thread whose tid=0x442a8, which is
    the main_thread_id! As we have already deleted that thread before,
    the search for it returns a nullptr, which then trips the assertion
    check in delete_thread_1.
    
    This commit fixes this issue. It ignores the open question of
    what to do with the main_thread_id global, particularly after
    that thread has been removed from our list of threads. This will
    be dealt with as a separate patch, to allow cherry-picking
    this patch into a release branch.
    
    For now, we fix the code so as to avoid this crash.
    
    gdb/ChangeLog:
    
    	* windows-nat.c (get_windows_debug_event) <EXIT_PROCESS_DEBUG_EVENT>:
    	Use current_event.dwThreadId instead of main_thread_id.

Diff:
---
 gdb/ChangeLog     | 5 +++++
 gdb/windows-nat.c | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index c09d4d4..1b4b83d 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,3 +1,8 @@
+2019-04-30  Joel Brobecker  <brobecker@adacore.com>
+
+	* windows-nat.c (get_windows_debug_event) <EXIT_PROCESS_DEBUG_EVENT>:
+	Use current_event.dwThreadId instead of main_thread_id.
+
 2019-04-30  Tom Tromey  <tromey@adacore.com>
 
 	* ada-lang.c (ada_lookup_simple_minsyms): New function.
diff --git a/gdb/windows-nat.c b/gdb/windows-nat.c
index 5009418..9f3242c 100644
--- a/gdb/windows-nat.c
+++ b/gdb/windows-nat.c
@@ -1637,11 +1637,11 @@ get_windows_debug_event (struct target_ops *ops,
       else if (saw_create == 1)
 	{
 	  windows_delete_thread (ptid_t (current_event.dwProcessId, 0,
-					 main_thread_id),
+					 current_event.dwThreadId),
 				 0, true /* main_thread_p */);
 	  ourstatus->kind = TARGET_WAITKIND_EXITED;
 	  ourstatus->value.integer = current_event.u.ExitProcess.dwExitCode;
-	  thread_id = main_thread_id;
+	  thread_id = current_event.dwThreadId;
 	}
       break;
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic