[prev in list] [next in list] [prev in thread] [next in thread] 

List:       kde-core-devel
Subject:    More crash-proof KCrash (was Re: Konq crash backtraces)
From:       Lubos Lunak <l.lunak () suse ! cz>
Date:       2006-01-11 16:40:43
Message-ID: 200601111740.43565.l.lunak () suse ! cz
[Download RAW message or body]

On Saturday 07 January 2006 13:13, Lubos Lunak wrote:
> Dne pátek 06 leden 2006 22:31 Thiago Macieira napsal(a):
> > Dik Takken wrote:
> > >It just happened again...
> > >
> > >Konqueror crashed without leaving a single trace. I compiled KDE 3.5
> > > with full debug info, which gives me useful backtraces when a crash
> > > occurs. Konqueror is a weird exception. One of its windows can just
> > > disappear in front of your nose, without DrKonqi popping up to get a
> > > backtrace.
> > >
> > Distributions are shipping glibc with some malloc checks, which cause it
> > to abort the program when a double-free or corruption occurs, but without
> > raising SIGABRT. There must be a way of making glibc raise the signal,
> > but I don't know how. It should be enabled by default in KDE code so that
> > the crash handler can catch them.
>
>  Not really. The error checking does raise SIGABRT, it's just that our
> crash handler depends on malloc() working and locks up already in
> DCOPClient::emergencyClose(). If we want to get a crash handler for heap
> corruptions too, we need to stay away from using the corrupted malloc. I
> can give it a shot.

 Ok, it wasn't as trivial, but I have a patch. The changes are:

- no qstrdup() of static strings or actually any strings , no other things 
that result in malloc() like using QCString, i18n() or whatever. Results in 
some ugly code like KAboutData::translateInternalProgramName() but oh well.

- no DCOPClient::emergencyClose() - what's the point anyway, I'm quite sure it 
doesn't get called when I do Ctrl+C and KDE seems to work just fine, so why 
should it be called when crashing - it calls malloc() and who knows what else

- no fork() - this one doesn't actually call malloc(), but there are some 
functions called at pre-fork time that do something with malloc locks (with 
glibc at least) - that leads to lockups as well. Since no fork() means no 
executing of other executables, the only other option I see is feeding the 
data to some other already running process via a socket. Fortunately we 
incidentally have already this kdeinit thingy running that's supposed to 
launch executables when it gets told via a socket, so I copy&pasted some code 
from there. That means some more syscalls, but that should be still more 
safe. There's still a fallback to use fork() in case using kdeinit fails for 
any reason.

 This way KCrash should be more reliable and we should (hopefully) always get 
DrKonqi now. And I can confirm that after one gets used to DrKonqi just 
having an application disappearing without a reason really doesn't feel nice. 
I'd like to commit this to 3.5 branch, so it'd be nice if somebody could at 
least peek at it.

PS: One more benefit would be that we could now make drkonqi a kdeinit module 
in order to make our crashes faster.

PPS: A good way of testing is just doing a double free().

-- 
Lubos Lunak
KDE developer
---------------------------------------------------------------------
SuSE CR, s.r.o.  e-mail: l.lunak@suse.cz , l.lunak@kde.org
Drahobejlova 27  tel: +420 2 9654 2373
190 00 Praha 9   fax: +420 2 9654 2374
Czech Republic   http://www.suse.cz/

["kdecore.patch" (text/x-diff)]

--- kdecore/kcrash.h.sav	2005-09-29 21:31:56.000000000 +0200
+++ kdecore/kcrash.h	2006-01-11 16:50:42.000000000 +0100
@@ -117,6 +117,10 @@ class KDECORE_EXPORT KCrash
    * Pointer to the emergency save function.
    */
   static HandlerType _emergencySaveFunction;
+  
+ private:
+   static void startDrKonqi( const char* argv[], int argc );
+   static void startDirectly( const char* argv[], int argc );
 };
 
 #endif
--- kdecore/kcrash.cpp.sav	2005-09-29 21:31:56.000000000 +0200
+++ kdecore/kcrash.cpp	2006-01-11 16:50:39.000000000 +0100
@@ -38,6 +38,9 @@
 #include <sys/time.h>
 #include <sys/resource.h>
 #include <sys/wait.h>
+#include <sys/un.h>
+#include <sys/socket.h>
+#include <errno.h>
 
 #include <qwindowdefs.h>
 #include <kglobal.h>
@@ -47,6 +50,8 @@
 #include <kapplication.h>
 #include <dcopclient.h>
 
+#include <../kinit/klauncher_cmds.h>
+
 #if defined Q_WS_X11
 #include <X11/Xlib.h>
 #endif
@@ -128,19 +133,15 @@ KCrash::defaultCrashHandler (int sig)
     crashRecursionCounter++; //
   }
 
-        // Close dcop connections
-  DCOPClient::emergencyClose();
   // Close all remaining file descriptors except for stdin/stdout/stderr
   struct rlimit rlp;
   getrlimit(RLIMIT_NOFILE, &rlp);
   for (int i = 3; i < (int)rlp.rlim_cur; i++)
     close(i);
 
-  bool shuttingDown = false;
 
-  // don't load drkonqi during shutdown
-  if ( !shuttingDown )
-  {
+  // this code is leaking, but this should not hurt cause we will do a
+  // exec() afterwards. exec() is supposed to clean up.
     if (crashRecursionCounter < 3)
     {
       if (appName)
@@ -152,105 +153,81 @@ KCrash::defaultCrashHandler (int sig)
         fprintf(stderr, "KCrash: Application '%s' crashing...\n", appName ? appName : "<unknown>");
 #endif
 
-        pid_t pid = fork();
-
-        if (pid <= 0) {
-          // this code is leaking, but this should not hurt cause we will do a
-          // exec() afterwards. exec() is supposed to clean up.
-          char * argv[24]; // don't forget to update this
+          const char * argv[24]; // don't forget to update this
           int i = 0;
 
           // argument 0 has to be drkonqi
-          argv[i++] = qstrdup("drkonqi");
+          argv[i++] = "drkonqi";
 
 #if defined Q_WS_X11
           // start up on the correct display
-          argv[i++] = qstrdup("-display");
+          argv[i++] = "-display";
           if ( qt_xdisplay() )
             argv[i++] = XDisplayString(qt_xdisplay());
           else
             argv[i++] = getenv("DISPLAY");
 #elif defined(Q_WS_QWS)
           // start up on the correct display
-          argv[i++] = qstrdup("-display");
+          argv[i++] = "-display";
           argv[i++] = getenv("QWS_DISPLAY");
 #endif
 
           // we have already tested this
-          argv[i++] = qstrdup("--appname");
-          argv[i++] = qstrdup(appName);
+          argv[i++] = "--appname";
+          argv[i++] = appName;
           if (KApplication::loadedByKdeinit)
-            argv[i++] = qstrdup("--kdeinit");
+            argv[i++] = "--kdeinit";
 
           // only add apppath if it's not NULL
           if (appPath) {
-            argv[i++] = qstrdup("--apppath");
-            argv[i++] = qstrdup(appPath);
+            argv[i++] = "--apppath";
+            argv[i++] = appPath;
           }
 
           // signal number -- will never be NULL
-          QCString tmp;
-          tmp.setNum(sig);
-          argv[i++] = qstrdup("--signal");
-          argv[i++] = qstrdup(tmp.data());
+          char sigtxt[ 10 ];
+          sprintf( sigtxt, "%d", sig );
+          argv[i++] = "--signal";
+          argv[i++] = sigtxt;
 
-          // pid number -- only include if this is the child
-          // the debug stuff will be disabled if we was not able to fork
-          if (pid == 0) {
-            tmp.setNum(getppid());
-            argv[i++] = qstrdup("--pid");
-            argv[i++] = qstrdup(tmp.data());
-          }
+          char pidtxt[ 10 ];
+          sprintf( pidtxt, "%d", getpid());
+          argv[i++] = "--pid";
+          argv[i++] = pidtxt;
 
           const KInstance *instance = KGlobal::_instance;
           const KAboutData *about = instance ? instance->aboutData() : 0;
           if (about) {
-            if (!about->version().isNull()) {
-              argv[i++] = qstrdup("--appversion");
-              argv[i++] = qstrdup(about->version().utf8());
+            if (about->internalVersion()) {
+              argv[i++] = "--appversion";
+              argv[i++] = about->internalVersion();
             }
 
-            if (!about->programName().isNull()) {
-              argv[i++] = qstrdup("--programname");
-              argv[i++] = qstrdup(about->programName().utf8());
+            if (about->internalProgramName()) {
+              argv[i++] = "--programname";
+              argv[i++] = about->internalProgramName();
             }
 
-            if (!about->bugAddress().isNull()) {
-              argv[i++] = qstrdup("--bugaddress");
-              argv[i++] = qstrdup(about->bugAddress().utf8());
+            if (about->internalBugAddress()) {
+              argv[i++] = "--bugaddress";
+              argv[i++] = about->internalBugAddress();
             }
           }
 
           if ( kapp && !kapp->startupId().isNull()) {
-            argv[i++] = qstrdup("--startupid");
-            argv[i++] = qstrdup(kapp->startupId());
+            argv[i++] = "--startupid";
+            argv[i++] = kapp->startupId().data();
           }
 
           if ( safer )
-            argv[i++] = qstrdup("--safer");
+            argv[i++] = "--safer";
 
           // NULL terminated list
-          argv[i++] = NULL;
-
-          setgid(getgid());
-          setuid(getuid());
-
-          execvp("drkonqi", argv);
-
-          // we could clean up here
-          // i = 0;
-          // while (argv[i])
-          //   free(argv[i++]);
-        }
-        else
-        {
-
-          alarm(0); // Seems we made it....
+          argv[i] = NULL;
 
-          // wait for child to exit
-          waitpid(pid, NULL, 0);
+          startDrKonqi( argv, i );
           _exit(253);
-        }
+
       }
       else {
         fprintf(stderr, "Unknown appname\n");
@@ -261,8 +238,279 @@ KCrash::defaultCrashHandler (int sig)
     {
       fprintf(stderr, "Unable to start Dr. Konqi\n");
     }
-  }
 #endif //Q_OS_UNIX
 
   _exit(255);
 }
+
+#ifdef Q_OS_UNIX
+
+// Since we can't fork() in the crashhandler, we cannot execute any external code
+// (there can be functions registered to be performed before fork(), for example
+// handling of malloc locking, which doesn't work when malloc crashes because of heap corruption).
+
+static int write_socket(int sock, char *buffer, int len);
+static int read_socket(int sock, char *buffer, int len);
+static int openSocket();
+
+void KCrash::startDrKonqi( const char* argv[], int argc )
+{
+  int socket = openSocket();
+  if( socket < -1 )
+  {
+    startDirectly( argv, argc );
+    return;
+  }
+  klauncher_header header;
+  header.cmd = LAUNCHER_EXEC_NEW;
+  const int BUFSIZE = 8192; // make sure this is big enough
+  char buffer[ BUFSIZE + 10 ];
+  int pos = 0;
+  long argcl = argc;
+  memcpy( buffer + pos, &argcl, sizeof( argcl ));
+  pos += sizeof( argcl );
+  for( int i = 0;
+       i < argc;
+       ++i )
+  {
+    int len = strlen( argv[ i ] ) + 1; // include terminating \0
+    if( pos + len > BUFSIZE )
+    {
+      fprintf( stderr, "BUFSIZE in KCrash not big enough!\n" );
+      startDirectly( argv, argc );
+      return;
+    }
+    memcpy( buffer + pos, argv[ i ], len );
+    pos += len;
+  }
+  long env = 0;
+  memcpy( buffer + pos, &env, sizeof( env ));
+  pos += sizeof( env );
+  long avoid_loops = 0;
+  memcpy( buffer + pos, &avoid_loops, sizeof( avoid_loops ));
+  pos += sizeof( avoid_loops );
+  header.arg_length = pos;
+  write_socket(socket, (char *) &header, sizeof(header));
+  write_socket(socket, buffer, pos);
+  if( read_socket( socket, (char *) &header, sizeof(header)) < 0
+      || header.cmd != LAUNCHER_OK )
+  {
+    startDirectly( argv, argc );
+    return;
+  }
+  long pid;
+  read_socket(socket, buffer, header.arg_length);
+  pid = *((long *) buffer);
+
+  alarm(0); // Seems we made it....
+
+  for(;;)
+  {
+    if( kill( pid, 0 ) < 0 )
+      _exit(253);
+    sleep( 1 ); // the debugger should stop this process anyway
+  }
+}
+
+// If we can't reach kdeinit we can still at least try to fork()
+void KCrash::startDirectly( const char* argv[], int )
+{
+  fprintf( stderr, "KCrash cannot reach kdeinit, launching directly.\n" );
+  pid_t pid = fork();
+  if (pid <= 0)
+  {
+    setgid(getgid());
+    setuid(getuid());
+    execvp("drkonqi", const_cast< char** >( argv ));
+  }
+  else
+  {
+    alarm(0); // Seems we made it....
+    // wait for child to exit
+    waitpid(pid, NULL, 0);
+    _exit(253);
+  }
+}
+
+// From now on this code is copy&pasted from kinit/wrapper.c :
+
+extern char **environ;
+
+static char *getDisplay()
+{
+   const char *display;
+   char *result;
+   char *screen;
+   char *colon;
+   char *i;
+/*
+ don't test for a value from qglobal.h but instead distinguish
+ Qt/X11 from Qt/Embedded by the fact that Qt/E apps have -DQWS
+ on the commandline (which in qglobal.h however triggers Q_WS_QWS,
+ but we don't want to include that here) (Simon)
+#ifdef Q_WS_X11
+ */
+#if !defined(QWS)
+   display = getenv("DISPLAY");
+#else
+   display = getenv("QWS_DISPLAY");
+#endif
+   if (!display || !*display)
+   {
+      display = ":0";
+   }
+   result = (char*)malloc(strlen(display)+1);
+   if (result == NULL)
+      return NULL;
+
+   strcpy(result, display);
+   screen = strrchr(result, '.');
+   colon = strrchr(result, ':');
+   if (screen && (screen > colon))
+      *screen = '\0';
+   while((i = strchr(result, ':')))
+     *i = '_';
+   return result;
+}
+
+/*
+ * Write 'len' bytes from 'buffer' into 'sock'.
+ * returns 0 on success, -1 on failure.
+ */
+static int write_socket(int sock, char *buffer, int len)
+{
+  ssize_t result;
+  int bytes_left = len;
+  while ( bytes_left > 0)
+  {
+     result = write(sock, buffer, bytes_left);
+     if (result > 0)
+     {
+        buffer += result;
+        bytes_left -= result;
+     }
+     else if (result == 0)
+        return -1;
+     else if ((result == -1) && (errno != EINTR) && (errno != EAGAIN))
+        return -1;
+  }
+  return 0;
+}
+
+/*
+ * Read 'len' bytes from 'sock' into 'buffer'.
+ * returns 0 on success, -1 on failure.
+ */
+static int read_socket(int sock, char *buffer, int len)
+{
+  ssize_t result;
+  int bytes_left = len;
+  while ( bytes_left > 0)
+  {
+     result = read(sock, buffer, bytes_left);
+     if (result > 0)
+     {
+        buffer += result;
+        bytes_left -= result;
+     }
+     else if (result == 0)
+        return -1;
+     else if ((result == -1) && (errno != EINTR) && (errno != EAGAIN))
+        return -1;
+  }
+  return 0;
+}
+
+static int openSocket()
+{
+  kde_socklen_t socklen;
+  int s;
+  struct sockaddr_un server;
+#define MAX_SOCK_FILE 255
+  char sock_file[MAX_SOCK_FILE + 1];
+  const char *home_dir = getenv("HOME");
+  const char *kde_home = getenv("KDEHOME");
+  char *display;
+
+  sock_file[0] = sock_file[MAX_SOCK_FILE] = 0;
+
+  if (!kde_home || !kde_home[0])
+  {
+     kde_home = "~/.kde/";
+  }
+
+  if (kde_home[0] == '~')
+  {
+     if (!home_dir || !home_dir[0])
+     {
+        fprintf(stderr, "Warning: $HOME not set!\n");
+        return -1;
+     }
+     if (strlen(home_dir) > (MAX_SOCK_FILE-100))
+     {
+        fprintf(stderr, "Warning: Home directory path too long!\n");
+        return -1;
+     }
+     kde_home++;
+     strncpy(sock_file, home_dir, MAX_SOCK_FILE);
+  }
+  strncat(sock_file, kde_home, MAX_SOCK_FILE - strlen(sock_file));
+
+  /** Strip trailing '/' **/
+  if ( sock_file[strlen(sock_file)-1] == '/')
+     sock_file[strlen(sock_file)-1] = 0;
+  
+  strncat(sock_file, "/socket-", MAX_SOCK_FILE - strlen(sock_file));
+  if (gethostname(sock_file+strlen(sock_file), MAX_SOCK_FILE - strlen(sock_file) - 1) != 0)
+  {
+     perror("Warning: Could not determine hostname: ");
+     return -1;
+  }
+  sock_file[sizeof(sock_file)-1] = '\0';
+
+  /* append $DISPLAY */
+  display = getDisplay();
+  if (display == NULL)
+  {
+     fprintf(stderr, "Error: Could not determine display.\n");
+     return -1;
+  }
+
+  if (strlen(sock_file)+strlen(display)+strlen("/kdeinit_")+2 > MAX_SOCK_FILE)
+  {
+     fprintf(stderr, "Warning: Socket name will be too long.\n");
+     return -1;
+  }
+  strcat(sock_file, "/kdeinit_");
+  strcat(sock_file, display);
+  free(display);
+
+  if (strlen(sock_file) >= sizeof(server.sun_path))
+  {
+     fprintf(stderr, "Warning: Path of socketfile exceeds UNIX_PATH_MAX.\n");
+     return -1;
+  }
+
+  /*
+   * create the socket stream
+   */
+  s = socket(PF_UNIX, SOCK_STREAM, 0);
+  if (s < 0) 
+  {
+     perror("Warning: socket() failed: ");
+     return -1;
+  }
+
+  server.sun_family = AF_UNIX;
+  strcpy(server.sun_path, sock_file);
+  socklen = sizeof(server);
+  if(connect(s, (struct sockaddr *)&server, socklen) == -1) 
+  {
+     perror("Warning: connect() failed: ");
+     close(s);
+     return -1;
+  }
+  return s;
+}
+
+#endif // Q_OS_UNIX
--- kdecore/kaboutdata.cpp.sav	2005-09-29 21:31:56.000000000 +0200
+++ kdecore/kaboutdata.cpp	2006-01-11 17:34:36.000000000 +0100
@@ -106,6 +106,7 @@ KAboutData::KAboutData( const char *appN
 			const char *bugsEmailAddress
 			) :
   mProgramName( programName ),
+  mTranslatedProgramName( 0 ),
   mVersion( version ),
   mShortDescription( shortDescription ),
   mLicenseKey( licenseType ),
@@ -131,6 +132,7 @@ KAboutData::~KAboutData()
 {
     if (mLicenseKey == License_File)
         delete [] mLicenseText;
+    delete[] mTranslatedProgramName;
     delete d;
 }
 
@@ -179,6 +181,10 @@ void
 KAboutData::setProgramName( const char* programName )
 {
   mProgramName = programName;
+  delete[] mTranslatedProgramName;
+  mTranslatedProgramName = 0;
+  if( KGlobal::locale())
+      mTranslatedProgramName = qstrdup( i18n( mProgramName ).utf8());
 }
 
 void
@@ -253,6 +259,27 @@ KAboutData::programName() const
       return QString::null;
 }
 
+const char*
+KAboutData::internalProgramName() const
+{
+   if (mTranslatedProgramName)
+      return mTranslatedProgramName;
+   else
+      return mProgramName;
+}
+
+// KCrash should call as few things as possible and should avoid e.g. malloc()
+// because it may deadlock. Since i18n() needs it, when KLocale is available
+// the i18n() call will be done here in advance.
+void
+KAboutData::translateInternalProgramName() const
+{
+  delete[] mTranslatedProgramName;
+  mTranslatedProgramName = 0;
+  if( KGlobal::locale())
+      mTranslatedProgramName = qstrdup( i18n( mProgramName ).utf8());
+}
+
 QImage
 KAboutData::programLogo() const
 {
--- kdecore/kaboutdata.h.sav	2005-09-29 21:31:55.000000000 +0200
+++ kdecore/kaboutdata.h	2006-01-11 17:34:20.000000000 +0100
@@ -464,6 +464,15 @@ class KDECORE_EXPORT KAboutData
     QString programName() const;
 
     /**
+     * @internal
+     */
+    const char* internalProgramName() const;
+    /**
+     * @internal
+     */
+    void translateInternalProgramName() const;
+
+    /**
      * Returns the program logo image. 
      * @return the program logo data or null image if there is 
      * no custom application logo defined.
@@ -478,6 +487,11 @@ class KDECORE_EXPORT KAboutData
     QString version() const;
 
     /**
+     * @internal
+     */
+    const char* internalVersion() const { return mVersion; }
+
+    /**
      * Returns a short, translated description.
      * @return the short description (translated). Can be
      *         QString::null if not set.
@@ -496,6 +510,11 @@ class KDECORE_EXPORT KAboutData
      * @return the email address where to report bugs.
      */
     QString bugAddress() const;
+    
+    /**
+     * @internal
+     */
+    const char* internalBugAddress() const { return mBugEmailAddress; }
 
     /**
      * Returns a list of authors.
@@ -591,6 +610,7 @@ class KDECORE_EXPORT KAboutData
   private:
     const char *mAppName;
     const char *mProgramName;
+    mutable const char *mTranslatedProgramName;
     const char *mVersion;
     const char *mShortDescription;
     int mLicenseKey;
--- kdecore/kglobal.cpp.sav	2005-09-29 21:31:56.000000000 +0200
+++ kdecore/kglobal.cpp	2006-01-11 17:33:57.000000000 +0100
@@ -28,7 +28,7 @@
 #include "kglobal.h"
 
 #include <kapplication.h>
-
+#include <kaboutdata.h>
 #include <kdebug.h>
 #include <kconfig.h>
 #include <klocale.h>
@@ -94,6 +94,8 @@ KLocale	*KGlobal::locale()
 
         // will set _locale if it works - otherwise 0 is returned
         KLocale::initInstance();
+        if( _instance->aboutData())
+            _instance->aboutData()->translateInternalProgramName();
     }
 
     return _locale;


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic