From kde-core-devel Wed Jan 11 16:40:43 2006 From: Lubos Lunak Date: Wed, 11 Jan 2006 16:40:43 +0000 To: kde-core-devel Subject: More crash-proof KCrash (was Re: Konq crash backtraces) Message-Id: <200601111740.43565.l.lunak () suse ! cz> X-MARC-Message: https://marc.info/?l=kde-core-devel&m=113699766611213 MIME-Version: 1 Content-Type: multipart/mixed; boundary="--Boundary-00=_LUTxD+lnw94A1Kh" --Boundary-00=_LUTxD+lnw94A1Kh Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit Content-Disposition: inline On Saturday 07 January 2006 13:13, Lubos Lunak wrote: > Dne pátek 06 leden 2006 22:31 Thiago Macieira napsal(a): > > Dik Takken wrote: > > >It just happened again... > > > > > >Konqueror crashed without leaving a single trace. I compiled KDE 3.5 > > > with full debug info, which gives me useful backtraces when a crash > > > occurs. Konqueror is a weird exception. One of its windows can just > > > disappear in front of your nose, without DrKonqi popping up to get a > > > backtrace. > > > > > Distributions are shipping glibc with some malloc checks, which cause it > > to abort the program when a double-free or corruption occurs, but without > > raising SIGABRT. There must be a way of making glibc raise the signal, > > but I don't know how. It should be enabled by default in KDE code so that > > the crash handler can catch them. > > Not really. The error checking does raise SIGABRT, it's just that our > crash handler depends on malloc() working and locks up already in > DCOPClient::emergencyClose(). If we want to get a crash handler for heap > corruptions too, we need to stay away from using the corrupted malloc. I > can give it a shot. Ok, it wasn't as trivial, but I have a patch. The changes are: - no qstrdup() of static strings or actually any strings , no other things that result in malloc() like using QCString, i18n() or whatever. Results in some ugly code like KAboutData::translateInternalProgramName() but oh well. - no DCOPClient::emergencyClose() - what's the point anyway, I'm quite sure it doesn't get called when I do Ctrl+C and KDE seems to work just fine, so why should it be called when crashing - it calls malloc() and who knows what else - no fork() - this one doesn't actually call malloc(), but there are some functions called at pre-fork time that do something with malloc locks (with glibc at least) - that leads to lockups as well. Since no fork() means no executing of other executables, the only other option I see is feeding the data to some other already running process via a socket. Fortunately we incidentally have already this kdeinit thingy running that's supposed to launch executables when it gets told via a socket, so I copy&pasted some code from there. That means some more syscalls, but that should be still more safe. There's still a fallback to use fork() in case using kdeinit fails for any reason. This way KCrash should be more reliable and we should (hopefully) always get DrKonqi now. And I can confirm that after one gets used to DrKonqi just having an application disappearing without a reason really doesn't feel nice. I'd like to commit this to 3.5 branch, so it'd be nice if somebody could at least peek at it. PS: One more benefit would be that we could now make drkonqi a kdeinit module in order to make our crashes faster. PPS: A good way of testing is just doing a double free(). -- Lubos Lunak KDE developer --------------------------------------------------------------------- SuSE CR, s.r.o. e-mail: l.lunak@suse.cz , l.lunak@kde.org Drahobejlova 27 tel: +420 2 9654 2373 190 00 Praha 9 fax: +420 2 9654 2374 Czech Republic http://www.suse.cz/ --Boundary-00=_LUTxD+lnw94A1Kh Content-Type: text/x-diff; charset="utf-8"; name="kdecore.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="kdecore.patch" --- kdecore/kcrash.h.sav 2005-09-29 21:31:56.000000000 +0200 +++ kdecore/kcrash.h 2006-01-11 16:50:42.000000000 +0100 @@ -117,6 +117,10 @@ class KDECORE_EXPORT KCrash * Pointer to the emergency save function. */ static HandlerType _emergencySaveFunction; + + private: + static void startDrKonqi( const char* argv[], int argc ); + static void startDirectly( const char* argv[], int argc ); }; #endif --- kdecore/kcrash.cpp.sav 2005-09-29 21:31:56.000000000 +0200 +++ kdecore/kcrash.cpp 2006-01-11 16:50:39.000000000 +0100 @@ -38,6 +38,9 @@ #include #include #include +#include +#include +#include #include #include @@ -47,6 +50,8 @@ #include #include +#include <../kinit/klauncher_cmds.h> + #if defined Q_WS_X11 #include #endif @@ -128,19 +133,15 @@ KCrash::defaultCrashHandler (int sig) crashRecursionCounter++; // } - // Close dcop connections - DCOPClient::emergencyClose(); // Close all remaining file descriptors except for stdin/stdout/stderr struct rlimit rlp; getrlimit(RLIMIT_NOFILE, &rlp); for (int i = 3; i < (int)rlp.rlim_cur; i++) close(i); - bool shuttingDown = false; - // don't load drkonqi during shutdown - if ( !shuttingDown ) - { + // this code is leaking, but this should not hurt cause we will do a + // exec() afterwards. exec() is supposed to clean up. if (crashRecursionCounter < 3) { if (appName) @@ -152,105 +153,81 @@ KCrash::defaultCrashHandler (int sig) fprintf(stderr, "KCrash: Application '%s' crashing...\n", appName ? appName : ""); #endif - pid_t pid = fork(); - - if (pid <= 0) { - // this code is leaking, but this should not hurt cause we will do a - // exec() afterwards. exec() is supposed to clean up. - char * argv[24]; // don't forget to update this + const char * argv[24]; // don't forget to update this int i = 0; // argument 0 has to be drkonqi - argv[i++] = qstrdup("drkonqi"); + argv[i++] = "drkonqi"; #if defined Q_WS_X11 // start up on the correct display - argv[i++] = qstrdup("-display"); + argv[i++] = "-display"; if ( qt_xdisplay() ) argv[i++] = XDisplayString(qt_xdisplay()); else argv[i++] = getenv("DISPLAY"); #elif defined(Q_WS_QWS) // start up on the correct display - argv[i++] = qstrdup("-display"); + argv[i++] = "-display"; argv[i++] = getenv("QWS_DISPLAY"); #endif // we have already tested this - argv[i++] = qstrdup("--appname"); - argv[i++] = qstrdup(appName); + argv[i++] = "--appname"; + argv[i++] = appName; if (KApplication::loadedByKdeinit) - argv[i++] = qstrdup("--kdeinit"); + argv[i++] = "--kdeinit"; // only add apppath if it's not NULL if (appPath) { - argv[i++] = qstrdup("--apppath"); - argv[i++] = qstrdup(appPath); + argv[i++] = "--apppath"; + argv[i++] = appPath; } // signal number -- will never be NULL - QCString tmp; - tmp.setNum(sig); - argv[i++] = qstrdup("--signal"); - argv[i++] = qstrdup(tmp.data()); + char sigtxt[ 10 ]; + sprintf( sigtxt, "%d", sig ); + argv[i++] = "--signal"; + argv[i++] = sigtxt; - // pid number -- only include if this is the child - // the debug stuff will be disabled if we was not able to fork - if (pid == 0) { - tmp.setNum(getppid()); - argv[i++] = qstrdup("--pid"); - argv[i++] = qstrdup(tmp.data()); - } + char pidtxt[ 10 ]; + sprintf( pidtxt, "%d", getpid()); + argv[i++] = "--pid"; + argv[i++] = pidtxt; const KInstance *instance = KGlobal::_instance; const KAboutData *about = instance ? instance->aboutData() : 0; if (about) { - if (!about->version().isNull()) { - argv[i++] = qstrdup("--appversion"); - argv[i++] = qstrdup(about->version().utf8()); + if (about->internalVersion()) { + argv[i++] = "--appversion"; + argv[i++] = about->internalVersion(); } - if (!about->programName().isNull()) { - argv[i++] = qstrdup("--programname"); - argv[i++] = qstrdup(about->programName().utf8()); + if (about->internalProgramName()) { + argv[i++] = "--programname"; + argv[i++] = about->internalProgramName(); } - if (!about->bugAddress().isNull()) { - argv[i++] = qstrdup("--bugaddress"); - argv[i++] = qstrdup(about->bugAddress().utf8()); + if (about->internalBugAddress()) { + argv[i++] = "--bugaddress"; + argv[i++] = about->internalBugAddress(); } } if ( kapp && !kapp->startupId().isNull()) { - argv[i++] = qstrdup("--startupid"); - argv[i++] = qstrdup(kapp->startupId()); + argv[i++] = "--startupid"; + argv[i++] = kapp->startupId().data(); } if ( safer ) - argv[i++] = qstrdup("--safer"); + argv[i++] = "--safer"; // NULL terminated list - argv[i++] = NULL; - - setgid(getgid()); - setuid(getuid()); - - execvp("drkonqi", argv); - - // we could clean up here - // i = 0; - // while (argv[i]) - // free(argv[i++]); - } - else - { - - alarm(0); // Seems we made it.... + argv[i] = NULL; - // wait for child to exit - waitpid(pid, NULL, 0); + startDrKonqi( argv, i ); _exit(253); - } + } else { fprintf(stderr, "Unknown appname\n"); @@ -261,8 +238,279 @@ KCrash::defaultCrashHandler (int sig) { fprintf(stderr, "Unable to start Dr. Konqi\n"); } - } #endif //Q_OS_UNIX _exit(255); } + +#ifdef Q_OS_UNIX + +// Since we can't fork() in the crashhandler, we cannot execute any external code +// (there can be functions registered to be performed before fork(), for example +// handling of malloc locking, which doesn't work when malloc crashes because of heap corruption). + +static int write_socket(int sock, char *buffer, int len); +static int read_socket(int sock, char *buffer, int len); +static int openSocket(); + +void KCrash::startDrKonqi( const char* argv[], int argc ) +{ + int socket = openSocket(); + if( socket < -1 ) + { + startDirectly( argv, argc ); + return; + } + klauncher_header header; + header.cmd = LAUNCHER_EXEC_NEW; + const int BUFSIZE = 8192; // make sure this is big enough + char buffer[ BUFSIZE + 10 ]; + int pos = 0; + long argcl = argc; + memcpy( buffer + pos, &argcl, sizeof( argcl )); + pos += sizeof( argcl ); + for( int i = 0; + i < argc; + ++i ) + { + int len = strlen( argv[ i ] ) + 1; // include terminating \0 + if( pos + len > BUFSIZE ) + { + fprintf( stderr, "BUFSIZE in KCrash not big enough!\n" ); + startDirectly( argv, argc ); + return; + } + memcpy( buffer + pos, argv[ i ], len ); + pos += len; + } + long env = 0; + memcpy( buffer + pos, &env, sizeof( env )); + pos += sizeof( env ); + long avoid_loops = 0; + memcpy( buffer + pos, &avoid_loops, sizeof( avoid_loops )); + pos += sizeof( avoid_loops ); + header.arg_length = pos; + write_socket(socket, (char *) &header, sizeof(header)); + write_socket(socket, buffer, pos); + if( read_socket( socket, (char *) &header, sizeof(header)) < 0 + || header.cmd != LAUNCHER_OK ) + { + startDirectly( argv, argc ); + return; + } + long pid; + read_socket(socket, buffer, header.arg_length); + pid = *((long *) buffer); + + alarm(0); // Seems we made it.... + + for(;;) + { + if( kill( pid, 0 ) < 0 ) + _exit(253); + sleep( 1 ); // the debugger should stop this process anyway + } +} + +// If we can't reach kdeinit we can still at least try to fork() +void KCrash::startDirectly( const char* argv[], int ) +{ + fprintf( stderr, "KCrash cannot reach kdeinit, launching directly.\n" ); + pid_t pid = fork(); + if (pid <= 0) + { + setgid(getgid()); + setuid(getuid()); + execvp("drkonqi", const_cast< char** >( argv )); + } + else + { + alarm(0); // Seems we made it.... + // wait for child to exit + waitpid(pid, NULL, 0); + _exit(253); + } +} + +// From now on this code is copy&pasted from kinit/wrapper.c : + +extern char **environ; + +static char *getDisplay() +{ + const char *display; + char *result; + char *screen; + char *colon; + char *i; +/* + don't test for a value from qglobal.h but instead distinguish + Qt/X11 from Qt/Embedded by the fact that Qt/E apps have -DQWS + on the commandline (which in qglobal.h however triggers Q_WS_QWS, + but we don't want to include that here) (Simon) +#ifdef Q_WS_X11 + */ +#if !defined(QWS) + display = getenv("DISPLAY"); +#else + display = getenv("QWS_DISPLAY"); +#endif + if (!display || !*display) + { + display = ":0"; + } + result = (char*)malloc(strlen(display)+1); + if (result == NULL) + return NULL; + + strcpy(result, display); + screen = strrchr(result, '.'); + colon = strrchr(result, ':'); + if (screen && (screen > colon)) + *screen = '\0'; + while((i = strchr(result, ':'))) + *i = '_'; + return result; +} + +/* + * Write 'len' bytes from 'buffer' into 'sock'. + * returns 0 on success, -1 on failure. + */ +static int write_socket(int sock, char *buffer, int len) +{ + ssize_t result; + int bytes_left = len; + while ( bytes_left > 0) + { + result = write(sock, buffer, bytes_left); + if (result > 0) + { + buffer += result; + bytes_left -= result; + } + else if (result == 0) + return -1; + else if ((result == -1) && (errno != EINTR) && (errno != EAGAIN)) + return -1; + } + return 0; +} + +/* + * Read 'len' bytes from 'sock' into 'buffer'. + * returns 0 on success, -1 on failure. + */ +static int read_socket(int sock, char *buffer, int len) +{ + ssize_t result; + int bytes_left = len; + while ( bytes_left > 0) + { + result = read(sock, buffer, bytes_left); + if (result > 0) + { + buffer += result; + bytes_left -= result; + } + else if (result == 0) + return -1; + else if ((result == -1) && (errno != EINTR) && (errno != EAGAIN)) + return -1; + } + return 0; +} + +static int openSocket() +{ + kde_socklen_t socklen; + int s; + struct sockaddr_un server; +#define MAX_SOCK_FILE 255 + char sock_file[MAX_SOCK_FILE + 1]; + const char *home_dir = getenv("HOME"); + const char *kde_home = getenv("KDEHOME"); + char *display; + + sock_file[0] = sock_file[MAX_SOCK_FILE] = 0; + + if (!kde_home || !kde_home[0]) + { + kde_home = "~/.kde/"; + } + + if (kde_home[0] == '~') + { + if (!home_dir || !home_dir[0]) + { + fprintf(stderr, "Warning: $HOME not set!\n"); + return -1; + } + if (strlen(home_dir) > (MAX_SOCK_FILE-100)) + { + fprintf(stderr, "Warning: Home directory path too long!\n"); + return -1; + } + kde_home++; + strncpy(sock_file, home_dir, MAX_SOCK_FILE); + } + strncat(sock_file, kde_home, MAX_SOCK_FILE - strlen(sock_file)); + + /** Strip trailing '/' **/ + if ( sock_file[strlen(sock_file)-1] == '/') + sock_file[strlen(sock_file)-1] = 0; + + strncat(sock_file, "/socket-", MAX_SOCK_FILE - strlen(sock_file)); + if (gethostname(sock_file+strlen(sock_file), MAX_SOCK_FILE - strlen(sock_file) - 1) != 0) + { + perror("Warning: Could not determine hostname: "); + return -1; + } + sock_file[sizeof(sock_file)-1] = '\0'; + + /* append $DISPLAY */ + display = getDisplay(); + if (display == NULL) + { + fprintf(stderr, "Error: Could not determine display.\n"); + return -1; + } + + if (strlen(sock_file)+strlen(display)+strlen("/kdeinit_")+2 > MAX_SOCK_FILE) + { + fprintf(stderr, "Warning: Socket name will be too long.\n"); + return -1; + } + strcat(sock_file, "/kdeinit_"); + strcat(sock_file, display); + free(display); + + if (strlen(sock_file) >= sizeof(server.sun_path)) + { + fprintf(stderr, "Warning: Path of socketfile exceeds UNIX_PATH_MAX.\n"); + return -1; + } + + /* + * create the socket stream + */ + s = socket(PF_UNIX, SOCK_STREAM, 0); + if (s < 0) + { + perror("Warning: socket() failed: "); + return -1; + } + + server.sun_family = AF_UNIX; + strcpy(server.sun_path, sock_file); + socklen = sizeof(server); + if(connect(s, (struct sockaddr *)&server, socklen) == -1) + { + perror("Warning: connect() failed: "); + close(s); + return -1; + } + return s; +} + +#endif // Q_OS_UNIX --- kdecore/kaboutdata.cpp.sav 2005-09-29 21:31:56.000000000 +0200 +++ kdecore/kaboutdata.cpp 2006-01-11 17:34:36.000000000 +0100 @@ -106,6 +106,7 @@ KAboutData::KAboutData( const char *appN const char *bugsEmailAddress ) : mProgramName( programName ), + mTranslatedProgramName( 0 ), mVersion( version ), mShortDescription( shortDescription ), mLicenseKey( licenseType ), @@ -131,6 +132,7 @@ KAboutData::~KAboutData() { if (mLicenseKey == License_File) delete [] mLicenseText; + delete[] mTranslatedProgramName; delete d; } @@ -179,6 +181,10 @@ void KAboutData::setProgramName( const char* programName ) { mProgramName = programName; + delete[] mTranslatedProgramName; + mTranslatedProgramName = 0; + if( KGlobal::locale()) + mTranslatedProgramName = qstrdup( i18n( mProgramName ).utf8()); } void @@ -253,6 +259,27 @@ KAboutData::programName() const return QString::null; } +const char* +KAboutData::internalProgramName() const +{ + if (mTranslatedProgramName) + return mTranslatedProgramName; + else + return mProgramName; +} + +// KCrash should call as few things as possible and should avoid e.g. malloc() +// because it may deadlock. Since i18n() needs it, when KLocale is available +// the i18n() call will be done here in advance. +void +KAboutData::translateInternalProgramName() const +{ + delete[] mTranslatedProgramName; + mTranslatedProgramName = 0; + if( KGlobal::locale()) + mTranslatedProgramName = qstrdup( i18n( mProgramName ).utf8()); +} + QImage KAboutData::programLogo() const { --- kdecore/kaboutdata.h.sav 2005-09-29 21:31:55.000000000 +0200 +++ kdecore/kaboutdata.h 2006-01-11 17:34:20.000000000 +0100 @@ -464,6 +464,15 @@ class KDECORE_EXPORT KAboutData QString programName() const; /** + * @internal + */ + const char* internalProgramName() const; + /** + * @internal + */ + void translateInternalProgramName() const; + + /** * Returns the program logo image. * @return the program logo data or null image if there is * no custom application logo defined. @@ -478,6 +487,11 @@ class KDECORE_EXPORT KAboutData QString version() const; /** + * @internal + */ + const char* internalVersion() const { return mVersion; } + + /** * Returns a short, translated description. * @return the short description (translated). Can be * QString::null if not set. @@ -496,6 +510,11 @@ class KDECORE_EXPORT KAboutData * @return the email address where to report bugs. */ QString bugAddress() const; + + /** + * @internal + */ + const char* internalBugAddress() const { return mBugEmailAddress; } /** * Returns a list of authors. @@ -591,6 +610,7 @@ class KDECORE_EXPORT KAboutData private: const char *mAppName; const char *mProgramName; + mutable const char *mTranslatedProgramName; const char *mVersion; const char *mShortDescription; int mLicenseKey; --- kdecore/kglobal.cpp.sav 2005-09-29 21:31:56.000000000 +0200 +++ kdecore/kglobal.cpp 2006-01-11 17:33:57.000000000 +0100 @@ -28,7 +28,7 @@ #include "kglobal.h" #include - +#include #include #include #include @@ -94,6 +94,8 @@ KLocale *KGlobal::locale() // will set _locale if it works - otherwise 0 is returned KLocale::initInstance(); + if( _instance->aboutData()) + _instance->aboutData()->translateInternalProgramName(); } return _locale; --Boundary-00=_LUTxD+lnw94A1Kh--