Hello,

See http://sources.redhat.com/ml/libc-alpha/2002-02/msg00107.html for details. In short, we use malloc() very extensively, and a noticeable part of the execution time is spent handling dynamic allocations. That means malloc() has to be very, very fast, and if it isn't, overall KDE performance suffers. The problem is that we link against -lpthread, which makes malloc() use a mutex for locking (even though most KDE apps aren't actually threaded at present), and that keeps malloc() from being as fast as it needs to be.

I ran some benchmarks and, for example, managed to reduce the time needed to fully render $QTDIR/doc/html/functions.html from 60s to 39s (about 35%) by LD_PRELOAD-ing a different malloc() implementation (Doug Lea's malloc), which I also tweaked a bit. Real-world cases are harder to measure, but the improvement should be at least 10% everywhere. This applies only to glibc < 2.3; I don't know about other systems. Also, with the current glibc CVS (i.e. the yet-to-be-released glibc-2.3), malloc() already uses a spinlock instead of a mutex, and it has almost the same performance as my tuned malloc().

I'm going to include this malloc() implementation in libkdecore, and I already got an OK from Dirk, as long as it has to be explicitly enabled by a configure switch. It was already discussed a bit on IRC too. If you have any thoughts on this, feel free to comment.

Here is exactly what I want to do. There will be a configure option for this, disabled by default (if nothing else, there isn't enough time to really test it). It will work only with glibc, as I have no idea about the situation on non-glibc systems. It also requires a spinlock implementation (i.e. some assembler), and right now I only have an x86 one. However, I'd like to keep it even after glibc-2.3 is released (still only as an option). Even compared to malloc() from the current glibc CVS, I get about a 5% improvement on the functions.html page with the tuned malloc().

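To make the spinlock idea concrete, here is a minimal sketch of a spinlock guarding calls into an allocator. It is only an illustration under stated assumptions: it uses C11 atomic_flag instead of hand-written x86 assembler, the names heap_lock, spin_lock, spin_unlock, locked_malloc and locked_free are made up for this example, and the system malloc()/free() stand in for the real allocator internals. It is not the code proposed for libkdecore.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical lock for this sketch; C11 atomic_flag replaces the
   x86 assembler mentioned in the text (an assumption, not the original). */
static atomic_flag heap_lock = ATOMIC_FLAG_INIT;

static void spin_lock(atomic_flag *lock)
{
    /* test_and_set returns the previous value, so keep looping
       for as long as somebody else already holds the lock. */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ; /* busy-wait: no system call, no sleeping */
}

static void spin_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Hypothetical wrappers: serialise allocator calls with the spinlock. */
void *locked_malloc(size_t size)
{
    spin_lock(&heap_lock);
    void *p = malloc(size);   /* stand-in for the allocator's critical section */
    spin_unlock(&heap_lock);
    return p;
}

void locked_free(void *p)
{
    spin_lock(&heap_lock);
    free(p);                  /* stand-in for the allocator's critical section */
    spin_unlock(&heap_lock);
}
```

In the uncontended case (which is the normal one for a mostly single-threaded application) taking and releasing such a lock costs one atomic operation each, which is the reason a spinlock can be cheaper here than a general-purpose mutex.
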
Glibc malloc() still has malloc hooks and is optimized for many threads (it's ptmalloc, a threaded version of Doug Lea's malloc), which is something we don't need. The only time I needed malloc hooks was for kdesdk/kmtrace, which is LD_PRELOAD-ed anyway, so it can work around that. As for code optimized for many threads - I'd first have to see a KDE application where that's needed. Not to mention that I also tried the malloc() implementations with several threads running, and the simple spinlock-only variant didn't perform worse than the glibc one with 4 threads doing nothing but calling malloc() and free() in loops (but I don't have access to an SMP machine, so it might be different there).

BTW, just to show how damn fast malloc() has to be: I also have a test version of malloc() here which only allocates memory contiguously from a large array, and whose free() is an empty function (practically unusable, but as close to a no-op as possible). The functions.html example then needs 35s (vs. 39s with the tuned malloc()). If I add 'for(int i=0;i<70;++i);' to both this malloc() and free(), it goes back up to 39s (gcc doesn't optimise out empty loops). I also tried to write my own malloc(), which did only a few bitfield operations and a little pointer arithmetic - not fast enough, 10% slower than glibc-2.3 malloc() (even though it needs about 5-8% less memory, but I doubt anyone is going to trade speed for that).

Having the possibility to use a malloc() tuned for KDE's needs IMHO can't break anything. I'm also going to make some improvements to kdesdk/kmtrace, so that it will hopefully be possible to find the places where we do so many allocations (even though I doubt we can do much about that).

Hmm ... any thoughts?

-- 
Lubos Lunak
llunak@suse.cz ; l.lunak@kde.org
http://dforce.sh.cvut.cz/~seli
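
For reference, the "as close to a no-op as possible" test allocator described above can be sketched roughly as follows. This is a minimal illustration, not the actual test code: the names bump_malloc, bump_free, arena and arena_used, the 1 MB ARENA_SIZE and the 16-byte ALIGNMENT are all assumptions made for this example, and there is no locking at all.

```c
#include <stddef.h>

#define ARENA_SIZE (1024 * 1024)  /* hypothetical size of the large array */
#define ALIGNMENT  16             /* hypothetical alignment of returned blocks */

static _Alignas(ALIGNMENT) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;     /* bytes handed out so far */

/* Hand out the next chunk of the array; memory is never reused. */
void *bump_malloc(size_t size)
{
    /* Round the request up to a multiple of ALIGNMENT. */
    size_t rounded = (size + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);

    /* Fail on arithmetic overflow or when the array is exhausted. */
    if (rounded < size || rounded > ARENA_SIZE - arena_used)
        return NULL;

    void *p = arena + arena_used;
    arena_used += rounded;
    return p;
}

/* free() does nothing, exactly as in the description above. */
void bump_free(void *p)
{
    (void)p;
}
```

An allocator like this is useless in practice because it never gives memory back, but it marks a lower bound on allocation cost, which is what makes it handy as a benchmark baseline.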