
List:       kde-core-devel
Subject:    [patch] Zone allocator and kcompletion using it
From:       Michael Matz <matz () kde ! org>
Date:       2002-03-07 7:58:45

Hi,

while looking into why we do so many malloc() calls that we now even have
a whole malloc() replacement, one thing (among others) we found was
KCompletion.  The attached patch reworks it to no longer use QValueList<>
for the children of each node, which cuts the number of malloc() calls to
a quarter.  I.e. QValueList<> is not the optimal container for holding
many small things.

The second thing this patch does is rework KZoneAllocator to make it more
useful.  It can now be used as a real obstack (i.e. freeing all objects
allocated since a certain one), as a general purpose allocator with a
normal deallocate(), or as a mixture of both.  Because it does no
compaction it's still much faster than malloc()/free() (some simple tests
that just allocate heaps of small objects and deallocate them again show
it to be five times faster).  It's still targeted at allocating many
small objects.  Also, to avoid holding memory longer than necessary,
objects that are close together (as defined by allocation time) should be
deallocated close together too.  I.e. something like allocating 100
things and then freeing only every second one will not free any memory.
It also potentially uses less memory, because there is no per-object
overhead as in malloc(), which for small objects (say 4 bytes) might
double the amount actually needed.

And the final thing is to make use of that KZoneAllocator in KCompletion
by defining operator new and operator delete for KCompTreeNode, which
gets rid of the last malloc() calls in that code.

A changed kcompletiontest which just adds a big number of strings in
different ways, clearing the completion object in between, showed more
than a twofold speedup (from 4950 to 2100 msec).  I've been using various
versions of this patch since last Sunday (or Saturday?).

Binary compatibility is not really an issue here, because KZoneAllocator
wasn't used anywhere in KDE CVS.  I think it could be put to very good
use in khtml and similar code dealing with a large number of rather small
objects.  Used correctly it can also greatly simplify code, because there
is no need to remember pointers just for freeing them (this only works
for objects without destructors, of course); and even where it doesn't,
it makes the code faster because no free() calls are needed.  Kind of
like a poor man's garbage collection ;-)


Ciao,
Michael.

["allocation2.diff" (TEXT/PLAIN)]

? Makefile.in.real
? kcharsets.cpp_mm
? kcharsets.diff
? kdecore.diff
? klibloader.diff
Index: kallocator.cpp
===================================================================
RCS file: /home/kde/kdelibs/kdecore/kallocator.cpp,v
retrieving revision 1.4
diff -u -p -r1.4 kallocator.cpp
--- kallocator.cpp	2000/10/09 21:29:00	1.4
+++ kallocator.cpp	2002/03/07 07:21:36
@@ -2,6 +2,7 @@
     This file is part of the KDE libraries
 
     Copyright (C) 1999 Waldo Bastian (bastian@kde.org)
+    Copyright (C) 2002 Michael Matz (matz@kde.org)
 
     This library is free software; you can redistribute it and/or
     modify it under the terms of the GNU Library General Public
@@ -18,46 +19,227 @@
     the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
     Boston, MA 02111-1307, USA.
 */
-//----------------------------------------------------------------------------
-//
-// KDE HTML Widget -- Objects
+
+/* Fast zone memory allocator with deallocation support, for use as an
+   obstack or as a general purpose allocator.  It does no compaction.  If
+   the usage pattern is non-optimal it might waste some memory while
+   running.  E.g. when allocating many small things at once and then
+   deallocating only every second one, chances are no memory is freed.  */
 // $Id: kallocator.cpp,v 1.4 2000/10/09 21:29:00 zander Exp $
 
 #include "kallocator.h"
 #include <kdebug.h>
 
-KZoneAllocator::KZoneAllocator(long _blockSize)
-: blockSize(_blockSize), blockOffset(0)
+class KZoneAllocator::MemBlock
+{
+  public:
+    MemBlock(size_t s) : size(s), ref(0), older(0), newer(0)
+      { begin = new char[s]; }
+    ~MemBlock() { delete [] begin; }
+    bool is_in(void *ptr) const {return !(begin > (char *)ptr
+    				          || (begin + size) <= (char *)ptr); }
+    size_t size;
+    unsigned int ref;
+    char *begin;
+    MemBlock *older;
+    MemBlock *newer;
+};
+
+KZoneAllocator::KZoneAllocator(unsigned long _blockSize)
+: currentBlock(0), blockSize(1), blockOffset(0), log2(0), num_blocks(0), 
+  hashList(0), hashSize(0), hashDirty(true)
 {
-    currentBlock = new char[_blockSize];
-    memoryBlocks.append(currentBlock);
+  while (blockSize < _blockSize)
+    blockSize <<= 1, log2++;
+  blockOffset = blockSize;
 }
 
 KZoneAllocator::~KZoneAllocator()
 {
-    while (!memoryBlocks.isEmpty())
-    {
-        char *oldBuffer = (char *) memoryBlocks.take(0);
-        delete [] oldBuffer;
+  unsigned int count = 0;
+  if (hashList) {
+    /* No need to maintain the different lists in hashList[] anymore.
+       I.e. no need to use delBlock().  */
+    for (int i = 0; i < hashSize; i++)
+      delete hashList[i];
+    delete [] hashList;
+    hashList = 0;
+  }
+  MemBlock *next;
+  for (; currentBlock; currentBlock = next) {
+    next = currentBlock->older;
+    delete currentBlock;
+    count++;
+  }
+  if (count > 1)
+    qDebug("zone still contained %d blocks", count);
+}
+
+void KZoneAllocator::insertHash(MemBlock *b)
+{
+  unsigned int adr = ((unsigned int)b->begin) & (~(blockSize - 1));
+  unsigned int end = ((unsigned int)b->begin) + blockSize;
+  while (adr < end) {
+    unsigned int key = adr >> log2;
+    key = key & (hashSize - 1);
+    if (!hashList[key])
+      hashList[key] = new QValueList<MemBlock *>;
+    hashList[key]->append(b);
+    adr += blockSize;
+  }
+}
+
+void KZoneAllocator::addBlock(MemBlock *b)
+{
+  b->newer = 0;
+  b->older = currentBlock;
+  if (currentBlock)
+    b->older->newer = b;
+  currentBlock = b;
+  num_blocks++;
+  /* If we either have no hashList at all, or many more blocks were added
+     since its last construction, we reconstruct the list.  But don't
+     make it larger than a certain maximum.  */
+  if (hashList && ((num_blocks / 4) > hashSize && hashSize < 64*1024))
+    hashDirty = true;
+  /* Only insert this block into the hashlists, if we aren't going to
+     reconstruct them anyway.  */
+  if (hashList && !hashDirty)
+    insertHash (b);
+}
+
+void KZoneAllocator::initHash()
+{
+  if (hashList) {
+    for (int i = 0; i < hashSize; i++)
+      delete hashList[i];
+    delete [] hashList;
+    hashList = 0;
+  }
+  hashSize = 1;
+  while (hashSize < num_blocks)
+    hashSize <<= 1;
+  if (hashSize < 1024)
+    hashSize = 1024;
+  if (hashSize > 64*1024)
+    hashSize = 64*1024;
+  hashList = new QValueList<MemBlock *> *[hashSize];
+  memset (hashList, 0, sizeof(QValueList<MemBlock*> *) * hashSize);
+  hashDirty = false;
+  for (MemBlock *b = currentBlock; b; b = b->older)
+    insertHash(b);
+}
+
+void KZoneAllocator::delBlock(MemBlock *b)
+{
+  /* Update also the hashlists if we aren't going to reconstruct them
+     soon.  */
+  if (hashList && !hashDirty) {
+    unsigned int adr = ((unsigned int)b->begin) & (~(blockSize - 1));
+    unsigned int end = ((unsigned int)b->begin) + blockSize;
+    while (adr < end) {
+      unsigned int key = adr >> log2;
+      key = key & (hashSize - 1);
+      if (hashList[key]) {
+	QValueList<MemBlock *> *list = hashList[key];
+	QValueList<MemBlock *>::Iterator it = list->begin();
+	QValueList<MemBlock *>::Iterator endit = list->end();
+	for (; it != endit; ++it)
+	  if (*it == b) {
+	    list->remove(it);
+	    break;
+	  }
+      }
+      adr += blockSize;
     }
+  }
+  if (b->older)
+    b->older->newer = b->newer;
+  if (b->newer)
+    b->newer->older = b->older;
+  if (b == currentBlock) {
+    currentBlock = 0;
+    blockOffset = blockSize;
+  }
+  delete b;
+  num_blocks--;
 }
 
 void *
 KZoneAllocator::allocate(size_t _size)
 {
    // Use the size of (void *) as alignment
-   const size_t alignment = sizeof(void *);
-   _size = (_size + alignment - 1) & ~(alignment - 1);   
+   const size_t alignment = sizeof(void *) - 1;
+   _size = (_size + alignment) & ~alignment;   
 
-   if ((long) _size + blockOffset > blockSize)
+   if ((unsigned long) _size + blockOffset > blockSize)
    {
-      currentBlock = new char[blockSize];
-      memoryBlocks.append(currentBlock);
+      if (_size > blockSize) {
+	qDebug("KZoneAllocator: allocating more than %lu bytes", blockSize);
+	return 0;
+      }
+      addBlock(new MemBlock(blockSize));
       blockOffset = 0;
-      kdDebug () << "Allocating block #" <<  memoryBlocks.count() << endl;
-   } 
-   void *result = (void *)(currentBlock+blockOffset);
+      //qDebug ("Allocating block #%d (%x)\n", num_blocks, currentBlock->begin);
+   }
+   void *result = (void *)(currentBlock->begin+blockOffset);
+   currentBlock->ref++;
    blockOffset += _size;
    return result;
 }
 
+void
+KZoneAllocator::deallocate(void *ptr)
+{
+  if (hashDirty)
+    initHash();
+
+  unsigned int key = (((unsigned int)ptr) >> log2) & (hashSize - 1);
+  QValueList<MemBlock *> *list = hashList[key];
+  if (!list) {
+    /* Can happen with certain usage pattern of intermixed free_since()
+       and deallocate().  */
+    //qDebug("Uhoh");
+    return;
+  }
+  QValueList<MemBlock*>::ConstIterator it = list->begin();
+  QValueList<MemBlock*>::ConstIterator endit = list->end();
+  for (; it != endit; ++it) {
+    MemBlock *cur = *it;
+    if (cur->is_in(ptr)) {
+      if (!--cur->ref) {
+	if (cur != currentBlock)
+	  delBlock (cur);
+	else
+	  blockOffset = 0;
+      }
+      return;
+    }
+  }
+  /* Can happen with certain usage pattern of intermixed free_since()
+     and deallocate().  */
+  //qDebug("Uhoh2");
+}
+
+void
+KZoneAllocator::free_since(void *ptr)
+{
+  /* If we have a hashList and it's not yet dirty, see, if we will dirty
+     it by removing too many blocks.  This will make the below delBlock()s
+     faster.  */
+  if (hashList && !hashDirty)
+    {
+      const MemBlock *b;
+      unsigned int removed = 0;
+      for (b = currentBlock; b; b = b->older, removed++)
+	if (b->is_in (ptr))
+	  break;
+      if (hashSize >= 4 * (num_blocks - removed))
+        hashDirty = true;
+    }
+  while (currentBlock && !currentBlock->is_in(ptr)) {
+    MemBlock *old = currentBlock; currentBlock = old->older;
+    delBlock (old);
+  }
+  blockOffset = currentBlock ? ((char*)ptr) - currentBlock->begin : blockSize;
+}
Index: kallocator.h
===================================================================
RCS file: /home/kde/kdelibs/kdecore/kallocator.h,v
retrieving revision 1.6
diff -u -p -r1.6 kallocator.h
--- kallocator.h	2001/12/06 13:03:57	1.6
+++ kallocator.h	2002/03/07 07:21:36
@@ -2,6 +2,7 @@
     This file is part of the KDE libraries
 
     Copyright (C) 1999 Waldo Bastian (bastian@kde.org)
+    Copyright (C) 2002 Michael Matz (matz@kde.org)
               
     This library is free software; you can redistribute it and/or
     modify it under the terms of the GNU Library General Public
@@ -26,17 +27,21 @@
 #ifndef KALLOCATOR_H
 #define KALLOCATOR_H
 
-#include <stdio.h>
-#include <qptrlist.h>
+#include <qvaluelist.h>
 
 class KZoneAllocatorPrivate;
 
+
 /**
  * Memory allocator for large groups of small objects.
  * This should be used for large groups of objects that are created and
  * destroyed together. When used carefully for this purpose it is faster
- * and more memory efficient than malloc.
- * @author Waldo Bastian <bastian@kde.org>
+ * and more memory efficient than malloc.  In addition to the usual obstack
+ * like allocation you can also free the objects individually.  Because it
+ * does no compaction it is still faster than malloc()/free().  Depending
+ * on the exact usage pattern that might come at the expense of some
+ * memory though.
+ * @author Waldo Bastian <bastian@kde.org>, Michael Matz <matz@kde.org>
  * @version $Id: kallocator.h,v 1.6 2001/12/06 13:03:57 faure Exp $
  */
 class KZoneAllocator
@@ -46,7 +51,7 @@ public:
      * Creates a KZoneAllocator object.
      * @param _blockSize Size in bytes of the blocks requested from malloc.
      */
-    KZoneAllocator(long _blockSize = 128*1024);
+    KZoneAllocator(unsigned long _blockSize = 8*1024);
 
     /**
      * Destructs the ZoneAllocator and free all memory allocated by it.
@@ -55,15 +60,70 @@ public:
 
     /**
      * Allocates a memory block.
-     * @param _size Size in bytes of the memory block. Memory is not aligned!
+     * @param _size Size in bytes of the memory block. Memory is aligned to
+     * the size of a pointer.
      */
     void* allocate(size_t _size);
 
+    /**
+     * Gives back a block returned by @ref allocate() to the zone
+     * allocator, and possibly deallocates the block holding it (when it's
+     * empty).  The first @ref deallocate() after many @ref allocate() calls
+     * (or the first at all) builds an internal data structure for speeding
+     * up deallocation.  The consistency of that structure is maintained
+     * from then on (by @ref allocate() and @ref deallocate()) unless many
+     * more objects are allocated without any intervening deallocation, in
+     * which case it's thrown away and rebuilt at the next @ref deallocate().
+     *
+     * The effect of this is that such initial @ref deallocate() calls take
+     * more time than the normal calls, and that after this list is built, i.e.
+     * generally if @ref deallocate() is used at all, allocate() also becomes a
+     * little bit slower.  This means that if you want to squeeze out the
+     * last bit of performance you should use KZoneAllocator as an obstack, i.e.
+     * just use the functions @ref allocate() and @ref free_since().  All the
+     * remaining memory is returned to the system if the zone allocator
+     * is destroyed.
+     * @param ptr Pointer as returned by @ref allocate().
+     */
+    void deallocate(void *ptr);
+
+    /**
+     * Deallocate many objects at once.
+     * @ref free_since() deallocates all objects allocated after @p ptr, 
+     * @em including @p ptr itself.
+     * @param ptr Pointer as returned by @ref allocate().  It acts as a
+     * mark for a certain position in the stack of all objects, above
+     * which everything can be thrown away.
+     *
+     * The intended use is something along the lines of:
+     * <pre>
+     * KZoneAllocator alloc(8192);
+     * void *remember_me = alloc.allocate(0);
+     * for (int i = 0; i < 1000; i++)
+     *   do_something_with (alloc.allocate(12));
+     * alloc.free_since (remember_me);
+     * </pre>
+     * Note that we don't need to remember all the pointers to the 12-byte
+     * objects in order to free them; @ref free_since() deallocates them
+     * all at once.
+     */
+    void free_since(void *ptr);
+
 protected:
-    long blockSize;
-    QPtrList<char> memoryBlocks;
-    char *currentBlock;
-    long blockOffset;
+    class MemBlock;
+    typedef QValueList<MemBlock *> MemList;
+    void addBlock(MemBlock *b);
+    void delBlock(MemBlock *b);
+    void insertHash(MemBlock *b);
+    void initHash();
+    MemBlock *currentBlock;
+    unsigned long blockSize;
+    unsigned long blockOffset;
+    unsigned int log2;
+    unsigned int num_blocks;
+    MemList **hashList;
+    unsigned int hashSize;
+    bool hashDirty;
 private:
     KZoneAllocatorPrivate *d;
 };
Index: kcompletion.cpp
===================================================================
RCS file: /home/kde/kdelibs/kdecore/kcompletion.cpp,v
retrieving revision 1.49
diff -u -p -r1.49 kcompletion.cpp
--- kcompletion.cpp	2002/03/03 21:20:25	1.49
+++ kcompletion.cpp	2002/03/07 07:21:36
@@ -546,15 +546,14 @@ void KCompletion::extractStringsFromNode
         return;
 
     // kDebug() << "Beginning: " << beginning << endl;
-    KCompTreeChildren::ConstIterator it;
     const KCompTreeChildren *list = node->children();
     QString string;
     QString w;
 
     // loop thru all children
-    for ( it = list->begin(); it != list->end(); ++it ) {
+    for ( KCompTreeNode *cur = list->begin(); cur ; cur = cur->next) {
         string = beginning;
-        node = *it;
+	node = cur;
 	if ( !node->isNull() )
 	    string += *node;
 
@@ -660,9 +659,12 @@ void KCompletion::doBeep( BeepMode mode 
 KCompTreeNode::~KCompTreeNode()
 {
     // delete all children
-    KCompTreeChildren::Iterator it;
-    for ( it = myChildren.begin(); it != myChildren.end(); ++it )
-        delete *it;
+    KCompTreeNode *cur = myChildren.begin();
+    while (cur) {
+        KCompTreeNode * next = cur->next;
+        delete myChildren.remove(cur);
+        cur = next;
+    }
 }
 
 
@@ -676,14 +678,19 @@ KCompTreeNode * KCompTreeNode::insert( c
 
 	// FIXME, first (slow) sorted insertion implementation
 	if ( sorted ) {
-	    KCompTreeChildren::Iterator it = myChildren.begin();
-	    while ( it != myChildren.end() ) {
-	        if ( ch > *(*it) )
-		    ++it;
-		else
+	    KCompTreeNode * prev = 0;
+	    KCompTreeNode * cur = myChildren.begin();
+	    while ( cur ) {
+	        if ( ch > *cur ) {
+		    prev = cur;
+		    cur = cur->next;
+		} else
 		    break;
 	    }
-	    myChildren.insert( it, child );
+	    if (prev)
+	        myChildren.insert( prev, child );
+	    else
+	        myChildren.prepend(child);
 	}
 
 	else
@@ -705,8 +712,7 @@ void KCompTreeNode::remove( const QStrin
 
     if ( string.isEmpty() ) {
         child = find( 0x0 );
-        delete child;
-        myChildren.remove( child );
+        delete myChildren.remove( child );
         return;
     }
 
@@ -715,8 +721,7 @@ void KCompTreeNode::remove( const QStrin
     if ( child ) {
         child->remove( string.right( string.length() -1 ) );
         if ( child->myChildren.count() == 0 ) {
-            delete child;
-            myChildren.remove( child );
+            delete myChildren.remove( child );
         }
     }
 }
@@ -786,6 +791,79 @@ void KCompletionMatches::removeDuplicate
         }
     }
 }
+
+void KCompTreeNodeList::append(KCompTreeNode *item)
+{
+    m_count++;
+    if (!last) {
+	last = item;
+	last->next = 0;
+	first = item;
+	return;
+    }
+    last->next = item;
+    item->next = 0;
+    last = item;
+}
+
+void KCompTreeNodeList::prepend(KCompTreeNode *item)
+{
+    m_count++;
+    if (!last) {
+	last = item;
+	last->next = 0;
+	first = item;
+	return;
+    }
+    item->next = first;
+    first = item;
+}
+
+void KCompTreeNodeList::insert(KCompTreeNode *after, KCompTreeNode *item)
+{
+    if (!after) {
+	append(item);
+	return;
+    }
+
+    m_count++;
+
+    item->next = after->next;
+    after->next = item;
+
+    if (after == last)
+	last = item;
+}
+
+KCompTreeNode *KCompTreeNodeList::remove(KCompTreeNode *item)
+{
+    if (!first || !item)
+	return 0;
+    KCompTreeNode *cur = 0;
+
+    if (item == first)
+	first = first->next;
+    else {
+	cur = first;
+	while (cur && cur->next != item) cur = cur->next;
+	if (!cur)
+	    return 0;
+	cur->next = item->next;
+    }
+    if (item == last)
+	last = cur;
+    m_count--;
+    return item;
+}
+
+KCompTreeNode *KCompTreeNodeList::at(uint index) const
+{
+    KCompTreeNode *cur = first;
+    while (index-- && cur) cur = cur->next;
+    return cur;
+}
+
+KZoneAllocator KCompTreeNode::alloc(8192);
 
 void KCompletion::virtual_hook( int, void* )
 { /*BASE::virtual_hook( id, data );*/ }
Index: kcompletion_private.h
===================================================================
RCS file: /home/kde/kdelibs/kdecore/kcompletion_private.h,v
retrieving revision 1.12
diff -u -p -r1.12 kcompletion_private.h
--- kcompletion_private.h	2002/01/11 16:12:42	1.12
+++ kcompletion_private.h	2002/03/07 07:21:36
@@ -25,8 +25,30 @@
 #include <ksortablevaluelist.h>
 
 class KCompTreeNode;
-typedef QValueList<KCompTreeNode *> KCompTreeChildren;
 
+#include <kallocator.h>
+
+class KCompTreeNodeList
+{
+public:
+    KCompTreeNodeList() : first(0), last(0), m_count(0) {}
+    KCompTreeNode *begin() const { return first; }
+    KCompTreeNode *end() const { return last; }
+
+    KCompTreeNode *at(uint index) const;
+    void append(KCompTreeNode *item); 
+    void prepend(KCompTreeNode *item); 
+    void insert(KCompTreeNode *after, KCompTreeNode *item);
+    KCompTreeNode *remove(KCompTreeNode *item);
+    uint count() const { return m_count; }
+
+private:
+    KCompTreeNode *first, *last;
+    uint m_count;
+};
+
+typedef KCompTreeNodeList KCompTreeChildren;
+
 /**
  * A helper class for KCompletion. Implements a tree of QChar.
  *
@@ -58,6 +80,9 @@ typedef QValueList<KCompTreeNode *> KCom
  */
 class KCompTreeNode : public QChar
 {
+  friend class KCompTreeNodeList;
+  friend class KCompletion;
+
 public:
     KCompTreeNode() : QChar(), myWeight(0) {}
     KCompTreeNode( const QChar& ch, uint weight = 0 )
@@ -65,15 +90,19 @@ public:
           myWeight( weight ) {}
     ~KCompTreeNode();
 
+    void * operator new( size_t s ) {
+      return alloc.allocate( s );
+    }
+    void operator delete( void * s ) {
+      alloc.deallocate( s );
+    }
+
     // Returns a child of this node matching ch, if available.
     // Otherwise, returns 0L
     inline KCompTreeNode * find( const QChar& ch ) const {
-	KCompTreeChildren::ConstIterator it;
-	for ( it = myChildren.begin(); it != myChildren.end(); ++it )
-	    if ( *(*it) == ch )
-		return *it;
-
-	return 0L;
+      KCompTreeNode * cur = myChildren.begin();
+      while (cur && (*cur != ch)) cur = cur->next;
+      return cur;
     }
     KCompTreeNode *	insert( const QChar&, bool sorted );
     void 		remove( const QString& );
@@ -90,20 +119,22 @@ public:
 	return &myChildren;
     }
     inline const KCompTreeNode * childAt(int index) const {
-	return myChildren[index];
+	return myChildren.at(index);
     }
     inline const KCompTreeNode * firstChild() const {
-	return myChildren.first();
+	return myChildren.begin();
     }
     inline const KCompTreeNode * lastChild()  const {
-	return myChildren.last();
+	return myChildren.end();
     }
 
 private:
     uint myWeight;
-    KCompTreeChildren	myChildren;
-
+    KCompTreeNodeList	myChildren;
+    KCompTreeNode *next;
+    static KZoneAllocator alloc;
 };
+
 
 
 // some more helper stuff


