[prev in list] [next in list] [prev in thread] [next in thread] 

List:       koffice-devel
Subject:    KSpread - algorithm for fast and memory-saving storage offormatting
From:       "Pascal A. Niklaus" <Pascal.Niklaus () unibas ! ch>
Date:       2003-04-23 8:21:18
[Download RAW message or body]

Hi,

I had a browse through KSpread's source tree and found the notice on the 
discussion on how to store formatting information so that it doesn't 
consume much memory and access is still fast.

After some thinking I came up with an algorithm to store the formatting 
information so that it can be accessed very quickly, but doesn't have to 
be stored in the cell (not even a pointer). The concept is to store 
formatting in a list/array of rectangular regions with a specific 
formatting, and to use a quaternary tree structure to find in which 
region a specific cell lies. Even for large worksheets, the specified 
format can be found with a very small number of integer comparisions, 
(1) because maximum tree height increases only with log(size) and (2) 
because the tree is extremely short for regions without complex formatting.

1) Store the formatting information as a list or array of rectangular 
regions. This is the only info which needs to be saved to the file.

2) Formatting info is not stored in the cell structure itself, but a 
tree structure is used to find the formatting belonging to a specific cell.

Basically, the spreadsheet is recursively subdivided into quadrants with 
side lengths which are powers of 2.  It's a bit like the "JPEG 
compression pattern".

A node looks like this

class CTreeNode
{
    CTreeNode *m_pQuadrant[4];    // topleft, topright, bottomleft, 
bottomright
    QPtrList<CRectRegion *> *pFormats;
};

Iterating through the tree works like this:

- check in which quadrant the cell is.
    If the pointer to this quadrant isn't NULL, move to the next tree node.
    Else check if cell is in list of regions in this node

For optimal speed, the length of the list in *pFormats and the height of 
the tree have to be balanced. For example, the rule could be to further 
subdivided a node into four extra quadrants if pFormats->length()>10.

What we have to store separately is the quadrant size for the first node 
(=root).

The advantages of this solution are:

1) For areas without complex formatting, the tree is very short

2) Search time for the format of a given cell increases only with 
log(sheetsize). If only the formats of the visible sheet section are to 
be looked up, search time is independent of sheet size because we can 
start searching at a node far down the tree (which is just larger than 
the "viewport").

3) If the worksheets (say: area containing format info) grows, the tree 
is extended by simply inserting an extra node at the top

4) Finding the proper quadrant can be done with bit shift operations and 
is super-fast.
for example:

row=0111
col=1010

indices into proper quadrants are (read bits vertically) 01 (node 1), 10 
(node 2), 11 (node 3), 10 (node 4)

This means we can walk down the tree by shifting bits in row/col left, 
look up if this node exists, if yes, get this pointer and repeat until 
ptr is NULL.
in pseudo-assembler:

CLR quadrant
LSL row
ROL quadrant
LSL col
ROL quadrant  ; quadrant is now 01

;for the next nodes it would be:
;repeat again: quadrant is now 10
;repeat again: quadrant is now 11
;repeat again: quadrant is now 10

Then we check if there's a region list... if NULL --> no formats stored, 
else iterate through list and find region we are in. If no matching 
region found --> no format specified.

What do you think?

Was I able to communicate my idea clearly or should I post an example 
for a simple spreadsheet?

If this is considered a good idea, I'd be willing to implement it.

Pascal







_______________________________________________
koffice-devel mailing list
koffice-devel@mail.kde.org
http://mail.kde.org/mailman/listinfo/koffice-devel
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic