'Re: KSpread data model (~format storage)'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       koffice-devel
Subject:    Re: KSpread data model (~format storage)
From:       Ariya Hidayat <ariya () kde ! org>
Date:       2005-05-18 6:24:28
Message-ID: 428ADF9C.8080506 () kde ! org
[Download RAW message or body]

Halo Sébastien,

> Well, after this small introduction presenting my motivation, let's start with 
> my point: kspread internal data mode (or format storage?)

Contributors are always welcomed !

> I read with interest the kspread/DESIGN.html in order to understand the 
> internals of kspread with still a certain level of abstraction (BTW, is there 
> more documentation of this kind for kspread  (ie at the same level of 
> abstraction)? )

You may also want to check koffice-devel archive, as there were many
interesting discussions about this topic inside there.
The code for KSpread itself is not documented well outside the source
code, so your chance is to read the source-code and follow embedded
comments/remarks.

> I think it can be conceptually extended to data as
>   Range    | Content/Content
>   Column B | 1                       
>   Row 2    | "hello world"           
>   A1:C5    | data_block[0]           
>   D1:D5    | =sum(R[-3,0]:R[-1,0])   
> 
> This would mean quite naturally:
> Column B has values 1 in each cell
> Row 2 has values "hello world" in each cell (overwriting old B2=1)
> A1:C5 has values taken from a block stored in data_block[0]
> D1:D5 has values =sum(R[-3,0]:R[-1,0]) in each cell

I slightly disagree with the proposed approach. Here are the arguments
(note: I could be very much wrong, I'm pleased to be proved so!).

First of all, formatting pieces are useful because certain cells can
contain partial format (e.g. Bold only) which should be merged if there
is another format (e.g. Color). Consider that row and column can have
its own formats, it is then the reason to have such storage scheme.

However, this is not the case for value/formula. Once a cell is set to
hold an integer number, putting another number (or string, or formula)
into that cell effectively removes the need to store the old integer number.

>  * efficient memory usage

True, but the case with such data_block needs further research. While
simplyfing the actual value storage, the complexity on the other side
will arise, namely the interface between this and the rest of KSpread
(e.g. for value calculation, display, etc).

Also, formats are designed for display while values are also used for
calculation. So, although internaly formatting pieces are used,
implementation-wise it helps if some kind of cache system is used. For
example, we can store the "final" format for certain cells/ranges
without the need to "compute" it from the formatting pieces that
applied. Since only few, not all, cells are visible at one time, the
cache could e.g. only store those visible (or potentially visible)
cells, not every non-empty cells. The situation is different in the case
for values/formulas because for formula calculation, cell values must be
available fast. If the value still needs to precomputed from all the
content pieces, performance will degrade. Since calculation might
involve any cells in the corresponding sheet, cache is not a viable
option either.

>  * possibility of applying functions to block of data. "=sum(R[-3,0]:R[-1,0])" 
> can be computed on all the range D1:D5 in parallel instead of cell per cell

As with formula, shared formula can be implemented inside the formula
parser/engine. Works need to be done in this area but IMHO this way is
easier to manage because other classes do not need to know about this
and just can use it.

>  * same algorithm for display as the one used for "Format storage", 
> possibility to use quad-tree for efficient implementation

One thing that I still do not like about quad-tree is that insertion and
deletion of rows/columns are expensive. I still could not figure how to
solve this problem. IMHO quad-tree is perfect for classical spatial data
structure, but not so much proper for spreadsheet object model. Again I
can just be totally wrong (note: Gnumeric uses quad-tree).

>  * operations on sheet like insertion of row/columns can be done by updating 
> the range information as well as the range definition in formulas

Does not help much, because you still need to update the corresponding
data_block. And the expensive part will be (this that KSpread does not
do until now) is to update all formulas to reflect the changes.

>  * dependencies computation can be computed again in parallel by looking at 
> those range/value pairs

Quite frankly, I am confused with this one.

> Now, a last remark about a topic from this DESIGN.html document: the 
> dependency manager. Why is there a dependency manager per sheet ? 

This and the following questions (I think) are better explained by Tomas.

BTW, if you manage to produce the prototype in Python, I'd very much
like to see it! As I have said earlier in this list, a working code from
anyone is much much better that just a (supposed to be) nice concept
that I only write...

--
Ariya Hidayat

_______________________________________________
koffice-devel mailing list
koffice-devel@kde.org
https://mail.kde.org/mailman/listinfo/koffice-devel
[prev in list] [next in list] [prev in thread] [next in thread]