
List:       koffice-devel
Subject:    Re: KSpread data model (~format storage)
From:       Tomas Mecir <mecirt () gmail ! com>
Date:       2005-05-17 7:32:08
Message-ID: 492258b105051700322e1baaca () mail ! gmail ! com

Hi there !

On 5/17/05, Sébastien de Menten de Horne <sdementen@skynet.be> wrote:
> Hi,
> Well, after this small introduction presenting my motivation, let's start with
> my point: kspread internal data model (or format storage?)
> I read with interest the kspread/DESIGN.html in order to understand the
> internals of kspread with still a certain level of abstraction (BTW, is there
> more documentation of this kind for kspread  (ie at the same level of
> abstraction)? )

Hm, there isn't really much documentation ... And what exists is
at a lower abstraction level (it describes classes and methods). Also,
DESIGN.html describes mostly planned features - i.e. the new storage and
the manipulators are not in yet.

> I have some specific and general comments on it.
> 
> Now, what I have understood basically is that kspread stores information on
> each cell on a cell per cell basis for its content/value/formula (i.e. create
> a cell object when it is non empty with this information) and in another way
> (ala "raytracing"/depth-buffer) for format storage.

The depth buffer isn't really implemented yet, only some rough first
steps were done (unless something big happened in the past two
months, when I was busy with other things and didn't track
development much - but I kinda doubt that, as I'd have noticed something
that big). So, currently, the cell stores everything about itself - no
separate storage.

> In fact, if I take the example for form storage in DESIGN:
>   Range    | Formatting Piece
>   Column B | Bold on
>   Row 2    | Italics on
>   A1:C5    | Yellow background

I personally wasn't even quite sure whether this would work well
enough, although Ariya was pretty confident that it would - as I've
said, it's all in the thoughts phase as of now.
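
For illustration, the layered format lookup from the DESIGN example can be
sketched in Python as below. This is only a minimal sketch, not KSpread code:
the names Range, LAYERS and format_at are hypothetical, and it assumes that
pieces are applied in table order, with later pieces overriding earlier ones
when they set the same attribute.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    """Rectangle of cells, 1-based and inclusive; None means unbounded."""
    col_min: Optional[int] = None
    col_max: Optional[int] = None
    row_min: Optional[int] = None
    row_max: Optional[int] = None

    def contains(self, col: int, row: int) -> bool:
        return ((self.col_min is None or col >= self.col_min) and
                (self.col_max is None or col <= self.col_max) and
                (self.row_min is None or row >= self.row_min) and
                (self.row_max is None or row <= self.row_max))


# The three formatting pieces from the DESIGN example (hypothetical encoding).
LAYERS = [
    (Range(col_min=2, col_max=2), {"bold": True}),        # Column B
    (Range(row_min=2, row_max=2), {"italic": True}),      # Row 2
    (Range(1, 3, 1, 5), {"background": "yellow"}),        # A1:C5
]


def format_at(col: int, row: int, layers=LAYERS) -> dict:
    """Merge every formatting piece whose range covers the cell."""
    result = {}
    for rng, piece in layers:
        if rng.contains(col, row):
            result.update(piece)  # assumption: later layers win on conflict
    return result
```

With these layers, B2 picks up all three pieces, while D2 only gets the
italics from Row 2. Note that every lookup walks the whole list, which is
one reason to doubt how well this scales.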

> I think it can be conceptually extended to data as
>   Range    | Content/Content
>   Column B | 1
>   Row 2    | "hello world"
>   A1:C5    | data_block[0]
>   D1:D5    | =sum(R[-3,0]:R[-1,0])

> Now, I explain the special meaning of data_block[0] and =sum(R[-3,0]:R[-1,0]).
>  * data_block[0] is a block of data of size 5x3 stored in memory at
> data_block[0]. This has the advantage of a very efficient representation as
> all elements in data_block[0] share the same type (double, float, string,...)
>  * =sum(R[-3,0]:R[-1,0]) is a formula expressed in relative position where
> R[-3,0] means Relative cell with offset -3 in column and offset 0 in row.
> Again the representation is terse. It is also possible to specify absolute
> cells with a A[1,1]:A[4,1] notation.
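
To make the relative notation concrete, here is a small Python sketch that
resolves R[...] and A[...] references against the cell holding the formula.
It is only an illustration under assumptions: the helper names (resolve,
col_name, expand) are made up, coordinates are 1-based, and A[c,r] is assumed
to use the same column-then-row order as R[c,r].

```python
import re

# Matches R[dc,dr] (relative) or A[c,r] (absolute), column first, then row.
REF = re.compile(r"([RA])\[(-?\d+),(-?\d+)\]")


def resolve(ref: str, col: int, row: int) -> tuple:
    """Return the (col, row) that `ref` points to from the cell (col, row)."""
    m = REF.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a valid reference: {ref!r}")
    kind, c, r = m.group(1), int(m.group(2)), int(m.group(3))
    if kind == "R":
        return (col + c, row + r)  # offsets relative to the formula's cell
    return (c, r)                  # absolute position, independent of the cell


def col_name(col: int) -> str:
    """Convert a 1-based column number to spreadsheet letters (1 -> A)."""
    name = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


def expand(start: str, end: str, col: int, row: int) -> str:
    """Render the range start:end in A1 notation as seen from (col, row)."""
    c1, r1 = resolve(start, col, row)
    c2, r2 = resolve(end, col, row)
    return f"{col_name(c1)}{r1}:{col_name(c2)}{r2}"
```

For the formula stored on D1:D5, expand("R[-3,0]", "R[-1,0]", 4, 3) gives
"A3:C3": the same formula text covers a different row of the A1:C5 block in
each cell of column D.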

Hmmm ... Well as far as I can see this, there are two different scenarios:
1. the cells share data
2. the cells share a formula
As for the first case, I am rather sceptical. I mean, how many
spreadsheets have exactly the same values in a whole block of cells ?
Probably not many ... Your statistical data, for instance, would hold
a different value in each cell, right ? Hence this approach would lead
to memory usage being even higher than currently, due to all the extra
structures.

The second case is more interesting, though. I can well imagine the
same formula being stored in hundreds of cells, and then several
interesting things could be done to speed things up. But then, having
a depth tree to store this doesn't sound like the best idea ... It
might be better to simply have a container for all the formulas in a
sheet (or in a document, but I'd prefer per-sheet things), keeping
only some sort of index in the cell itself. Interesting ... Although
different from your idea, but oh well ;)
What do you think ?
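
A rough Python sketch of such a per-sheet formula container follows. Nothing
here is existing KSpread code: FormulaPool and Sheet are hypothetical names,
and the sketch assumes formulas are written in the relative notation quoted
above, so that identical formulas have identical text and can be shared.

```python
class FormulaPool:
    """Stores each distinct formula text once and hands out integer indexes."""

    def __init__(self):
        self._formulas = []   # index -> formula text
        self._index = {}      # formula text -> index

    def intern(self, text: str) -> int:
        idx = self._index.get(text)
        if idx is None:
            idx = len(self._formulas)
            self._formulas.append(text)
            self._index[text] = idx
        return idx

    def text(self, idx: int) -> str:
        return self._formulas[idx]

    def __len__(self) -> int:
        return len(self._formulas)


class Sheet:
    """Each cell keeps only an index into the sheet's formula pool."""

    def __init__(self):
        self.formulas = FormulaPool()
        self.cells = {}  # (col, row) -> formula index

    def set_formula(self, col: int, row: int, text: str) -> None:
        self.cells[(col, row)] = self.formulas.intern(text)

    def formula_at(self, col: int, row: int):
        idx = self.cells.get((col, row))
        return None if idx is None else self.formulas.text(idx)


sheet = Sheet()
for row in range(1, 6):  # D1:D5 all share one relative formula
    sheet.set_formula(4, row, "=sum(R[-3,0]:R[-1,0])")
```

After this loop the pool holds a single formula even though five cells refer
to it. The values still have to be computed per cell, since each cell's
inputs differ.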

> The benefits of this approach are multiple:
>  * possibility of applying functions to block of data. "=sum(R[-3,0]:R[-1,0])"
> can be computed on all the range D1:D5 in parallel instead of cell per cell
Only if D1..D5 hold the same value, otherwise we end up having to do
exactly the same thing that we do now - as explained above.

>  * operations on sheet like insertion of row/columns can be done by updating
> the range information as well as the range definition in formulas
Yes, that's one of the nice consequences of this - formulas are
separate, thus this is easier.

>  * importing results of simulations (hundreds of scenarios with thousands of
> statistics in each scenario). Here the data_block idea would save a lot of
> memory.
Would it really ? Each cell would hold a DIFFERENT value, right ?

>  * doing identical computations on each scenario or on each statistic. Again,
> the formulas are the same for a lot of cells and computing those formulas in
> block could help.
This would only really help reduce memory needed to store the formulas
- we cannot really compute in blocks, as the DATA are not the same for
many cells.

> Now, a last remark about a topic from this DESIGN.html document: the
> dependency manager. Why is there a dependency manager per sheet ?

Because it was the easiest solution at that time :D And I didn't feel
like adding inter-file dependencies. Actually, I still don't quite
understand how these are supposed to work - I mean, what if document A
depends on document B, and then, you only open document B and change
it ? Doc. A would know nothing about the change. Then if you close B,
open A, it could only be updated by auto-opening B again - I don't
like this...

>  * this idea is valuable to pursue (I may code a prototype in python to test
> it more carefully for corner cases).
Heh, it is valuable :) Although as you can see, I kinda twisted the
idea to look completely different :DDD

/ Tomas
_______________________________________________
koffice-devel mailing list
koffice-devel@kde.org
https://mail.kde.org/mailman/listinfo/koffice-devel
