List: koffice-devel
Subject: Re: KSpread data model (~format storage)
From: Tomas Mecir <mecirt () gmail ! com>
Date: 2005-05-17 7:32:08
Message-ID: 492258b105051700322e1baaca () mail ! gmail ! com
Hi there !
On 5/17/05, Sébastien de Menten de Horne <sdementen@skynet.be> wrote:
> Hi,
> Well, after this small introduction presenting my motivation, let's start with
> my point: kspread internal data model (or format storage?)
> I read with interest the kspread/DESIGN.html in order to understand the
> internals of kspread with still a certain level of abstraction (BTW, is there
> more documentation of this kind for kspread (ie at the same level of
> abstraction)? )
Hm, there isn't really much documentation ... And what exists is
at a lower abstraction level (it describes classes and methods). Also,
the DESIGN.html describes mostly planned features - i.e. the new storage
and the manipulators are not in yet.
> I have some specific and general comments on it.
>
> Now, what I have understood basically is that kspread stores information on
> each cell on a cell per cell basis for its content/value/formula (i.e. create
> a cell object when it is non empty with this information) and in another way
> (ala "raytracing"/depth-buffer) for format storage.
The depth buffer isn't really implemented yet, only some rough first
steps were done (unless something big happened in the past two
months, when I was busy with other things and didn't track
development much - but I kinda doubt that, as I'd have noticed
something that big). So, currently, the cell stores everything about
itself - no separate storage.
> In fact, if I take the example for format storage in DESIGN:
> Range | Formatting Piece
> Column B | Bold on
> Row 2 | Italics on
> A1:C5 | Yellow background
I personally wasn't even quite sure whether this would work well
enough, although Ariya was pretty confident that it would - as I've
said, it's all in the thoughts phase as of now.
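
To make the quoted table concrete, here is a rough Python sketch of how such layered format pieces could be resolved for a single cell. None of this is KSpread code: FormatPiece, covers and effective_format are made-up names, and the rule that later pieces override earlier ones is my assumption, not something DESIGN.html is known to specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FormatPiece:
    """One row of the table: a range plus the attributes it sets.

    A bound of None means "unbounded", so a whole column has
    row_min = row_max = None and a whole row has col_min = col_max = None.
    Columns and rows are 1-based (A = 1).
    """
    col_min: Optional[int]
    col_max: Optional[int]
    row_min: Optional[int]
    row_max: Optional[int]
    attrs: dict


def covers(piece, col, row):
    """Return True if the piece's range contains the cell (col, row)."""
    col_ok = piece.col_min is None or piece.col_min <= col <= piece.col_max
    row_ok = piece.row_min is None or piece.row_min <= row <= piece.row_max
    return col_ok and row_ok


def effective_format(pieces, col, row):
    """Merge every piece that covers the cell, in list order.

    Assumption: pieces later in the list win on conflicting attributes.
    """
    result = {}
    for piece in pieces:
        if covers(piece, col, row):
            result.update(piece.attrs)
    return result


# The three pieces from the quoted example.
pieces = [
    FormatPiece(2, 2, None, None, {"bold": True}),        # Column B
    FormatPiece(None, None, 2, 2, {"italic": True}),      # Row 2
    FormatPiece(1, 3, 1, 5, {"background": "yellow"}),    # A1:C5
]

print(effective_format(pieces, 2, 2))   # cell B2, covered by all three pieces
print(effective_format(pieces, 4, 2))   # cell D2, only covered by Row 2
```

The lookup walks every piece for every cell, which is where my doubts about performance come from; a real implementation would need some spatial index on top of this.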
> I think it can be conceptually extended to data as
> Range | Content/Content
> Column B | 1
> Row 2 | "hello world"
> A1:C5 | data_block[0]
> D1:D5 | =sum(R[-3,0]:R[-1,0])
> Now, I explain the special meaning of data_block[0] and =sum(R[-3,0]:R[-1,0]).
> * data_block[0] is a block of data of size 5x3 stored in memory at
> data_block[0]. This has the advantage of a very efficient representation as
> all elements in data_block[0] share the same type (double, float, string,...)
> * =sum(R[-3,0]:R[-1,0]) is a formula expressed in relative position where
> R[-3,0] means Relative cell with offset -3 in column and offset 0 in row.
> Again the representation is terse. It is also possible to specify absolute
> cells with a A[1,1]:A[4,1] notation.
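
For reference, this is how I read the R[col,row] / A[col,row] notation, as a small Python sketch. The resolve and col_name helpers are hypothetical, and I assume 1-based column and row numbers with the column offset written first, as in the quoted text.

```python
import re

# Matches R[dc,dr] (relative) and A[c,r] (absolute) references.
REF = re.compile(r"([RA])\[(-?\d+),(-?\d+)\]")


def col_name(col):
    """Convert a 1-based column number to letters (1 -> A, 27 -> AA)."""
    name = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


def resolve(formula, col, row):
    """Rewrite R[...]/A[...] references as A1-style names.

    (col, row) is the cell that holds the formula. Relative references
    are offsets from that cell; absolute ones are used as they are.
    """
    def repl(match):
        kind = match.group(1)
        c, r = int(match.group(2)), int(match.group(3))
        if kind == "R":
            c, r = col + c, row + r
        return f"{col_name(c)}{r}"

    return REF.sub(repl, formula)


shared = "=sum(R[-3,0]:R[-1,0])"
print(resolve(shared, 4, 1))   # the formula as seen from D1
print(resolve(shared, 4, 5))   # the same text as seen from D5
```

So one formula text stands for "sum of the three cells to my left" in every cell of D1:D5, which is the part of the proposal I find interesting.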
Hmmm ... Well as far as I can see this, there are two different scenarios:
1. the cells share data
2. the cells share a formula
As for the first case, I am rather sceptical. I mean, how many
spreadsheets have exactly the same values in a whole block of cells ?
Probably not many ... Your statistical data, for instance, would hold
a different value in each cell, right ? Hence this approach would lead
to memory usage being even higher than it is currently, due to all the
extra structures.
The second case is more interesting, though. I can well imagine the
same formula being stored in hundreds of cells, and then several
interesting things could be done to speed things up. But then, having
a depth tree to store this doesn't sound like the best idea ... It
might be better to simply have a container for all the formulas in a
sheet (or in a document, but I'd prefer per-sheet things), keeping
only some sort of index in the cell itself. Interesting ... Although
it's different from your idea, but oh well ;)
What do you think ?
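
Roughly what I have in mind, as a Python sketch rather than anything that exists in KSpread. FormulaPool and Sheet are invented names, and the sharing only works if formulas are kept in a position-independent form such as the relative notation you describe.

```python
class FormulaPool:
    """Per-sheet container that stores each distinct formula text once."""

    def __init__(self):
        self._formulas = []   # index -> formula text
        self._index = {}      # formula text -> index

    def intern(self, text):
        """Return the index of the formula, adding it if it is new."""
        idx = self._index.get(text)
        if idx is None:
            idx = len(self._formulas)
            self._formulas.append(text)
            self._index[text] = idx
        return idx

    def text(self, idx):
        return self._formulas[idx]

    def __len__(self):
        return len(self._formulas)


class Sheet:
    """Cells keep only a small integer index into the sheet's pool."""

    def __init__(self):
        self.formulas = FormulaPool()
        self.cells = {}       # (col, row) -> formula index

    def set_formula(self, col, row, text):
        self.cells[(col, row)] = self.formulas.intern(text)

    def formula_at(self, col, row):
        idx = self.cells.get((col, row))
        return None if idx is None else self.formulas.text(idx)


sheet = Sheet()
for row in range(1, 6):                       # D1..D5 share one formula
    sheet.set_formula(4, row, "=sum(R[-3,0]:R[-1,0])")
sheet.set_formula(5, 1, "=R[-1,0]*2")         # E1 has a different one

print(len(sheet.formulas))                    # distinct formulas stored
print(sheet.formula_at(4, 3))
```

Six cells end up pointing at two stored formulas, so parsing (and whatever else we attach to a formula) would be done once per distinct formula instead of once per cell. Each cell still has to be evaluated against its own data, though.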
> The benefits of this approach are multiple:
> * possibility of applying functions to block of data. "=sum(R[-3,0]:R[-1,0])"
> can be computed on all the range D1:D5 in parallel instead of cell per cell
Only if D1..D5 hold the same value, otherwise we end up having to do
exactly the same thing that we do now - as explained above.
> * operations on sheet like insertion of row/columns can be done by updating
> the range information as well as the range definition in formulas
Yes, that's one of the nice consequences of this - formulas are
separate, thus this is easier.
> * importing results of simulations (hundreds of scenarios with thousands of
> statistics in each scenario). Here the data_block idea would save a lot of
> memory.
Would it really ? Each cell would hold a DIFFERENT value, right ?
> * doing identical computations on each scenario or on each statistic. Again,
> the formulas are the same for a lot of cells and computing those formulas in
> block could help.
This would only really help reduce memory needed to store the formulas
- we cannot really compute in blocks, as the DATA are not the same for
many cells.
> Now, a last remark about a topic from this DESIGN.html document: the
> dependency manager. Why is there a dependency manager per sheet ?
Because it was the easiest solution at that time :D And I didn't feel
like adding inter-file dependencies. Actually, I still don't quite
understand how these are supposed to work - I mean, what if document A
depends on document B, and then you only open document B and change
it ? Doc. A would know nothing about the change. Then if you close B
and open A, it could only be updated by auto-opening B again - I don't
like this...
> * this idea is valuable to pursue (I may code a prototype in python to test
> it more carefully for corner cases).
Heh, it is valuable :) Although as you can see, I kinda twisted the
idea to look completely different :DDD
/ Tomas
_______________________________________________
koffice-devel mailing list
koffice-devel@kde.org
https://mail.kde.org/mailman/listinfo/koffice-devel