[prev in list] [next in list] [prev in thread] [next in thread]
List: koffice-devel
Subject: KSpread data model (~format storage)
From: Sébastien_de_Menten_de_Horne <sdementen () skynet ! be>
Date: 2005-05-16 22:01:05
Message-ID: 200505170001.06127.sdementen () skynet ! be
[Download RAW message or body]
Hi,
As a long term user and fan of KDE, I follow with interest the evolution of
koffice and more specifically of kspread. I think kspread may be used very
easily as a GUI for some king of applications (finance, simple linear
programming, basic statistic, ) through kpart or DCOP. This GUI offers (or
will offer) efficient tabular data (input/output), charts, plenty of already
implemented functions (sort, sum, ...), easy way to load/save configurations
(stored in worksheets). I developed applications in python with an excel GUI
on windows and I must say it was a tremendous time saver ! But on linux, I
think it could be done more easily with the current stack of technologies.
Well, after this small introduction presenting my motivation, let's start with
my point: kspread internal data mode (or format storage?)
I read with interest the kspread/DESIGN.html in order to understand the
internals of kspread with still a certain level of abstraction (BTW, is there
more documentation of this kind for kspread (ie at the same level of
abstraction)? )
I have some specific and general comments on it.
Now, what I have understood basically is that kspread stores information on
each cell on a cell per cell basis for its content/value/formula (i.e. create
a cell object when it is non empty with this information) and in another way
(ala "raytracing"/depth-buffer) for format storage.
Before reading this file, I had an idea for a efficient spreadsheet data model
that I wanted to test in python. After my reading, I may say that it was more
in the spirit of the format storage depicted in DESIGN.html but for content.
In fact, if I take the example for form storage in DESIGN:
Range | Formatting Piece
Column B | Bold on
Row 2 | Italics on
A1:C5 | Yellow background
I think it can be conceptually extended to data as
Range | Content/Content
Column B | 1
Row 2 | "hello world"
A1:C5 | data_block[0]
D1:D5 | =sum(R[-3,0]:R[-1,0])
This would mean quite naturally:
Column B has values 1 in each cell
Row 2 has values "hello world" in each cell (overwriting old B2=1)
A1:C5 has values taken from a block stored in data_block[0]
D1:D5 has values =sum(R[-3,0]:R[-1,0]) in each cell
Now, I explain the special meaning of data_block[0] and =sum(R[-3,0]:R[-1,0]).
* data_block[0] is a block of data of size 5x3 stored in memory at
data_block[0]. This has the advantage of a very efficient representation as
all element in data_block[0] shares the same type (double, float, string,...)
* =sum(R[-3,0]:R[-1,0]) is a formula expressed in relative position where
R[-3,0] means Relative cell with offset -3 in column and offset 0 in row.
Again the representation is terse. It is also possible to specify absolute
cells with a A[1,1]:A[4,1] notation.
The benefit of this approach are multiples:
* efficient memory usage
* possibility of applying functions to block of data. "=sum(R[-3,0]:R[-1,0])"
can be computed on all the range D1:D5 in parallel instead of cell per cell
* same algorithm for display as the one used for "Format storage",
possibility to use quad-tree for efficient implementation
* operations on sheet like insertion of row/columns can be done by updating
the range information as well as the range definition in formulas
* dependencies computation can be computed again in parallel by looking at
those range/value pairs
Finally, building this value table can be done similarly to format storage by
using manipulators (sequence fill, copy range, etc). Well, I did not get
precisely what is behind manipulators so I may be wrong here.
While this approach may not be worth for small worksheets, I think for big
worksheets it could help a lot. An example of a big worksheet appearing in my
application where excel was the GUI was:
* importing results of simulations (hundred of scenarios with thousands of
statistics in each scenario). Here the data_block idead would save a lot of
memory.
* doing identical computations on each scenario or on each statistics. Again,
formulas were the same for a lot of cells and computing those formulas in
block could help.
Now, a last remark about a topic from this DESIGN.html document: the
dependency manager. Why is there a dependency manager per sheet ?
I mean why not:
a) a dependency manager for a kspread session. This would allow having
dependencies from one file to another (not only between sheets from the same
file)
b) a dependency manager for a file. It would take care of all dependencies in
the same file (sheets and chart)
c) a dependency manager for each cells (this was the first approach with
dependencies stored in each cell)
d) a dependency manager for each block of data (this is related to the
previous discussion if we describe the sheet in a "Format storage" way)
Ok, some alternatives in this question are too generic (a and b), already
tried (c) or hypothetic (d) but my point is more about the arbitrary (at
least to me :-) ) decision of one dependency manager per sheet.
Enough for now, could you be so kind to tell me if:
* I was intelligible in this email :-) I may try to explain again my idea if
it was too obscurve/confusing ...
* this idea has already been beaten to death thousand of times. If it was,
point me to some litterature on the topic
* this idea is valuable to pursue (I may code a prototype in python to test
it more carefully for corner cases).
To finish, thank you, thank you and thank you all for this amazing desktop !!!
Keep up this good work.
Sebastien
PS: could you reply to my address as I am not subscribed to koffice-devel ? TX
_______________________________________________
koffice-devel mailing list
koffice-devel@kde.org
https://mail.kde.org/mailman/listinfo/koffice-devel
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic