List:       koffice-devel
Subject:    DOM tree with on-demand loading
From:       Ariya Hidayat <ariya () kde ! org>
Date:       2005-11-24 11:57:28
Message-ID: ba035dd10511240357w2a346b76wed06887912603816 () mail ! gmail ! com

I have committed a very early draft of a DOM implementation that
supports on-demand loading. It is in
lib/kofficecore/koxmlreader.{h,cpp}, with a corresponding test in
lib/kofficecore/tests/koxmlreadertest.cpp. As you can see, I try to
follow QDom's API as closely as possible to ease the migration.

The performance of this implementation is unfortunately still not
acceptable. This is because I still use the Qt XML parser to reparse
parts of the XML document during on-demand loading. For the
OpenDocument format this is quite a big burden, because namespaces
must be handled correctly and so a lot of repetitive parsing occurs.

KoXmlDocument has a setFastLoading() function (this is of course not
available in QDomDocument) which enables 'fast loading', meaning that
everything is parsed and loaded at once, just like in QDom. If it is
not set, then some parts of the DOM tree are reconstructed on the fly.
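
To make the difference between the two modes concrete, here is a
minimal sketch in plain C++. It is not the KoXmlDocument code: the
LazyTable class, its comma-separated "rows" and parsedCount() are all
hypothetical stand-ins, with splitting a string playing the role of
XML parsing. Only the setFastLoading()/setContent() naming is borrowed
from above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a document; NOT the KoXmlDocument API.
class LazyTable {
public:
    LazyTable() : m_fast(false) {}

    // Fast loading: parse everything inside setContent().
    void setFastLoading(bool on) { m_fast = on; }

    // Keeps the raw text of every row (e.g. "1,2,3") as the buffer
    // that on-demand loading reparses later.
    void setContent(const std::vector<std::string> &rawRows) {
        m_raw = rawRows;
        m_rows.assign(rawRows.size(), std::vector<std::string>());
        m_parsed.assign(rawRows.size(), false);
        if (m_fast) {
            for (std::size_t i = 0; i < m_raw.size(); ++i)
                parseRow(i);
        }
    }

    // On-demand loading: a row is parsed the first time it is accessed.
    const std::vector<std::string> &row(std::size_t i) {
        if (!m_parsed.at(i))
            parseRow(i);
        return m_rows[i];
    }

    // Number of rows that have been parsed so far.
    std::size_t parsedCount() const {
        std::size_t n = 0;
        for (std::size_t i = 0; i < m_parsed.size(); ++i)
            if (m_parsed[i])
                ++n;
        return n;
    }

private:
    // "Parsing" here is just splitting the raw text on commas.
    void parseRow(std::size_t i) {
        std::istringstream in(m_raw[i]);
        std::string cell;
        while (std::getline(in, cell, ','))
            m_rows[i].push_back(cell);
        m_parsed[i] = true;
    }

    bool m_fast;
    std::vector<std::string> m_raw;
    std::vector<std::vector<std::string> > m_rows;
    std::vector<bool> m_parsed;
};
```

Without fast loading, setContent() in this sketch only stores the raw
text and row(i) pays the parsing cost on first access; with fast
loading, every row is parsed up front.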

In the test program, the function testLargeOpenDocumentSpreadsheet()
checks the performance when handling a very large spreadsheet. The
parsing time - which I define here as the time spent in
KoXmlDocument::setContent() - is doubled compared to the case where
fast loading is enabled. This is because the buffer needed for later
on-demand reparsing has to be constructed and filled; that is
definitely the cause of the slowdown.

The iterating time - the time spent iterating over all cells in the
spreadsheet - can be as much as 10 times slower in my test. Based on
initial profiling, most of the time is wasted on running the XML
parser against the parts of the document that need to be reparsed.

Note that I have not yet compared memory usage, but I have noticed
that fast loading requires up to 4 times more bytes. Because I chose
to use QString in the parsing buffer (to avoid expensive round-trip
UTF8-QString-UTF8 conversions), the memory eaten by the buffer is
indeed quite large. Added to this is the fact that I need to put
namespaces everywhere; this seems redundant but can't be avoided
because I use the stock, unmodified Qt XML parser.

I believe the XML parsing can be improved, for example by exploiting
the UTF8 encoding mandated by the OpenDocument format. Also, parsing
directly from the ZIP storage would be very good, because we could
then uncompress and read one small chunk at a time from the storage
instead of uncompressing the whole file and then reading it as one
very big block.

However, for now I'd rather continue to optimize the reparsing method.
I want to get rid of reusing Qt's XML reader over and over. During the
first phase of the parsing, a compact representation of all nodes
should be constructed. Later on, when some KoXmlElements (and friends)
need to be accessed, it is enough to build them from the compacted
version only. XML files in OpenDocument format can be compressed
relatively well because of the many repeated element names, prefixes
and namespaces, so I also plan to just index them to save more space.
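
As a rough illustration of the indexing idea (a sketch only, not what
is in koxmlreader.cpp), each distinct name or namespace can be stored
once in a table and referred to by a small integer. StringIndex and
PackedElement below are hypothetical names, and std::string stands in
for QString to keep the example free of Qt.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: stores every distinct string once and hands
// out a small integer index for it.
class StringIndex {
public:
    // Returns the index of s, adding it to the table on first sight.
    std::size_t intern(const std::string &s) {
        std::unordered_map<std::string, std::size_t>::const_iterator it =
            m_lookup.find(s);
        if (it != m_lookup.end())
            return it->second;
        const std::size_t id = m_strings.size();
        m_strings.push_back(s);
        m_lookup.insert(std::make_pair(s, id));
        return id;
    }

    // Looks the original string up again; throws if id is unknown.
    const std::string &value(std::size_t id) const {
        return m_strings.at(id);
    }

    // Number of distinct strings stored.
    std::size_t size() const { return m_strings.size(); }

private:
    std::vector<std::string> m_strings;
    std::unordered_map<std::string, std::size_t> m_lookup;
};

// Hypothetical compact record for one element: two indexes instead
// of two full strings.
struct PackedElement {
    std::size_t namespaceIndex;
    std::size_t nameIndex;
};
```

With this, a spreadsheet containing thousands of table:table-cell
elements keeps that name and its namespace URI only once, and each
element costs two integers.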

Aside from the necessary further optimization, the API should be
considered final; maybe some touches here and there, but no major
change is foreseen. So I propose that the conversion from QDom be
started. There is a #define KOXML_USE_QDOM in koxmlreader.h which
just maps KoXmlNode to QDomNode, KoXmlElement to QDomElement, and so
on. When this is defined, renaming QDom classes one by one to the
corresponding KoXml classes actually has no effect. This is helpful,
as the whole of KOffice will still build and work as usual.
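
For those who have not looked at the header, the switch could look
roughly like the sketch below. This is an assumption about the shape
of the mechanism, not a copy of koxmlreader.h: whether the real header
uses typedefs or macros may differ, and the empty QDomNode and
QDomElement classes are stand-ins so that the example compiles without
Qt.

```cpp
#include <type_traits>

// Stand-ins so the sketch compiles without Qt; in KOffice these would
// be the real QDomNode and QDomElement classes.
class QDomNode {};
class QDomElement : public QDomNode {};

// Comment this out to switch to the (here empty) KoXml classes.
#define KOXML_USE_QDOM

#ifdef KOXML_USE_QDOM
// The KoXml names are only aliases, so renaming QDom -> KoXml in the
// application code changes nothing in the compiled result.
typedef QDomNode KoXmlNode;
typedef QDomElement KoXmlElement;
#else
// Placeholders for the real on-demand implementation.
class KoXmlNode {};
class KoXmlElement : public KoXmlNode {};
#endif

// Application code is written against the KoXml names only.
inline bool sameNode(const KoXmlNode *a, const KoXmlNode *b)
{
    return a == b;
}

#ifdef KOXML_USE_QDOM
static_assert(std::is_same<KoXmlNode, QDomNode>::value,
              "KoXmlNode is an alias of QDomNode in this mode");
#endif
```

Code such as sameNode() compiles unchanged in both modes, which is the
whole point of doing the renaming early.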

When KoXml is ready and has acceptable performance, KOXML_USE_QDOM
can simply be undefined and KOffice recompiled to use the KoXml
classes. And suppose that I do not finish the optimization before the
beta release: then we would still be using QDom anyway, but the
s/QDom/KoXml/ renaming makes the source code ready for the conversion
in the future. So if there is no objection, I'll put this on my TODO.

Comments are warmly welcomed.


--
http://www.google.com/search?q=ariya+hidayat&btnI