From koffice-devel Tue Apr 13 17:26:06 2004
From: Werner Trobin
Date: Tue, 13 Apr 2004 17:26:06 +0000
To: koffice-devel
Subject: Re: Faster saving
Message-Id: <200404131926.06637.trobin () kde ! org>
X-MARC-Message: https://marc.info/?l=koffice-devel&m=108187721219304

On Tuesday 13 April 2004 18:54, Thomas Zander wrote:
> Slightly off-topic; but interesting are the results that I read about
> recently of a new IO system approach in an API I use.
> The old ones used a streams-based approach, where each byte that comes in
> is encoded as soon as it comes in.
> The new style has a Buffer object which can contain any number of
> characters, and only when that's full is all the text converted
> (like from QChar or char* to utf-8).
> Unlike what you'd expect, the speed gains were amazing; it seems that
> staying in the processor's cache while doing something as trivial as
> utf-8 encoding is quite lucrative.

Interesting. Any links?

[...]

> > > > It has taken (rounded values):
> > > > QCString: 19s
> > > > QTextStream: 55s
> > > > KBufferedIODevice( 8KB ): 41s
> > > > KBufferedIODevice( 64KB ): 35s
> > > > KBufferedIODevice( 128KB ): 34s
> > > >
> > > > So something is still wrong.
> > >
> > > valgrind-calltree/KCachegrind explains it. It's the unicode->utf8
> > > conversion that takes too much time with the QTextStream solutions,
> > > because the codec is triggered for every character - whereas with the
> > > QCString solution it's only triggered once, for the whole of the data.
> > >
> > > I also agree with waiting for Qt 4, we'll optimize then, depending on
> > > how it does it.
> >
> > One more thing on this thread: it's about time we start implementing
> > saving in the OASIS format. Since you wrote most of the current code
> > relating to this (oowriterexport.*), do you think we should use the
> > QDomDocument solution or your "direct writing of text into the ZIP
> > device" solution (zipWriteData), i.e. writing literal XML directly?
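The cost difference described in the quoted measurements (codec invoked once per character versus once for the whole buffer) can be illustrated with a minimal sketch in plain C++. It is not the QTextStream, QCString or KBufferedIODevice code and uses no Qt at all. The names `appendUtf8` and `Utf8Sink` and the default capacity of 8192 characters are hypothetical. Input is assumed to be already-decoded code points (`char32_t`) that are valid Unicode scalar values.

```cpp
#include <cstddef>
#include <string>

// Append the UTF-8 encoding of one code point to `out`.
// Assumes `cp` is a valid Unicode scalar value (<= 0x10FFFF, not a surrogate).
void appendUtf8(char32_t cp, std::string& out) {
    auto byte = [&out](char32_t b) { out.push_back(static_cast<char>(b & 0xFF)); };
    if (cp < 0x80) {
        byte(cp);
    } else if (cp < 0x800) {
        byte(0xC0 | (cp >> 6));
        byte(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        byte(0xE0 | (cp >> 12));
        byte(0x80 | ((cp >> 6) & 0x3F));
        byte(0x80 | (cp & 0x3F));
    } else {
        byte(0xF0 | (cp >> 18));
        byte(0x80 | ((cp >> 12) & 0x3F));
        byte(0x80 | ((cp >> 6) & 0x3F));
        byte(0x80 | (cp & 0x3F));
    }
}

// Hypothetical buffered sink: put() only collects characters; the encoder
// runs in flush(), in one tight loop over the whole buffer, instead of
// being triggered once per incoming character.
class Utf8Sink {
public:
    explicit Utf8Sink(std::size_t capacity = 8192) : capacity_(capacity) {
        pending_.reserve(capacity_);
    }
    void put(char32_t cp) {
        pending_.push_back(cp);
        if (pending_.size() >= capacity_) flush();
    }
    void write(const std::u32string& text) {
        for (char32_t cp : text) put(cp);
    }
    void flush() {
        for (char32_t cp : pending_) appendUtf8(cp, encoded_);
        pending_.clear();
    }
    // Flushes and returns everything encoded so far. A real export filter
    // would hand these bytes to its output (e.g. ZIP) device instead.
    const std::string& bytes() {
        flush();
        return encoded_;
    }

private:
    std::size_t capacity_;
    std::u32string pending_;
    std::string encoded_;
};
```

The sketch only shows the structure; whether it is measurably faster depends on what the per-character path costs in a given toolkit. The quoted KCachegrind analysis attributes the QTextStream slowdown to the codec being triggered for every character.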
> >
> > Doesn't the latter lead to syntax errors too often? (and encoding
> > errors?) It has the advantage that when writing out utf8 or simple
> > latin1 stuff (tags/attributes) no char*->QString->char* conversion is
> > needed, though (unlike currently).
> >
> > Hmm, Werner wrote another solution, with a class that has an API almost
> > like QDom, but writes out the stuff along the way instead of all at
> > once, IIRC. (This has the advantage that it requires much less memory
> > than the huge-QDomDocument approach). Werner, what's the status on this?

I didn't really think about that after we decided not to try it for the
Word filter. From what I still remember, you'd need some convenience
"fake-dom" classes and a stack. Then you can provide a nice API at less
cost than the real QDomDocument. To avoid keeping the full document in
memory, this fake-dom serializes as much as possible and gets rid of the
"real" objects in memory. This will of course be slower than writing the
strings directly.

While writing that mail I got another idea for speeding up the writing of
a document: if you look at a representative document (whatever that might
be :-) you'll see lots of paragraphs and little extra stuff like tables,
headers, footers, footnotes, ... I.e. there won't be a lot of bells and
whistles in a normal document, but a lot of XML paragraph "elements" which
are very similar if viewed on a character level (given that OASIS also
uses namespaces, as far as I saw).

What about having "templates" of paragraphs in memory, already in string
form and encoded? When you have to write out another paragraph, you just
(deep) copy that full string and fill out the attributes. E.g. bleh ...
In order to avoid excessive search&replace copying, you have to add some
spaces for the attribute values (for most of them we know the maximum
length). Then you also know the index of each attribute and can replace
it on the fly. If you have tags of variable length (e.g.
font name) you can split the full string into several pieces and process
them sequentially. This would probably pay off for tags which are used
really, really often. If you implement the fake-dom hierarchy, you could
even hide all those crude hacks behind a sane API by providing a
ParagraphElement or so, where you just set the attributes via public
setFoo() calls.

That said, I unfortunately don't have the time to give it a try for the
next few months :-(

Ciao,
Werner

P.S. Don't forget the datatype rope (a heavy-duty string, hehe :-) if you
are writing some code which concatenates long strings and performs
substring operations. It's an STL extension; you can find a description
at: http://www.sgi.com/tech/stl/Rope.html

_______________________________________________
koffice-devel mailing list
koffice-devel@mail.kde.org
https://mail.kde.org/mailman/listinfo/koffice-devel
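Werner's paragraph-template idea (a pre-encoded string, a deep copy per paragraph, attribute slots padded with spaces so values are written at known offsets, and variable-length data handled as separate pieces) can be sketched in plain C++. This is a hypothetical illustration, not KOffice code. `ParagraphTemplate`, `escapeXml`, the single `text:style-name` slot and its width are assumptions made up for the example. The `text:p` element name is borrowed from OpenDocument for flavour only. Strings are assumed to be UTF-8 already. One design choice worth noting: the reserved spaces end up after the closing quote rather than inside the attribute value. XML allows whitespace there inside a start tag, so the padding does not change the value.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Escape characters that may not appear literally in XML text or in a
// double-quoted attribute value.
std::string escapeXml(const std::string& s) {
    std::string r;
    r.reserve(s.size());
    for (char c : s) {
        switch (c) {
        case '&': r += "&amp;"; break;
        case '<': r += "&lt;"; break;
        case '>': r += "&gt;"; break;
        case '"': r += "&quot;"; break;
        default:  r += c; break;
        }
    }
    return r;
}

// Hypothetical pre-encoded paragraph template with one fixed-width slot for
// the style-name attribute. The start tag is built once; each paragraph is a
// copy of it with the slot overwritten at a known offset (no searching), and
// the variable-length paragraph text is appended as a separate piece.
class ParagraphTemplate {
public:
    explicit ParagraphTemplate(std::size_t slotWidth) : slotWidth_(slotWidth) {
        const std::string head = "<text:p text:style-name=";
        slotOffset_ = head.size();
        // Reserve the slot as spaces; the quoted value is written at its start,
        // so leftover spaces follow the closing quote inside the start tag.
        startTag_ = head + std::string(slotWidth_, ' ') + ">";
    }

    std::string render(const std::string& styleName, const std::string& text) const {
        const std::string quoted = "\"" + escapeXml(styleName) + "\"";
        if (quoted.size() > slotWidth_)
            throw std::length_error("style name does not fit the reserved slot");
        std::string out = startTag_;                      // deep copy of the template
        out.replace(slotOffset_, quoted.size(), quoted);  // same length: nothing shifts
        out += escapeXml(text);                           // variable-length piece
        out += "</text:p>";
        return out;
    }

private:
    std::size_t slotWidth_;
    std::size_t slotOffset_ = 0;
    std::string startTag_;
};
```

For example, `ParagraphTemplate(8).render("P1", "a<b")` yields `<text:p text:style-name="P1"    >a&lt;b</text:p>`. A value that does not fit its slot is rejected rather than truncated. A fuller version along Werner's lines would hide this behind setter calls on an element class.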