[prev in list] [next in list] [prev in thread] [next in thread] 

List:       koffice-devel
Subject:    Re: design discussion: caching QTextFormat for faster text loading
From:       Jos van den Oever <jos.van.den.oever () kogmbh ! com>
Date:       2010-08-16 9:59:21
Message-ID: 201008161517.22941.jos.van.den.oever () kogmbh ! com
[Download RAW message or body]

On Friday, August 13, 2010 17:00:53 pm zander@kde.org wrote:
> On Friday 13. August 2010 11.59.10 Jos van den Oever wrote:
> > As you may have noticed, I'm working on speeding up KOffice at the
> > moment. The  biggest bottleneck for the loading of documents at the
> > moment is our styling system. In this mail, I want to explain why this
> > is the case and give a suggestion for improving the situation.
> 
> The idea sounds intriguing :)  I like it.
> 
> There is something extra that you might be very happy to learn as this will
> make things a lot easier for you.
> 
> In QTextDocument each unique QTextFormat gets put in a list and gets an
> index that will never change over the lifetime of the QTextDocument.
> See QTextDocument::allFormats().
QTextDocument::allFormats() returns a QVector<QTextFormat>.

The QTextDocument says:

    Each document element is described by an associated format object. Each
    format object is treated as a unique object by QTextDocuments, and can be
    passed to objectForFormat() to obtain the document element that it is
    applied to.

Each QTextFormat, which is basically a QMap<uint,QVariant>, contains an index 
over which the object it applies to can be retrieved. This makes each 
QTextFormat unique even though many paragraphs can have the same formatting. 
So comparing QTextFormat for two paragraphs will return false if the formats 
are associated with a different paragraph. If the the format is not associated 
with a QTextDocument, the property ObjectIndex is not set and the formats 
would be seen as equal.
Comparing QTextFormat objects is expensive. This is sped up somewhat by the 
internal hash Qt uses for them, but as soon as one property is changed, the 
hash is invalid. So taking two formats from a QTextDocument and comparing them 
after removing the ObjectIndex property is always a full loop over all 
properties in the two QTextFormat instances and hence always expensive.

The fact that the property ObjectIndex is part of the QTextFormat is a big 
design problem in Qt, because now two styles on different paragraph always take 
separate memory, even though the are the same (apart from the ObjectIndex 
value).

> This would mean that KoParagraphStyle::applyStyle and
> KoCharacterStyle::applyStyle could get some extra code to detect the ideal
> case; which is almost always hit in an empty document. The ideal case is
> when we apply the style on a block without style.

I do not see how to check that. Can you tell me how I can check whether a 
QTextCursor or QTextBlock has no QTextFormat associated with it?
The most relevant signature are these:
  void KoCharacterStyle::applyStyle(QTextCursor *selection) const
  void KoCharacterStyle::applyStyle(QTextBlock &block) const
  void KoCharacterStyle::applyStyle(QTextCursor *selection) const
  void KoParagraphStyle::applyStyle(QTextBlockFormat &format) const
  void KoParagraphStyle::applyStyle(QTextBlock &block, bool applyListStyle) 
const

In my opinion, any call passing a QTextCursor, QTextBlock or a QTextFormat 
where ObjectIndex is set is problematic. The Ko*Style classes should convert 
from a style to a QTextFormat which is then set on a part of the 
QTextDocument. That way all the performance problems of QTextFormat are 
avoided and the QTextFormat instances can be cached during loading.

> So a block has a format set to index (QTextBlock::blockFormatIndex) that is
> known to be 'empty' and we turn it into a know other format. The tricky
> part is that this is per QTextDocument instance.
> But I bet its easy to store a little structure inside the style keeping a
> QTextDocument pointer, an int of the 'before' format Id and the new format.
> Which you just apply.
The style should not have a pointer to QTextDocument in my opinion. The style 
is a representation of the ODF style element that applies to the entire ODF 
document and giving it private information pertaining to one specific text 
shape breaks that.

In fact, to have a cleaner design and allow for better predictability and 
caching, the Ko*Style classes should only work with QTextFormat objects and 
not with any other of the QText* classes in their api and implementation.
Then the Ko*Style classes would give back QTextFormat objects that are applied 
to the relevant block, instead of letting the styles merge the style they 
represent with the style that is present at the particular position.

> That would be fast as all you need to do for knowing you can apply the
> format is to do one integer comparison.
I do not see how this would work. Can you explain it with some code?

Cheers,
Jos

-- 
Jos van den Oever, software architect
+49 391 25 19 15 53
http://kogmbh.com/legal/
_______________________________________________
koffice-devel mailing list
koffice-devel@kde.org
https://mail.kde.org/mailman/listinfo/koffice-devel
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic