'Re: [quanta-devel] parser in Quanta'

List:       quanta-devel
Subject:    Re: [quanta-devel] parser in Quanta
From:       Jens Herden <jens () kdewebdev ! org>
Date:       2006-01-16 14:40:21
Message-ID: 200601162140.25678.jens () kdewebdev ! org

Hi Andras,

>  I looked at QXmlReader today and tried to figure out what's happening
> in KDOM (I have only an old version from kdenonbeta, and unfortunately
> it is not documented quite well). My first feeling is that both are
> designed to work with complete documents. QXmlReader also has the
> assumption that what is parsed is valid XML or there will be an error
> thrown by the parser. This latter issue might be solved with our own
> QXmlReader derived class, where we don't throw errors every time,
> instead build elements/nodes in the best way we can from a broken
> document. We might even specify this as a "feature" for the parser.

Yes indeed, QXmlReader only parses valid code. But my suggestion was not 
to use QXmlReader ;-)
I wanted you to look into it to get an idea of the kind of parser I want to 
have, and I knew that we cannot use QXmlReader itself. But the whole structure 
around QXmlReader, with reader and builder, is worth copying, I think. 
So my suggestion was, like you said, to create our own QXmlReader-derived 
class that is very error tolerant and fixes as many problems as possible to 
get a usable KDOM tree for the renderer. 
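
To make the reader/builder split concrete, here is a minimal sketch in Python
rather than C++/Qt. Everything in it is an assumption for illustration: the
names TreeBuilder and tolerant_read are made up, the nodes are plain dicts
instead of KDOM nodes, and comments, attributes and entities are ignored. The
reader only emits events; the builder decides how to repair unclosed or stray
tags instead of reporting an error.

```python
class TreeBuilder:
    """Hypothetical builder: receives reader events and builds a nested tree."""

    def __init__(self):
        self.root = {"tag": "#document", "children": []}
        self.stack = [self.root]  # currently open elements, root first

    def start_element(self, tag):
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)
        self.stack.append(node)

    def end_element(self, tag):
        # Tolerant: close everything up to the nearest matching open element.
        # A stray end tag that matches nothing is silently ignored.
        for depth in range(len(self.stack) - 1, 0, -1):
            if self.stack[depth]["tag"] == tag:
                del self.stack[depth:]
                return

    def characters(self, text):
        if text.strip():
            self.stack[-1]["children"].append({"tag": "#text", "text": text})


def tolerant_read(source, builder):
    """Scan the source and feed events to the builder; never raises on broken markup."""
    i, n = 0, len(source)
    text_start = 0
    while i < n:
        if source[i] == "<":
            close = source.find(">", i)
            if close == -1:
                break  # unterminated tag: the rest is emitted as text below
            builder.characters(source[text_start:i])
            name = source[i + 1:close].strip()
            if name.startswith("/"):
                builder.end_element(name[1:].strip().lower())
            elif name:
                tag = name.rstrip("/").split()[0].lower()
                builder.start_element(tag)
                if name.endswith("/"):  # self-closing form such as <br/>
                    builder.end_element(tag)
            i = close + 1
            text_start = i
        else:
            i += 1
    builder.characters(source[text_start:])
    return builder.root
```

Feeding it broken input such as "<p>one<b>two</p>three</i>" still yields a
tree: the unclosed <b> is closed together with <p>, and the stray </i> is
dropped. Swapping in a different builder would be the place to create KDOM
nodes or our own nodes without touching the reader.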

>  The real problem is parsing only part of the document and merging it
> with the existing DOM tree. Jens asked me if parsing the whole document
> always is slow or not. According to my previous testing it was very
> slow with the current parser and I don't expect much better results
> with any new one. 

I do expect better results :-)

> As an extreme case I have a 770KB long, 12000 line 
> HTML file (yes, I got it from a user due to an old bug report). It doesn't
> have any PHP or other parts. On my system (which isn't slow), it takes
> between 500-600ms to parse. If you just modify (add, remove text) the
> document, regardless if it is at the beginning or the end, re-parsing
> takes around 10ms. Now imagine that in case of a slower machine, the
> whole parsing can take at least 3 times as much, while the re-parsing
> is almost around the same time, as the amount to be parsed is small.
> Of course, the current system is not too good and in some cases cannot
> figure out the changed area and requires a whole reparse when typing,
> but this is an implementation detail.
>  Just run Quanta from the command line and watch the debug output. It
> tells you about parsing times.
>  According to my testing, anything above 200ms results in a noticeable
> delay when typing.
>  So I think we need partial parsing and merging with the existing DOM
> tree and I don't see a solution with QXml or KDOM.

Where exactly do you see the problem in feeding parts of documents into the 
parser and merging the resulting tree with an existing one? 

>  There is another problem with QXml: it requires an IODevice, a
> ByteArray or a QString as a source. 

AFAIK this is not correct. What we need is a QXmlInputSource, and what Qt 
offers works on a QIODevice or a QByteArray. But I think we can create our own 
QXmlInputSource that operates on a KTextEditor interface somehow. 

> We usually get the source from the 
> KTextEditor. It is possible to get the whole content of the document as
> a QString and pass to the parser, but it is just a waste of memory.
> Instead it is possible to incrementally read from KTextEditor and feed
> the data to the parser. This is what we do now (line by line).

Line by line or character by character, either way we should be able to feed 
the parser through our custom QXmlInputSource. 
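
As a rough illustration of that idea, the sketch below (Python, not the Qt
API) shows an input source that fetches lines lazily from an editor-like
object and hands out one character at a time, so the whole document never has
to be copied into one string. All names are assumptions: FakeEditor with
line_count() and line() merely stands in for whatever the KTextEditor
interface provides, and next() returning an empty string at the end is a
convention chosen for this example.

```python
class FakeEditor:
    """Stand-in for an editor document interface (hypothetical API)."""

    def __init__(self, lines):
        self._lines = lines

    def line_count(self):
        return len(self._lines)

    def line(self, number):
        return self._lines[number]


class EditorInputSource:
    """Hands out one character per call, fetching editor lines on demand."""

    def __init__(self, editor, first_line=0, last_line=None):
        self._editor = editor
        self._line_no = first_line
        # last_line is inclusive; by default read to the end of the document.
        self._stop = editor.line_count() if last_line is None else last_line + 1
        self._buffer = ""
        self._pos = 0

    def next(self):
        if self._pos >= len(self._buffer):
            if self._line_no >= self._stop:
                return ""  # end of input
            # Only one line is held in memory at any time.
            self._buffer = self._editor.line(self._line_no) + "\n"
            self._line_no += 1
            self._pos = 0
        ch = self._buffer[self._pos]
        self._pos += 1
        return ch


class StringInputSource:
    """Same interface, backed by a plain string (for files or tests)."""

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def next(self):
        if self._pos >= len(self._text):
            return ""
        ch = self._text[self._pos]
        self._pos += 1
        return ch


def read_all(source):
    """Drain a source the way a parser loop would, one character at a time."""
    chars = []
    ch = source.next()
    while ch != "":
        chars.append(ch)
        ch = source.next()
    return "".join(chars)
```

Because both classes expose the same next() method, a parser written against
it does not care whether the data comes from the editor, a string or a file.
The first_line/last_line arguments hint at how only a part of the document
could be fed in for a partial reparse.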

> The current implementation has its limitations though, see below.
>
>  What I suggest is:
> - write our own parser

Yes, please.

> - when creating/manipulating nodes, call a helper class to create nodes
> - this helper class will create whatever nodes we want (our own or KDOM
> nodes)

This is what I would call the builder :-) 
If we could use the existing builder for KDOM we could save some work.

> - write our own code to detect the changed areas and reparse it and
> merge the DOM trees

This is the point that could become hard, from what I know at the moment. But 
I see no real blocker here. 
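
Only the detection half is easy to sketch. The Python function below is an
assumption about how it might start, not how Quanta does it today: it compares
the old and new line lists and strips the common prefix and suffix, which
leaves the smallest line range that differs. The name changed_line_range is
made up, and a real implementation would probably take the changed range from
the editor directly and then widen it to the boundaries of the enclosing node
before reparsing and merging.

```python
def changed_line_range(old_lines, new_lines):
    """Return (first, old_end, new_end).

    Lines [first, old_end) of the old text were replaced by
    lines [first, new_end) of the new text.
    """
    limit = min(len(old_lines), len(new_lines))

    # Skip the unchanged lines at the start.
    first = 0
    while first < limit and old_lines[first] == new_lines[first]:
        first += 1

    # Skip the unchanged lines at the end, without overlapping the prefix.
    tail = 0
    while (tail < limit - first
           and old_lines[-1 - tail] == new_lines[-1 - tail]):
        tail += 1

    return first, len(old_lines) - tail, len(new_lines) - tail


old = ["<html>", "<p>a</p>", "<p>b</p>", "</html>"]
new = ["<html>", "<p>a</p>", "<p>B!</p>", "<p>c</p>", "</html>"]
first, old_end, new_end = changed_line_range(old, new)
# Only new[first:new_end] has to be parsed again.
```

The subtree built from new[first:new_end] would then replace the nodes that
covered old[first:old_end]. That merge step is the hard part and is not shown
here.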

> I *am* suggesting to write a new parser because:
> - the current one relies on string searching and regular expressions.
> Jens suggested that a char by char parser should be better and faster.

I still do ;-)

> - the current one has strong ties to KTextEditor. We should have a
> common way to feed data, so it can come from KTextEditor, a QString or
> a file (QIODevice)

See my comments above about QXmlInputSource. This is the way to decouple the 
parser from the source. 

> - the current code for re-parsing is ugly and not understandable
> - there is no automated test written to see if a change breaks the
> parser or not.

So the question remains: how do we proceed? We cannot use KDOM2 yet, and 
creating our new parser with our custom DOM tree was the last idea. Do you 
still want to go this way? I am not so sure anymore, because I fear that we 
would do a lot of work that will be thrown away when we switch to KDE4. 

Jens
