'Re: Tokenizer'

List:       kde-kafka
Subject:    Re: Tokenizer
From:       Stefan Schimanski <1Stein () gmx ! de>
Date:       2000-10-25 20:53:12

> From my understanding of the conversation, it sounds like Schimmi is talking
> about building a layer that first of all contains the output of the DOM,
> and secondly contains further unknown DOM elements.
>
> It seems that the main issue at hand is that there is no way of knowing to
> refresh normal DOM tree when the token buffer DOM tree is updated. Am I
> right?

Right. The token buffer, by the way, is not a tree but a linear list of tokens.

> I have a few questions about this:
>
>   - How will the normal DOM tree and the tokenizer DOM fit together to form
> the same tree and representation of the document.

That would be the main issue to solve. One way to do this is to make all
changes on the token stream. The idea of the token stream is that every
token gets a pointer to its DOM element. That way, when a token is
manipulated, we can find the DOM element whose children have to be recreated.
If possible I would prefer to drop the idea of another layer, because it adds
another level of complexity. If we get the DOM to store _all_ information from
the HTML file, that would make our life much easier.
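
To make the back-pointer idea concrete, here is a minimal sketch in Python.
It is not KHTML code: the names Element, Token, owner, recreate_children and
edit_token are all made up for illustration, and "children" are reduced to
plain strings. It only shows how an edit on the linear token list can find
the one DOM element whose children need rebuilding.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """Hypothetical stand-in for a DOM element (not the KHTML class)."""
    name: str
    children: list = field(default_factory=list)
    rebuilds: int = 0  # counts how often the children were recreated

    def recreate_children(self, tokens):
        # Rebuild the children from every token that points back at us.
        self.children = [t.value for t in tokens if t.owner is self]
        self.rebuilds += 1


@dataclass
class Token:
    """One entry of the linear token list, with a pointer to its element."""
    value: str
    owner: Optional[Element] = None  # assumed back-pointer into the DOM


def edit_token(tokens, index, new_value):
    """Change one token, then refresh only the element that owns it."""
    tok = tokens[index]
    tok.value = new_value
    if tok.owner is not None:
        tok.owner.recreate_children(tokens)
    return tok.owner
```

With tokens "Hello" and " world" owned by a p element and "footer" owned by
body, calling edit_token(tokens, 1, " KDE") rebuilds only the children of p
and leaves body untouched.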

>   - when you say there are unknown tags to be used by the tokenizer, do you
> mean that it will parse each line of a <SCRIPT> block for example putting
> each scripting line into the enhanced tokenizer DOM? If this were the case
> we could have a WYSIWYG interface to building scripts also. :-)

I would put the whole script in a single DOM element. But you're right that 
this would be another special DOM element. A script editor of course could 
get the needed information from that.

>   - when you say that the tokenizer will generate tokens, how will this
> work? I have no idea of how the DOM implementation in KHTML works, and I
> assume it tokenizes things also, but could you expand on how this works.
> :-)

Of course. The KHTMLPart passes the text stream of the web page through a
tokenizer. The tokenizer checks it character by character and divides the
stream into tags, text, comments and so on. These elementary units are called
tokens. For example, a tag token gets number 0, a text token 1, ...
The token stream is passed to the parser. It interprets the tokens and
creates the DOM elements for them. It creates a tree-like data structure out
of the still sequential token stream.
After a close tag is found, the corresponding DOM node is told to create its
render objects. Consequently you have a DOM tree and a rendering tree that is
cross-linked with the DOM. The layout is done by the rendering elements,
while the tag attributes are stored in the DOM elements.
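
As a rough illustration of the two stages, here is a small Python sketch.
It is not how khtml is implemented: a regular expression stands in for the
character-by-character scan, attributes and render objects are left out, and
all names (Token, Node, tokenize, parse) are invented for the example. The
numbers 0 for tags and 1 for text follow the description above; 2 for
comments is an assumption.

```python
import re
from dataclasses import dataclass, field

# Token kinds: TAG = 0 and TEXT = 1 as in the mail; COMMENT = 2 is assumed.
TAG, TEXT, COMMENT = 0, 1, 2


@dataclass
class Token:
    kind: int
    value: str


@dataclass
class Node:
    """Hypothetical DOM node; attributes are ignored in this sketch."""
    name: str
    text: str = ""
    children: list = field(default_factory=list)


# Comment, tag, or run of text. A regex replaces the real character scan.
TOKEN_RE = re.compile(r"<!--(.*?)-->|<(/?[A-Za-z][^>]*)>|([^<]+)", re.S)


def tokenize(source):
    """Split the text stream into a flat, sequential list of tokens."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        comment, tag, text = m.groups()
        if comment is not None:
            tokens.append(Token(COMMENT, comment))
        elif tag is not None:
            tokens.append(Token(TAG, tag))
        else:
            tokens.append(Token(TEXT, text))
    return tokens


def parse(tokens):
    """Turn the sequential token list into a tree, using a stack."""
    root = Node("#document")
    stack = [root]  # currently open elements, innermost last
    for tok in tokens:
        if tok.kind == TEXT:
            stack[-1].children.append(Node("#text", tok.value))
        elif tok.kind == COMMENT:
            stack[-1].children.append(Node("#comment", tok.value))
        elif tok.value.startswith("/"):
            # Close tag: close everything up to the matching open element.
            name = tok.value[1:].strip().lower()
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].name == name:
                    del stack[i:]
                    break
        else:
            # Open tag: create the element and descend into it.
            node = Node(tok.value.split()[0].lower())
            stack[-1].children.append(node)
            stack.append(node)
    return root
```

In the real thing, the point where the close tag is handled is where the DOM
node would be told to create its render objects.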

I hope this helps a bit to get an overview of khtml.

Schimmi

-- 
#! /bin/sh
for DVDs in Linux screw the MPAA and ; do dig $DVDs.z.zoy.org ; done | \
   perl -ne 's/\.//g; print pack("H224",$1) if(/^x([^z]*)/)' | gunzip
_______________________________________________
Kde-kafka mailing list
Kde-kafka@master.kde.org
http://master.kde.org/mailman/listinfo/kde-kafka
