'Re: Tokenizer'


List:       kde-kafka
Subject:    Re: Tokenizer
From:       Jono Bacon <f9808590 () wlv ! ac ! uk>
Date:       2000-10-26 0:11:53

On Wednesday 25 October 2000 20:53, Stefan Schimanski wrote:
> > From my understanding of the conversation, it sounds like Schimmi is
> > talking about building a layer that first of all contains the output of
> > the DOM, and secondly contains further unknown DOM elements.
> >
> > It seems that the main issue at hand is that there is no way of knowing
> > when to refresh the normal DOM tree once the token buffer DOM tree is
> > updated. Am I right?
>
> Right. The token buffer, btw, is not a tree but a linear list of tokens.
>
> > I have a few questions about this:
> >
> >   - How will the normal DOM tree and the tokenizer DOM fit together to
> > form the same tree and representation of the document?
>
> That would be the main issue to solve. One way to do this is to make all
> changes on the token stream. The idea of the token stream is that every
> token gets a pointer to its DOM element, so the DOM element whose children
> have to be recreated on manipulation can be found.
> If possible I would prefer to drop the idea of another layer, because it
> adds another level of complexity. If we get the DOM to store _all_
> information from the HTML file, that would make our life much easier.
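
A minimal sketch of the "every token gets a pointer to the DOM element" idea, to make the discussion concrete. This is not KHTML code: DomNode, Token, TokenKind, editToken and the childrenStale flag are all hypothetical names invented for illustration. It only shows how an edit made on the linear token list could find the DOM element whose children need to be recreated.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM element (not the KHTML class).
struct DomNode {
    std::string name;
    bool childrenStale = false;  // set when the children must be rebuilt
};

enum class TokenKind { Tag, Text, Comment };

// One entry of the linear token list. `owner` is the back-pointer to the
// DOM element whose children were built from this token (an assumption
// about how the link could be stored).
struct Token {
    TokenKind kind;
    std::string source;  // raw text of the token
    DomNode* owner;
};

// Apply an edit to one token and return the DOM element that now has to
// recreate its children, or nullptr if the token has no owner.
DomNode* editToken(std::vector<Token>& tokens, std::size_t index,
                   const std::string& newSource) {
    Token& t = tokens.at(index);  // throws std::out_of_range on a bad index
    t.source = newSource;
    if (t.owner != nullptr) {
        t.owner->childrenStale = true;
    }
    return t.owner;
}
```

With this layout all changes go through the token list, and the DOM side only needs to rebuild the subtree of the returned element.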

Another option, if we are using a QObject event layer: when the user
triggers an event on the layer, the coordinates are passed to the method and
we can determine which part of the DOM tree the action should affect. We can
then work out whether we need to update the tokenizer or not.
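
A rough sketch of the coordinate lookup described above, in plain C++ rather than Qt. Rect, Box and hitTest are hypothetical names, and the assumption that each rendered box records the name of its DOM node is mine, not something taken from KHTML. It needs C++17 because Box holds a vector of itself.

```cpp
#include <string>
#include <vector>

// Axis-aligned rectangle in layer coordinates.
struct Rect {
    int x, y, w, h;
    bool contains(int px, int py) const {
        return px >= x && px < x + w && py >= y && py < y + h;
    }
};

// Hypothetical rendered box that remembers which DOM node it belongs to.
struct Box {
    std::string domNode;        // name of the DOM node this box renders
    Rect bounds;
    std::vector<Box> children;  // boxes nested inside this one
};

// Return the innermost box containing the point, or nullptr if the point
// lies outside `box` altogether.
const Box* hitTest(const Box& box, int px, int py) {
    if (!box.bounds.contains(px, py)) {
        return nullptr;
    }
    for (const Box& child : box.children) {
        if (const Box* hit = hitTest(child, px, py)) {
            return hit;
        }
    }
    return &box;  // no child claimed the point, so this box is the target
}
```

The event handler would call hitTest with the event coordinates and then decide, from the node it gets back, whether the tokenizer side needs updating.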

> >   - When you say there are unknown tags to be used by the tokenizer, do
> > you mean that it will parse each line of a <SCRIPT> block, for example,
> > putting each scripting line into the enhanced tokenizer DOM? If this were
> > the case we could have a WYSIWYG interface to building scripts also. :-)
>
> I would put the whole script in a single DOM element. But you're right that
> this would be another special DOM element. A script editor of course could
> get the needed information from that.

Maybe we should ensure the plugin interface can have a connection with the
DOM, and maybe there could be a specific plugin to visually edit scripts and
load scripts into a custom DOM. We could theoretically build a DOM for each
type of content, e.g. PHP, JavaScript, etc.

> >   - When you say that the tokenizer will generate tokens, how will this
> > work? I have no idea how the DOM implementation in KHTML works, and I
> > assume it tokenizes things also, but could you expand on how this
> > works? :-)
>
> Of course. The KHTMLPart passes the text stream of the web page through a
> tokenizer. This checks the stream character by character and divides it
> into tags, text, comments and so on. These elementary units are called
> tokens. For example, a tag token gets number 0, a text token 1, ...
> The token stream is passed to the parser. It interprets the tokens and
> creates the DOM elements for them, building a tree-like data structure out
> of the still sequential token stream.
> After a close tag is found, the corresponding DOM node is told to create
> its render objects. Consequently you have a DOM tree and a rendering tree
> that is cross-linked with the DOM. The layout is done by the rendering
> elements, while the tag attributes are stored in the DOM elements.
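
The pipeline described above (tokenizer, then parser, then a tree built from a sequential token stream) can be sketched roughly as follows. This is a standalone toy, not the KHTML implementation: Token, Node, tokenize and parse are hypothetical names, attributes are not parsed, and the `rendered` flag merely stands in for "the DOM node is told to create render objects" when its close tag arrives. It assumes C++14 or later.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class TokenKind { OpenTag, CloseTag, Text, Comment };

struct Token {
    TokenKind kind;
    std::string value;  // tag name, text run, or comment body
};

// Walk the input character by character and cut it into tokens.
std::vector<Token> tokenize(const std::string& html) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < html.size()) {
        if (html.compare(i, 4, "<!--") == 0) {
            std::size_t end = html.find("-->", i + 4);
            if (end == std::string::npos) end = html.size();
            out.push_back({TokenKind::Comment, html.substr(i + 4, end - (i + 4))});
            i = (end == html.size()) ? end : end + 3;
        } else if (html[i] == '<') {
            std::size_t end = html.find('>', i);
            if (end == std::string::npos) end = html.size();
            bool closing = (i + 1 < html.size() && html[i + 1] == '/');
            std::size_t start = i + (closing ? 2 : 1);
            out.push_back({closing ? TokenKind::CloseTag : TokenKind::OpenTag,
                           html.substr(start, end - start)});
            i = (end == html.size()) ? end : end + 1;
        } else {
            std::size_t end = html.find('<', i);
            if (end == std::string::npos) end = html.size();
            out.push_back({TokenKind::Text, html.substr(i, end - i)});
            i = end;
        }
    }
    return out;
}

struct Node {
    std::string name;  // tag name, "#text", "#comment" or "#document"
    std::string text;  // content of text and comment nodes
    std::vector<std::unique_ptr<Node>> children;
    bool rendered = false;  // placeholder for "render objects created"
};

// Turn the flat token list into a tree using a stack of open elements.
std::unique_ptr<Node> parse(const std::vector<Token>& tokens) {
    auto root = std::make_unique<Node>();
    root->name = "#document";
    std::vector<Node*> open{root.get()};
    for (const Token& t : tokens) {
        if (t.kind == TokenKind::OpenTag) {
            auto n = std::make_unique<Node>();
            n->name = t.value;
            Node* raw = n.get();
            open.back()->children.push_back(std::move(n));
            open.push_back(raw);
        } else if (t.kind == TokenKind::CloseTag) {
            // Only a close tag matching the innermost open element is
            // honoured; anything else is ignored in this sketch.
            if (open.size() > 1 && open.back()->name == t.value) {
                open.back()->rendered = true;
                open.pop_back();
            }
        } else {
            auto n = std::make_unique<Node>();
            n->name = (t.kind == TokenKind::Text) ? "#text" : "#comment";
            n->text = t.value;
            open.back()->children.push_back(std::move(n));
        }
    }
    return root;
}
```

In this sketch a close tag is a token such as `</b>`, and it is the point at which the matching element is popped off the stack and marked as ready for rendering.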

What do you mean by a 'close tag'? Is this the EOF marker at the end of the
document?

	Jono

-- 
Jono Bacon - jono@kde.org
KDE/Qt Developer - Founder of Linux UK
_______________________________________________
Kde-kafka mailing list
Kde-kafka@master.kde.org
http://master.kde.org/mailman/listinfo/kde-kafka
