
List:       quanta-devel
Subject:    Re: [quanta-devel] more ideas for the parser
From:       Jens Herden <jens () kdewebdev ! org>
Date:       2006-01-23 13:59:53
Message-ID: 200601232059.58617.jens () kdewebdev ! org



> > The configuration of the parser should be in external files, so that
> > we can change the behavior of the parser at any time.
>
> Isn't this what we have now in description.rc? Of course, the
> configuration syntax will be different, but the idea is the same.

Yes, exactly. I want external parser configuration for all the formats we
support.

> > We have to think about actions that can happen on state-in and
> > state-out.
>
> I think we should have states like:
> - XML tag start
> - XML tag end
> - character data
> - special area start
> - special area end

Having read this and the Umbrello document, I want to point out that there is
a distinction between a state and the actions that can happen in that state.
Some actions happen when the parser enters a state and others when it leaves
the state. I am not sure yet if we really need both. There can also be more
than one action to perform at the same time, for example emitting the tag name
to the builder and then clearing the tag name buffer.
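A minimal C++ sketch of this separation, assuming each state carries ordered
lists of enter and exit actions stored as plain function pointers. All names
(ParserContext, State, StateMachine, emitTagName, clearTagNameBuffer) are
hypothetical and not taken from the Quanta sources; the vector of emitted
names only stands in for the real builder.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parser context: the buffers an action may touch.
struct ParserContext {
    std::string tagNameBuffer;
    std::vector<std::string> emittedTagNames; // stand-in for the DOM builder
};

// An action is a plain function pointer, so a state can run several in a row.
using Action = void (*)(ParserContext&);

void emitTagName(ParserContext& ctx) { ctx.emittedTagNames.push_back(ctx.tagNameBuffer); }
void clearTagNameBuffer(ParserContext& ctx) { ctx.tagNameBuffer.clear(); }

struct State {
    std::string name;
    std::vector<Action> onEnter; // run in order when the parser enters the state
    std::vector<Action> onExit;  // run in order when the parser leaves the state
};

class StateMachine {
public:
    // Starts in state 0; the sketch does not run that state's enter actions.
    explicit StateMachine(std::vector<State> states) : states_(std::move(states)) {}

    // Run the exit actions of the current state, then the enter actions of the target.
    void changeState(std::size_t next) {
        assert(next < states_.size());
        for (Action a : states_[current_].onExit) a(ctx_);
        current_ = next;
        for (Action a : states_[current_].onEnter) a(ctx_);
    }

    ParserContext& context() { return ctx_; }
    std::size_t current() const { return current_; }

private:
    std::vector<State> states_;
    std::size_t current_ = 0;
    ParserContext ctx_;
};
```

With a tag name state whose exit list is {emitTagName, clearTagNameBuffer},
leaving that state hands the buffered name to the builder stand-in and then
empties the buffer, which is the two-actions-at-once case above.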

> > It is probably possible to define a set of actions and
> > hardcode them. The external configuration would refer to the internally
> > coded actions.
> > Some examples:
> > - add the incoming character to an internal buffer
> > - clear the internal buffer
> > - call one of the builder functions to create the DOM tree
>
> I don't get what you want to do with these. Describe them in the external
> files? If so, how do you imagine that?

Please have a look at kdevquanta/quantacore/parsers/comparator.cpp/.h.
This is my first attempt to hardcode the possible compare functions while
making them configurable from a file. In the file you just write an id, and
when the parser reads the file it translates the id into a pointer to the
correct function. That function is then used to compare the incoming
character, and the parser acts according to the result.
This approach lets us avoid any case statement during parsing; we just call
through the function pointer to jump to the right place. The pointer should
go into a data structure that describes the possible transitions out of one
state.
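To illustrate the idea (this is not the code in comparator.cpp), here is a
hedged C++ sketch assuming plain char input and three hypothetical comparator
ids: "char", "set" and "whitespace". The id is resolved to a function pointer
once, when the configuration is loaded, and each state keeps a list of
Transition entries; per character the parser only calls through the stored
pointers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Comparator signature: test one incoming character against a parameter taken
// from the configuration file (a single char, a set, or unused for a class).
using Comparator = bool (*)(char c, const std::string& arg);

bool compareChar(char c, const std::string& arg) { return !arg.empty() && c == arg[0]; }
bool compareSet(char c, const std::string& arg) { return arg.find(c) != std::string::npos; }
bool compareWhitespace(char c, const std::string&) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Translate an id read from the configuration file into a function pointer.
// Returns nullptr for unknown ids so the loader can report a config error.
Comparator comparatorForId(const std::string& id) {
    static const std::map<std::string, Comparator> registry = {
        {"char", compareChar},
        {"set", compareSet},
        {"whitespace", compareWhitespace},
    };
    auto it = registry.find(id);
    return it == registry.end() ? nullptr : it->second;
}

// One possible transition out of a state, resolved once at load time.
struct Transition {
    Comparator compare;
    std::string arg;
    int targetState;
};

// No switch/case while parsing: call through the stored pointers until one
// transition matches. Returns -1 if no transition accepts the character.
int nextState(const std::vector<Transition>& transitions, char c) {
    for (const Transition& t : transitions) {
        assert(t.compare != nullptr);
        if (t.compare(c, t.arg)) return t.targetState;
    }
    return -1;
}
```

A tag name state could then hold a "char" transition on ">", a "whitespace"
transition and a "set" transition on "/="; an unknown id such as "regex"
resolves to nullptr, so the loader can reject the file before parsing starts.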

In the same way, we could create a set of actions that are called on
state-in and/or state-out.
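Along the same lines, a sketch of how a configuration line could refer to
hardcoded actions. The action ids ("append", "clear", "emit"), the
space-separated one-line format and all function names are assumptions for
illustration; the three actions mirror the quoted list (add the incoming
character to the buffer, clear the buffer, call the builder), with the
builder call modelled by pushing the buffer into a vector.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct ParseBuffer {
    std::string text;                 // the internal buffer
    std::vector<std::string> emitted; // stand-in for calls into the DOM builder
};

using Action = void (*)(ParseBuffer&, char c);

void appendChar(ParseBuffer& b, char c) { b.text.push_back(c); }
void clearBuffer(ParseBuffer& b, char) { b.text.clear(); }
void emitToBuilder(ParseBuffer& b, char) { b.emitted.push_back(b.text); }

// Map a hypothetical action id from the config file to the hardcoded function.
Action actionForId(const std::string& id) {
    static const std::map<std::string, Action> registry = {
        {"append", appendChar}, {"clear", clearBuffer}, {"emit", emitToBuilder}};
    auto it = registry.find(id);
    return it == registry.end() ? nullptr : it->second;
}

// Parse a line such as "emit clear" into the ordered list of actions to run.
// Returns false on the first unknown id; out keeps the actions parsed before it.
bool loadActions(const std::string& line, std::vector<Action>& out) {
    std::istringstream in(line);
    std::string id;
    while (in >> id) {
        Action a = actionForId(id);
        if (a == nullptr) return false;
        out.push_back(a);
    }
    return true;
}

// Run the resolved actions in order for the current character.
void runActions(const std::vector<Action>& actions, ParseBuffer& b, char c) {
    for (Action a : actions) {
        assert(a != nullptr); // loadActions never stores a null pointer
        a(b, c);
    }
}
```

The file only ever contains ids; the behaviour itself stays in compiled code,
so a typo in the configuration is caught at load time rather than mid-parse.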

>
> > We have to think about conditions for comparing the character that
> > comes in. I immediately have this in mind:
> > - compare with exactly one character
> > - compare with a set of characters
> > - check if the character belongs to a class of characters, like
> > whitespace
>
> Makes sense.

See above.

> > There is another one that I want to avoid whenever possible:
> > - check if it matches a regular expression
>
> But you know that comparing with a set of characters or checking whether it
> belongs to a class of characters is usually done with regular expressions?

Yes, I know, but I just wanted to say that we should avoid regexes wherever
possible.
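As one illustration of handling sets and classes without a regex, a character
class can be precomputed into a lookup table, so that each membership test is
a single bit lookup per incoming character. This sketch assumes byte-sized
input; Qt code would usually work on QChar, which would need a different table
or a fallback for characters outside Latin-1. CharClass is a hypothetical name.

```cpp
#include <bitset>
#include <cassert>
#include <climits>
#include <string>

// A character class precomputed as one bit per possible byte value.
class CharClass {
public:
    explicit CharClass(const std::string& members) {
        // Cast through unsigned char so bytes above 127 map to valid indices.
        for (char c : members) bits_.set(static_cast<unsigned char>(c));
    }
    // Membership is a single indexed bit test; no pattern engine is involved.
    bool contains(char c) const { return bits_.test(static_cast<unsigned char>(c)); }

private:
    std::bitset<UCHAR_MAX + 1> bits_;
};
```

A whitespace class built from " \t\r\n" or a delimiter set built from "<>/="
is constructed once when the configuration is loaded and then only queried.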


> Umbrello? It has a module for creating state diagrams as well
> (Diagram->New->State diagram).

Nice, I did not know this.
Looking at your file, I think we are on the right track. Of course some things
are missing, like namespace support, but I think we should continue to draw
a diagram for our parsers. We could improve it even more if we clearly
describe the in- and out-actions for each state.

There is a problem with our special areas. A special area can start while
the XML parser is in different states. For a correct result we need a kind of
state stack to remember where to return when the special area has ended.
Actually, I believe we need a solution that sits on top of the XML parser,
takes over whenever a special area starts, and returns control to the XML
parser in the state where it took over.
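A minimal sketch of such a state stack, assuming one State enum that covers
both XML states and special-area states (all names are hypothetical).
Entering a special area pushes the current state; leaving it pops that state,
so the XML parser resumes exactly where it was interrupted. Because it is a
stack, nested special areas would unwind in the right order as well.

```cpp
#include <cassert>
#include <stack>

// Hypothetical states: a few XML parser states plus special-area states.
enum class State { Text, TagName, AttributeValue, PhpCode, CssCode };

class LayeredParserState {
public:
    explicit LayeredParserState(State initial) : current_(initial) {}

    State current() const { return current_; }

    // Ordinary transition inside whichever parser currently has control.
    void setState(State s) { current_ = s; }

    // A special area may start in any state: remember it, then hand over.
    void beginSpecialArea(State area) {
        saved_.push(current_);
        current_ = area;
    }

    // Return to the state that was active when the special area started.
    void endSpecialArea() {
        assert(!saved_.empty());
        current_ = saved_.top();
        saved_.pop();
    }

    bool inSpecialArea() const { return !saved_.empty(); }

private:
    State current_;
    std::stack<State> saved_;
};
```

For example, a PHP block that starts inside an attribute value returns the
parser to the attribute value state once the block ends.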

I did not have much time today; I wanted to write more. But maybe this gives
us some input to keep thinking about.

Jens


_______________________________________________
quanta-devel mailing list
quanta-devel@kde.org
https://mail.kde.org/mailman/listinfo/quanta-devel


