
List:       jabber-jdev
Subject:    Re: [jdev] parsing xml (xmpp) with ruby
From:       "Michal 'vorner' Vaner" <vorner () ucw ! cz>
Date:       2008-10-01 15:49:27
Message-ID: 20081001154927.GA12022 () tarantula ! kolej ! mff ! cuni ! cz

Hello

On Wed, Oct 01, 2008 at 11:33:44AM -0400, Eric Will wrote:
> On Wed, Oct 1, 2008 at 11:15 AM, Michal 'vorner' Vaner <vorner@ucw.cz> wrote:
> 
> > If you feed <stream thenamespace etc><first stanza/> to one parser,
> > then <second stanza/><third stanza> to a second and </third stanza>
> > to yet another, you get a mess, not data. Or do you reuse the parser
> > in some other way that I am missing?
> 
> I'm using a SAX parser. It doesn't care about the structure of the
> overall document. I build the nodes by myself, a tag at a time.

You don't get it. SAX does not need to hold the whole document in
memory, but it still needs information from the parent nodes (depth,
namespace declarations, etc.), so you can't start parsing from the middle.
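To make the point about parent context concrete, here is a minimal sketch.
It is in Python rather than Ruby or Qt, only because Python's standard
library ships a push-style expat binding; the helper names are made up for
this illustration. The stream header uses the usual XMPP namespaces.

```python
import xml.parsers.expat

# Opening tag of an XMPP stream: it declares the namespaces that later
# stanzas rely on.
STREAM_HEADER = (
    "<stream:stream xmlns:stream='http://etherx.jabber.org/streams' "
    "xmlns='jabber:client'>"
)
# A later network chunk that only makes sense inside that open <stream>.
LATER_CHUNK = "<stream:features/><presence/>"


def _new_parser(seen):
    """Create a namespace-aware parser that records expanded element names."""
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartElementHandler = lambda name, attrs: seen.append(name)
    return parser


def parse_with_one_parser(chunks):
    """Feed every chunk to the SAME parser, so parent context is kept."""
    seen = []
    parser = _new_parser(seen)
    for chunk in chunks:
        parser.Parse(chunk, False)  # False: more data may follow
    return seen


def parse_with_fresh_parser(chunk):
    """Feed one chunk to a brand-new parser; return the error text, if any."""
    try:
        _new_parser([]).Parse(chunk, False)
    except xml.parsers.expat.ExpatError as exc:
        return str(exc)
    return None
```

With one parser, parse_with_one_parser([STREAM_HEADER, LATER_CHUNK])
resolves the "stream:" prefix and the default namespace from the header.
parse_with_fresh_parser(LATER_CHUNK) returns an error instead, because a
new parser has never seen the declaration of the "stream" prefix. A fresh
parser also rejects two sibling stanzas in one chunk: without the enclosing
<stream> it sees a second root element.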

> > When a stanza gets split into two chunks, you get even more mess.
> 
> I handle this at the moment, but not in the best way. When my parser
> gets to a partial stanza, it reads and processes everything up to the
> partial part and then does one of two bad things. The first is when I
> get half a tag or something: it raises an exception saying the XML is
> invalid. The second is when the chunk ends inside an open element:
> everything is well-formed so far, but there's no closing tag. In this
> case it parses as far as it can, but without closing tags (which is
> where I fire my events) it doesn't DO anything, so it appears to
> ignore it... I'm not sure how to fix this.

That is the "more mess" I was talking about. You need to set up the
parser so that it does not expect to reach the end of the document and
instead waits for the next data feed.
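As a rough sketch of what "not expecting the end of the document" can look
like, the following again uses Python's expat binding as a stand-in (not
REXML, not Qt). The class name StanzaWatcher and its attributes are
hypothetical. The second argument to Parse() tells expat that the input is
not final, so half a tag at the end of a chunk is kept until more data
arrives rather than reported as invalid XML.

```python
import xml.parsers.expat


class StanzaWatcher:
    """Track element depth and note each stanza when its closing tag arrives."""

    def __init__(self):
        self.depth = 0
        self.completed = []  # names of finished top-level stanzas
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.StartElementHandler = self._on_start
        self._parser.EndElementHandler = self._on_end

    def _on_start(self, name, attrs):
        self.depth += 1

    def _on_end(self, name):
        self.depth -= 1
        # Depth 1 means we are back directly inside <stream>, so a whole
        # stanza has just been closed.
        if self.depth == 1:
            self.completed.append(name)

    def feed(self, data):
        # isfinal=False: the document is not over, more chunks will come.
        self._parser.Parse(data, False)
```

Feeding "<stream><message><bo", then "dy>hi</body></mess", then
"age><presence/>" raises no error. Nothing is reported as complete after
the first two chunks, and both stanzas are reported once the third chunk
closes them.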

> > This is my code for when data arrives. It is C++ and Qt, but you
> > should get the idea:
> >
> > source.setData( text );
> > reader.parseContinue();
> 
> REXML doesn't have this. There's no way to change the source except to
> make a new parser instance.

I do not change the source. I just fill the same source with more data
and tell the parser it can continue (reader is the parser).
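The same "add data, then continue" pattern can be sketched with Python's
xml.etree.ElementTree.XMLPullParser, used here purely as an illustration
of the idea and not as a claim about what REXML offers. The function name
stanzas and the sample chunks are assumptions; in real code the chunks
would come from the socket.

```python
import xml.etree.ElementTree as ET


def stanzas(chunks):
    """Yield each complete top-level stanza as an Element.

    One parser lives for the whole stream. Every chunk is appended with
    feed(), and only the events that became available are consumed.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)  # add more data to the same parser
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                if depth == 1:  # a direct child of <stream> just closed
                    yield elem
```

For example, list(stanzas(["<stream><iq id='1'><qu",
"ery/></iq><message><body>hel", "lo</body></message>"])) gives two
elements, iq and message, even though both were split across chunks. A
long-lived connection would also want to detach finished stanzas from the
root element so that memory does not grow without bound.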

If your parser cannot do something like this, then you are doomed and
it won't work. At all. (If it sometimes appears to work, you are just
unlucky that it gives you no clear evidence that it is broken.)

-- 
If it works, fix it.

Michal 'vorner' Vaner
