List:       koffice-devel
Subject:    Re: Interesting QDomDocument::setContent variant
From:       David Faure <faure () kde ! org>
Date:       2004-06-07 18:36:18
Message-ID: 200406072036.18748.faure () kde ! org

On Monday 07 June 2004 20:05, Thomas Zander wrote:
> On Monday 07 June 2004 18:53, David Faure wrote:
> > For instance:
> >       <text:p>foo</text:p>
> >       <text:p>bar</text:p>
> > leads to the firstChild()/nextSibling() loop reporting a tag with
> > tagName == "". Handling this would mean handling such 'empty tag'
> > elements everywhere in the code
> 
> Or creating a little 'wrapper' API for Qt's parser that removes all illegal 
> elements from its output.

That's a solution indeed, but a slightly confusing one for anyone who already
knows QDom (yet another API to learn), and also not an easy one to implement.
Specifics below.

> Now that I think about it; the code should already handle ANY unknown tag 
> gracefully anyway; if I were to put in extra tags in a document the nature 
> of XML would mean it's totally ignored by anyone reading it.
> If it doesn't then it should get fixed anyway.

Let's not confuse things here. We already skip unknown elements. But a
*text node* between two elements (between text:p elements, or anywhere else
that is not inside a text:p itself) is not allowed by the RelaxNG schema anyway,
so we don't have to handle that case.

My point is that
  <text:p>foo</text:p>
  <text:p>bar</text:p>
is really just two paragraphs following each other. There is NO text node between them.
Having to care for a reported (whitespace-only) text node between them is a bug IMHO
(but see below :)
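
To make the effect concrete, here is a minimal sketch. It uses Python's
xml.dom.minidom as a stand-in for QDom (an assumption made only so the
example runs without Qt; QDom's own reporting of these nodes may differ in
detail), and the namespace URI and the child_summary() helper are placeholders
invented for illustration.

```python
from xml.dom.minidom import parseString

# Placeholder namespace URI, for illustration only.
NS = 'xmlns:text="urn:example:text"'

# The same two paragraphs, once indented and once on a single line.
INDENTED = (
    "<body " + NS + ">\n"
    "  <text:p>foo</text:p>\n"
    "  <text:p>bar</text:p>\n"
    "</body>"
)
COMPACT = "<body " + NS + "><text:p>foo</text:p><text:p>bar</text:p></body>"


def child_summary(xml):
    """Return (nodeName, is_whitespace_only_text) for each child of the root."""
    root = parseString(xml).documentElement
    return [
        (node.nodeName,
         node.nodeType == node.TEXT_NODE and node.data.strip() == "")
        for node in root.childNodes
    ]
```

With the indented input the two text:p elements come back interleaved with
whitespace-only "#text" nodes; with the compact input only the two elements
are reported.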

> I mean; if I put XML comments in a file; will that break loading the file 
> into KWord? That would be really bad..

Waldo's recent commits proved that it's easy to fall into that trap indeed,
but I just checked and although the kword-1.3 loading code has that bug in quite
a few places, the OASIS loading code doesn't have it yet.

Ah, hmm, you are right. Even though we correctly iterate over QDomNodes, we have
to check whether the conversion to a QDomElement worked (toElement() returns a
null element for anything that isn't one). So in fact handling the above "bug"
(whitespace-only text nodes being reported) ensures that we also handle comments
in the XML we're parsing... hmm, that's possibly an argument for keeping the
indentation in the files...

Everyone: see the bottom of http://developer.kde.org/documentation/other/mistakes.html
for what's correct and what's not.
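
As a rough illustration of the pattern (not the KOffice loading code), the
sketch below again uses Python's xml.dom.minidom in place of QDom. The
nodeType test plays the role of checking the result of the QDomElement
conversion; element_children(), paragraph_texts(), the sample document and
the namespace URI are all hypothetical.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def element_children(parent):
    """Yield only element children, skipping text nodes, comments, etc."""
    for node in parent.childNodes:
        if node.nodeType == Node.ELEMENT_NODE:
            yield node


def paragraph_texts(xml):
    """Collect the text of every text:p child of the root element."""
    root = parseString(xml).documentElement
    texts = []
    for elem in element_children(root):
        if elem.tagName != "text:p":
            continue  # unknown element: skip it
        texts.append("".join(child.data for child in elem.childNodes
                             if child.nodeType == Node.TEXT_NODE))
    return texts


# Indentation, a comment and an unknown element between the paragraphs.
SAMPLE = (
    '<body xmlns:text="urn:example:text">\n'
    "  <!-- a comment between paragraphs -->\n"
    "  <text:p>foo</text:p>\n"
    "  <unknown/>\n"
    "  <text:p>bar</text:p>\n"
    "</body>"
)
```

Calling paragraph_texts(SAMPLE) returns the two paragraph strings; the
whitespace, the comment and the unknown element never reach the code that
handles paragraphs.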

> I just searched the QDomNode and QDomElement classes API docs and was 
> highly amazed by the lack of a getter to get a child node by name. Using 
> that kind of getter proves the whole strength of DOM based parsing!
> Something like:
> QString paragText = htmlElement->child("body")->child("p")->childText();
See QDomNode::namedItem()

We use this in many places. But body.namedItem("p") is of course nonsense: it
returns only the first matching child, and a body text contains any number of
paragraphs. Hence the need to iterate over all the children of the body, and
that's where the iteration has to be done properly.
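
The difference between a first-match lookup and a full iteration can be
sketched as follows, once more with Python's xml.dom.minidom standing in for
QDom. first_child_named() is a hypothetical helper written in the spirit of
namedItem(), not a minidom or Qt function, and the sample HTML is made up.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def first_child_named(parent, name):
    """Hypothetical namedItem()-style lookup: first matching child or None."""
    for node in parent.childNodes:
        if node.nodeType == Node.ELEMENT_NODE and node.tagName == name:
            return node
    return None


def children_named(parent, name):
    """All direct element children with the given tag name, in order."""
    return [node for node in parent.childNodes
            if node.nodeType == Node.ELEMENT_NODE and node.tagName == name]


def text_of(elem):
    """Concatenate the direct text-node children of an element."""
    return "".join(child.data for child in elem.childNodes
                   if child.nodeType == Node.TEXT_NODE)


HTML = ("<html><head><title>t</title></head>"
        "<body><p>one</p><p>two</p></body></html>")
```

A first-match lookup is fine for the single body element, but for the
paragraphs it silently drops everything after the first one; children_named()
returns both.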

> > , or getting rid of all newlines and 
> > indentation in the generated XML files (now I see why the OO files have
> > none!).
> I doubt that; it's hard work to create nicely indented XML files; my guess 
> is that they were just lazy :)
I disagree. It really makes the (computer-) parsing simpler.
When it comes to looking at a nicely indented XML file, it's very simple:
"xmllint --format content.xml". This is what I use all the time to read XML files
generated by OO.

Hmm now I'm undecided. I'll fix the problems when encountering non-elements,
as triggered by this whitespace-reporting parser... If things work ok after that
I guess I'll leave the indentation then.

Thanks for the input.

-- 
David Faure, faure@kde.org, sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).
_______________________________________________
koffice-devel mailing list
koffice-devel@mail.kde.org
https://mail.kde.org/mailman/listinfo/koffice-devel
