List: koffice-devel
Subject: Re: Interesting QDomDocument::setContent variant
From: David Faure <faure () kde ! org>
Date: 2004-06-07 16:53:36
Message-ID: 200406071853.36220.faure () kde ! org
On Monday 16 February 2004 13:49, Nicolas Goutte wrote:
> On Monday 16 February 2004 12:33, David Faure wrote:
> > On Monday 16 February 2004 12:28, Nicolas Goutte wrote:
> > > On Monday 16 February 2004 11:39, David Faure wrote:
> > > > On Monday 16 February 2004 11:33, Nicolas Goutte wrote:
> > > > > On Monday 16 February 2004 11:17, David Faure wrote:
> > > > > > The OASIS format has a special tag for whitespace to ensure
> > > > > > whitespace is preserved. This is more reliable than using real
> > > > > > whitespace and depending on the XML parser to respect it.
> > > > >
> > > > > Something like:
> > > > > <text:span> </text:span>
> > > > > too?
> > > >
> > > > No, <text:s text:c="4"/> for 4 spaces.
> > >
> > > Yes, but you do not write:
> > > <text:span><text:s text:c="1"/></text:span>
> > > but
> > > <text:span> </text:span>
> >
> > Yes, because one space is always preserved, the problem is that 2 or more
> > spaces can be collapsed as a single one, no?
>
> No, the problem is the feature:
> http://trolltech.com/xml/features/report-whitespace-only-CharData
> of Qt's SAX parser.
>
> (See QXmlSimpleReader in Qt's doc.)
>
> With it off (which is what QDomDocument::setContent does by default), if
> white space is the only content of an element, it is not reported (i.e. it
> is ignored.)
>
> >
> > Or do you see problems with XML parsers (which ones?) not seeing the single
> > space in <text:span> </text:span>?
>
> Try it with a normal QDomDocument::setContent
>
> QDomDocument doc;
> doc.setContent( QCString( "<test> </test>" ) );
> qDebug("%s\n", doc.toCString().data() );
>
> And you get:
> <test/>
>
> (I know the problem very well. I have had it since the start of KWord's
> AbiWord import filter. Without my doing it on purpose, the test file I had
> made had exactly such a construction and of course gave problems with
> QDomDocument::setContent. That is why the AbiWord import filter is made with
> SAX (i.e. the QXML classes.))
I see.
I just tried this setContent variant, after hitting the <text:span> </text:span> parsing bug.
The problem is that it reports far too much whitespace. Even the newline between two
<text:p> elements is reported.
For instance:
<text:p>foo</text:p>
<text:p>bar</text:p>
leads to the firstChild()/nextSibling() loop reporting a node with tagName == ""
(the whitespace-only text node between the two paragraphs).
Handling this would mean handling such 'empty tag' nodes everywhere in the
code, or getting rid of all newlines and indentation in the generated XML files
(now I see why the OO files have none!).
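
To make the first option concrete, here is a rough, Qt-free sketch of what
every child loop would have to do. ChildNode and elementChildren are made-up
names for illustration, not Qt classes; with QDom the equivalent test would
be n.toElement().isNull().

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node; not a Qt type.
struct ChildNode {
    bool isElement;      // false for text nodes
    std::string name;    // tag name for elements, empty for text nodes
    std::string text;    // character data for text nodes
};

// Return only the element children, skipping every text node,
// including the whitespace-only ones that sit between elements.
std::vector<ChildNode> elementChildren(const std::vector<ChildNode>& children) {
    std::vector<ChildNode> result;
    for (const ChildNode& n : children) {
        if (!n.isElement)
            continue;    // with QDom: if (n.toElement().isNull()) continue;
        result.push_back(n);
    }
    return result;
}
```

Note that this only suits containers where text is never meaningful; inside
text:p the text nodes are the content and must not be skipped.
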
In fact, even if we used <text:s> for a single space too
(which is not recommended), we would still not be able to read OO-generated
files (with <text:span> </text:span>). The OASIS spec says we should process whitespace
inside the text:p element and its children, and NOT in the rest of the XML,
but QDom/QXml don't give us such fine-grained control, only a global on/off
setting.
So removing all our newlines and indentation, and using QXmlSimpleReader to get
all whitespace reported, sounds like the only way out.
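
For reference, the rule from the spec (whitespace matters inside text:p and
its children, not elsewhere) can be sketched on a toy tree as below. This is
only an illustration of the rule, assuming all whitespace has been reported
by the parser; XmlNode and stripInsignificantWhitespace are made-up names,
not anything QDom provides.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical minimal tree; not a Qt type.
struct XmlNode {
    std::string name;               // tag name; empty for text nodes
    std::string text;               // character data for text nodes
    std::vector<XmlNode> children;
};

// True if the string contains nothing but whitespace characters.
static bool isWhitespaceOnly(const std::string& s) {
    for (unsigned char c : s)
        if (!std::isspace(c))
            return false;
    return true;
}

// Drop whitespace-only text nodes, except inside text:p and its descendants.
void stripInsignificantWhitespace(XmlNode& node, bool insideParagraph = false) {
    const bool keep = insideParagraph || node.name == "text:p";
    std::vector<XmlNode> kept;
    for (XmlNode& child : node.children) {
        const bool isText = child.name.empty();
        if (isText && !keep && isWhitespaceOnly(child.text))
            continue;                       // e.g. the newline between two text:p
        stripInsignificantWhitespace(child, keep);
        kept.push_back(child);
    }
    node.children = kept;
}
```
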
--
David Faure, faure@kde.org, sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).
_______________________________________________
koffice-devel mailing list
koffice-devel@mail.kde.org
https://mail.kde.org/mailman/listinfo/koffice-devel