
List:       koffice-devel
Subject:    Re: Problems with User-defined variables in KWord
From:       Gary Cramblitt <garycramblitt () comcast ! net>
Date:       2006-03-17 3:29:47
Message-ID: 200603162229.47576.garycramblitt () comcast ! net

On Thursday 16 March 2006 18:40, Jaroslaw Staniek wrote:
> Sebastian Sauer said the following, On 2006-03-16 23:55:
> > Seems to be a more generic problem. E.g. at
> >
> >  <text:p text:style-name="P2" >
> >         Line1
> >         Line2
> >    </text:p>
> >
> > KWord introduces a linebreak between Line1 and Line2 while oo.org adds a
> > whitespace between them.
>
> Bah! Thanks for this note.
>
> FYI: I am using QDomDocument::toString() to produce content.xml so unlike
> oo.org's XML, my XML is indented and contains \n. But KWord does the same
> as me...
>
> 1. OTOH In this example, two variables
>
> [Name] [Surname]
>
> using text:user-field-get tag _should_ be rendered as a single
> line/paragraph but KWord magically creates two lines ...
>
> [Name]
> [Surname]
>
> ...for this XML:
>     <text:p text:style-name="P2" >
>      <text:user-field-get text:name="name2" >Name</text:user-field-get>
>      <text:user-field-get text:name="surname2" >Surname</text:user-field-get>
>     </text:p>
>
>
> 2. Notes:
> As there is one space inserted between [Name] and [Surname] in the
> document, second text:user-field-get tag comes after at least one
> whitespace. For oo.org-generated XML, the whitespace is just one space. For
> KWord it's \n plus full indentation (4 spaces in our case).
>
> I can see Sebastian already provided a patch eating a \n.
>
> I also tested how the oo.org's XML looks like when two spaces are inserted
> in between (i.e. [Name]  [Surname]) and <text:s/> appears to be used once,
> as the second space is a real space:
>
> <text:user-field-get text:name="name2" >Name
> </text:user-field-get> <text:s/><text:user-field-get text:name= [..]
>                        ^
>
>                        |notice this space + <text:s/>
>
> OK. David, could you tell me, is this a valid behaviour?
> For me, <text:s/> should be used. See 3. for explanation.
>
> 3. Now after considering the example with two non-breaking spaces,
>    I can see a PROBLEM... (please correct me):
>
> QDomDocument::toString() will indent and waste information about number of
> spaces used between tags (in general, but in our case between variables)
>
> It will just indent as follows:
>
>    <text:user-field-get text:name="name2" >Name</text:user-field-get>
>    <text:s/>
>    <text:user-field-get text:name= [..]
>
> So we have a whitespace before and after <text:s/>, while in the example
> 2., for oo.org's XML, we have just one space plus <text:s/>.
>
> There's probably a workaround for this and that's my main question now.

The relevant part of the oo.org spec is here:
http://books.evc-cit.info/odbook/ch03.html#whitespace-section

From what you say above, it appears that QDomDocument::toString() is not
following the oo.org formatting rules with respect to whitespace; hence the
problem. There are actually two issues here:

  1.  How to handle whitespace in mixed content, and
  2.  How to handle whitespace in element content.

In general, whitespace handling is not well-defined in mixed content when the
XML parser does not have a DTD or schema. For instance, is the space in the
following meaningful?

  <a> <b>sometext</b></a>

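To make the question concrete, here is a small sketch of what a parser without a DTD hands to the application for that snippet. It uses Python's standard xml.dom.minidom purely as a stand-in for QDom (an assumption made for illustration, not what KWord uses); the variable names are made up.

```python
from xml.dom import minidom

# No DTD or schema is supplied, so the parser cannot tell whether <a>
# allows mixed content; it reports the space as an ordinary text node.
doc = minidom.parseString("<a> <b>sometext</b></a>")
a = doc.documentElement

first = a.firstChild
is_text_node = first.nodeType == first.TEXT_NODE
whitespace = first.data
child_count = len(a.childNodes)

print(is_text_node, repr(whitespace), child_count)
```

Whether that text node means anything is then left to the application.
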
If a DTD is present and the <a> tag is not allowed to have mixed content, then
the parser knows the space between <a> and <b> is "not meaningful" and can
ignore it (although it is still required to inform the application about it).
But in the absence of a DTD or schema, the parser cannot know this. In this
case, according to _The XML Companion_, the whitespace is not defined and is
subject to "application interpretation".

With respect to multiple whitespace characters in element content, the oo.org
spec referenced above says:

  In XML, whitespace in element content is typically not preserved unless
  specially designated. OpenDocument collapses consecutive whitespace
  characters, which are defined as space (0x0020), tab (0x0009), carriage
  return (0x000D), and line feed (0x000A) to a single space. How, then, does
  OpenDocument represent a document where whitespace is significant?

  To handle extra spaces, OpenDocument uses the <text:s> element. This empty
  element has an optional attribute, text:c, which tells how many spaces
  occur in the document. If this attribute is absent, then the element
  inserts one space. Between words, the <text:s> element is used to describe
  spaces after the first one; thus, for a single space, you don't need this
  element. At the beginning of a line, you do need the <text:s>, since
  OpenDocument eliminates leading whitespace immediately after a starting
  tag.

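As a rough reader-side illustration of those rules (collapse whitespace runs, drop leading whitespace right after a start tag, expand <text:s>), here is a minimal sketch. It uses Python's xml.dom.minidom rather than QDom, and the helper name paragraph_text and the sample strings are made up for this example. It only implements what the quoted passage states; trailing whitespace is left alone because the passage does not cover it.

```python
import re
from xml.dom import minidom

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
WS_RUN = re.compile(r"[ \t\r\n]+")  # the four characters the spec lists

def paragraph_text(elem):
    """Hypothetical helper: text of `elem` under the quoted rules."""
    out = []
    after_start_tag = True  # nothing emitted yet since elem's start tag
    for node in elem.childNodes:
        if node.nodeType == node.TEXT_NODE:
            text = WS_RUN.sub(" ", node.data)   # collapse runs to one space
            if after_start_tag:
                text = text.lstrip(" ")         # drop leading whitespace
            out.append(text)
        elif node.nodeType == node.ELEMENT_NODE:
            if node.namespaceURI == TEXT_NS and node.localName == "s":
                count = node.getAttributeNS(TEXT_NS, "c") or "1"
                out.append(" " * int(count))    # <text:s text:c="n"/>
            else:
                out.append(paragraph_text(node))
        after_start_tag = False
    return "".join(out)

# Indented XML, as an indenting serializer would write it.
INDENTED = f"""<text:p xmlns:text="{TEXT_NS}">
    <text:user-field-get text:name="name2">Name</text:user-field-get>
    <text:user-field-get text:name="surname2">Surname</text:user-field-get>
</text:p>"""

# One real space plus <text:s/>, as in the oo.org output quoted above.
TWO_SPACES = (
    f'<text:p xmlns:text="{TEXT_NS}">'
    '<text:user-field-get text:name="name2">Name</text:user-field-get>'
    ' <text:s/>'
    '<text:user-field-get text:name="surname2">Surname</text:user-field-get>'
    '</text:p>'
)

rendered_indented = paragraph_text(minidom.parseString(INDENTED).documentElement)
rendered_two_spaces = paragraph_text(minidom.parseString(TWO_SPACES).documentElement)
print(repr(rendered_indented))
print(repr(rendered_two_spaces))
```

Under these rules the newline plus indentation between the two variables ends up as a single space, not a line break.
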
According to the Annotated XML spec (http://www.xml.com/axml/testaxml.htm),
QDomDocument::toString() is behaving correctly, in that it is passing all the
whitespace to the application. However, the oo.org spec says that sequences
of more than one whitespace character should be collapsed into a single
space. It also says that leading whitespace immediately after a starting
element tag must be eliminated. Note that even if you had a parser that
supported a setting for collapsing whitespace (many do), you'd still have the
problem of eliminating the leading spaces immediately after a starting tag.

Given these conflicting behaviors, two possible solutions come to mind:

  1.  Recursively walk the QDom tree and eliminate leading spaces in the
      content of each textual node. Then you can follow your calls to
      QDomDocument::toString() with simplifyWhiteSpace() to collapse
      consecutive spaces to a single space. Notice that just using
      simplifyWhiteSpace() alone won't eliminate the leading spaces
      immediately after an element tag if the element contains subelements.

  2.  Use a stylesheet to preprocess the document as suggested in the oo.org
      spec I linked to.
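
The tree walk in option 1 could look roughly like the sketch below. This is only an illustration in Python's xml.dom.minidom, not QDom code; the function name normalize_whitespace is made up, and it folds the collapsing step into the walk instead of running it over the serialized string. In Qt the same walk would presumably go over the QDomNode children.

```python
import re
from xml.dom import minidom

WS_RUN = re.compile(r"[ \t\r\n]+")

def normalize_whitespace(node):
    """Hypothetical helper: clean up every text node below `node`, in place.

    Runs of whitespace become one space, and whitespace that directly
    follows a start tag is dropped (the node is removed if nothing is left).
    """
    for child in list(node.childNodes):      # copy, since children may be removed
        if child.nodeType == child.TEXT_NODE:
            text = WS_RUN.sub(" ", child.data)
            if child.previousSibling is None:  # first thing after the start tag
                text = text.lstrip(" ")
            if text:
                child.data = text
            else:
                node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            normalize_whitespace(child)

# Placeholder element names; the shape matches the indented example above.
doc = minidom.parseString("<p>\n    <f>Name</f>\n    <f>Surname</f>\n</p>")
normalize_whitespace(doc.documentElement)
cleaned = doc.documentElement.toxml()        # serialize without indentation
print(cleaned)
```

Because the walk looks at each text node's position among its siblings, it catches the leading whitespace inside elements that contain subelements, which a plain string-level collapse cannot do.
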

-- 
Gary Cramblitt (aka PhantomsDad)
_______________________________________________
koffice-devel mailing list
koffice-devel@kde.org
https://mail.kde.org/mailman/listinfo/koffice-devel

