List: koffice-devel
Subject: Re: Problems with User-defined variables in KWord
From: Gary Cramblitt <garycramblitt () comcast ! net>
Date: 2006-03-17 3:29:47
Message-ID: 200603162229.47576.garycramblitt () comcast ! net
On Thursday 16 March 2006 18:40, Jaroslaw Staniek wrote:
> Sebastian Sauer said the following, On 2006-03-16 23:55:
> > Seems to be a more generic problem. E.g. at
> >
> > <text:p text:style-name="P2" >
> > Line1
> > Line2
> > </text:p>
> >
> > KWord introduces a linebreak between Line1 and Line2 while oo.org adds a
> > whitespace between them.
>
> Bah! Thanks for this note.
>
> FYI: I am using QDomDocument::toString() to produce content.xml so unlike
> oo.org's XML, my XML is indented and contains \n. But KWord does the same
> as me...
>
> 1. OTOH In this example, two variables
>
> [Name] [Surname]
>
> using text:user-field-get tag _should_ be rendered as a single
> line/paragraph but KWord magically creates two lines ...
>
> [Name]
> [Surname]
>
> ...for this XML:
> <text:p text:style-name="P2" >
> <text:user-field-get text:name="name2" >Name</text:user-field-get>
> <text:user-field-get text:name="surname2" >Surname</text:user-field-get>
> </text:p>
>
>
> 2. Notes:
> As there is one space inserted between [Name] and [Surname] in the
> document, second text:user-field-get tag comes after at least one
> whitespace. For oo.org-generated XML, the whitespace is just one space. For
> KWord it's \n plus full indentation (4 spaces in our case).
>
> I can see Sebastian already provided a patch eating a \n.
>
> I also tested how the oo.org's XML looks like when two spaces are inserted
> in between (i.e. [Name]  [Surname]) and <text:s/> appears to be used once,
> as the second space is a real space:
>
> <text:user-field-get text:name="name2" >Name
> </text:user-field-get> <text:s/><text:user-field-get text:name= [..]
>                       ^
>                       |notice this space + <text:s/>
>
> OK. David, could you tell me, is this a valid behaviour?
> For me, <text:s/> should be used. See 3. for explanation.
>
> 3. Now after considering the example with two non-breaking spaces,
> I can see a PROBLEM... (please correct me):
>
> QDomDocument::toString() will indent and waste information about number of
> spaces used between tags (in general, but in our case between variables)
>
> It will just indent as follows:
>
> <text:user-field-get text:name="name2" >Name</text:user-field-get>
> <text:s/>
> <text:user-field-get text:name= [..]
>
> So we have a whitespace before and after <text:s/>, while in the example
> 2., for oo.org's XML, we have just one space plus <text:s/>.
>
> There's probably a workaround for this and that's my main question now.

The relevant part of the oo.org spec is here:
http://books.evc-cit.info/odbook/ch03.html#whitespace-section

From what you say above, it appears that QDomDocument::toString() is not
following the oo.org formatting rules with respect to whitespace; hence the
problem.

There are actually two issues here:

1. How to handle whitespace in mixed content, and
2. How to handle whitespace in element content.

In general, whitespace handling is not well-defined in mixed content when the
XML parser does not have a DTD or schema. For instance, is the space in the
following meaningful?

<a> <b>sometext</b></a>

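
A quick way to see what a parser without a DTD does with that space is to
parse the fragment and look at the first child of <a>. The sketch below uses
Python's xml.dom.minidom purely as a stand-in for QDom (that substitution is
my assumption, nothing in this thread uses it); the point is only that the
space arrives as a text node and the application has to decide what it means.

```python
from xml.dom import minidom, Node

# Parse the fragment with no DTD or schema available.
doc = minidom.parseString("<a> <b>sometext</b></a>")
a = doc.documentElement

# The parser cannot tell whether the space is meaningful, so it reports it
# to the application as an ordinary text node in front of <b>.
first = a.firstChild
print(first.nodeType == Node.TEXT_NODE, repr(first.data))
```
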
If a DTD is present and the <a> tag is not allowed to have mixed content, then
the parser knows the space between <a> and <b> is "not meaningful" and can
ignore it (although it is still required to inform the application about it).
But in the absence of a DTD or schema, the parser cannot know this. In this
case, according to _The XML Companion_, the whitespace is not defined and
subject to "application interpretation".

With respect to multiple whitespace in element content, the oo.org spec
referenced above says:

  In XML, whitespace in element content is typically not preserved unless
  specially designated. OpenDocument collapses consecutive whitespace
  characters, which are defined as space (0x0020), tab (0x0009), carriage
  return (0x000D), and line feed (0x000A) to a single space. How, then, does
  OpenDocument represent a document where whitespace is significant?

  To handle extra spaces, OpenDocument uses the <text:s> element. This empty
  element has an optional attribute, text:c, which tells how many spaces
  occur in the document. If this attribute is absent, then the element
  inserts one space. Between words, the <text:s> element is used to describe
  spaces after the first one; thus, for a single space, you don't need this
  element. At the beginning of a line, you do need the <text:s>, since
  OpenDocument eliminates leading whitespace immediately after a starting
  tag.

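
The rules quoted above can be sketched from the writer's side. This is a
minimal illustration in Python, not KWord or oo.org code: collapse_whitespace()
and encode_spaces() are hypothetical helper names, and the treatment of leading
spaces follows my reading of the quoted text. It does no XML escaping and only
encodes the space character, not tabs or line breaks.

```python
import re

# The four characters the quoted text treats as collapsible whitespace.
ODF_WHITESPACE = re.compile(r"[ \t\r\n]+")


def collapse_whitespace(text: str) -> str:
    """Reader side: reduce every whitespace run to a single space."""
    return ODF_WHITESPACE.sub(" ", text)


def _s_element(count: int) -> str:
    """Build a <text:s> element standing for `count` spaces."""
    return "<text:s/>" if count == 1 else f'<text:s text:c="{count}"/>'


def encode_spaces(text: str) -> str:
    """Writer side: keep the first space of a run literal, put the rest in <text:s>."""

    def repl(match: re.Match) -> str:
        run = len(match.group(0))
        if match.start() == 0:
            # Leading whitespace would be dropped, so encode all of it.
            return _s_element(run)
        if run == 1:
            return " "
        return " " + _s_element(run - 1)

    return re.sub(r" +", repl, text)


print(encode_spaces("Name  Surname"))
print(collapse_whitespace("Name\n    Surname"))
```

With two spaces between the words this produces the same "one real space plus
<text:s/>" pattern Jaroslaw observed in the oo.org output.
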
According to the Annotated XML here

http://www.xml.com/axml/testaxml.htm

QDomDocument::toString() is behaving correctly, in that it is passing all the
whitespace to the application. However, the oo.org spec is saying that
sequences of more than one whitespace character should be collapsed into a
single space. It also says that leading whitespace immediately after a
starting element tag must be eliminated. Note that even if you had a parser
that supported a setting for collapsing whitespace (many do), you'd still have
the problem of eliminating the leading spaces immediately after a starting
tag.

Given these conflicting behaviors, two possible solutions come to mind:

1. Recursively walk the QDom tree and eliminate leading spaces in the content
   of each textual node. Then you can follow your calls to
   QDomDocument::toString() with QString::simplifyWhiteSpace() to collapse
   consecutive spaces to a single space. Notice that just using
   simplifyWhiteSpace() alone won't eliminate the leading spaces immediately
   after an element tag if the element contains subelements.

2. Use a stylesheet to preprocess the document as suggested in the oo.org
   spec I linked to.

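
Solution 1 might look roughly like the following. This is a Python sketch with
xml.dom.minidom standing in for QDom, and normalize_odf_whitespace() is a
hypothetical name. It deviates from the two-step description above in that it
collapses whitespace runs inside each text node during the walk, rather than
running simplifyWhiteSpace() over the serialized string. It also ignores
trailing whitespace before an end tag and makes no attempt to limit itself to
paragraph content, both of which real code would have to consider.

```python
import re
from xml.dom import minidom, Node

WS_RUN = re.compile(r"[ \t\r\n]+")

# Indented input of the kind an indenting serializer produces.
SAMPLE = """<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
    <text:user-field-get text:name="name2">Name</text:user-field-get>
    <text:user-field-get text:name="surname2">Surname</text:user-field-get>
</text:p>"""


def normalize_odf_whitespace(node):
    """Collapse whitespace runs and drop leading whitespace after start tags."""
    for child in list(node.childNodes):  # copy: we may remove children
        if child.nodeType == Node.TEXT_NODE:
            text = WS_RUN.sub(" ", child.data)
            if child.previousSibling is None:
                # First child of its parent: directly after the start tag.
                text = text.lstrip(" ")
            if text:
                child.data = text
            else:
                node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            normalize_odf_whitespace(child)


doc = minidom.parseString(SAMPLE)
p = doc.documentElement
normalize_odf_whitespace(p)
print(p.toxml())
```

After the walk, the newline and indentation between the two variables is
reduced to the single space that was in the document, and nothing is left
between the <text:p> start tag and the first variable.
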

-- 
Gary Cramblitt (aka PhantomsDad)
_______________________________________________
koffice-devel mailing list
koffice-devel@kde.org
https://mail.kde.org/mailman/listinfo/koffice-devel