List: koffice-devel
Subject: Re: Fwd: KWord File Format
From: kogs () kogsman ! demon ! co ! uk
Date: 2002-04-24 9:29:54
I have solved the problem from my point of view by having two text properties for my
XML document object: Text, and FormattedText, which includes the TABs to produce a
pretty view, e.g.
Text property
<paragraph><text>yadda yadda yadda</text></paragraph>
FormattedText property
<paragraph>
<text>
yadda yadda yadda
</text>
</paragraph>
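A minimal sketch of the two properties follows. It assumes Python's standard xml.etree.ElementTree in place of the Kylix parser, and the function names text_property and formatted_text_property are hypothetical. The pretty writer puts character data on its own indented line, as in the FormattedText example. That layout changes the text a reader gets back when it parses the output.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def text_property(elem: ET.Element) -> str:
    """Compact form (hypothetical name): serialize without adding whitespace."""
    return ET.tostring(elem, encoding="unicode")

def formatted_text_property(elem: ET.Element, depth: int = 0) -> str:
    """Pretty form (hypothetical name): one element per line, TAB-indented.

    Text content is written on its own line, so the TABs and newlines
    added here become part of that text if the output is parsed again.
    Attributes and tail text are ignored to keep the sketch short.
    """
    pad = "\t" * depth
    lines = [f"{pad}<{elem.tag}>"]
    if elem.text and elem.text.strip():  # skip whitespace-only text
        lines.append(f"{pad}\t{escape(elem.text)}")
    for child in elem:
        lines.append(formatted_text_property(child, depth + 1))
    lines.append(f"{pad}</{elem.tag}>")
    return "\n".join(lines)

doc = ET.fromstring("<paragraph><text>yadda yadda yadda</text></paragraph>")
print(text_property(doc))
print(formatted_text_property(doc))
```

When ElementTree parses the pretty output back, the text element holds "\n\t\tyadda yadda yadda\n\t" rather than the original string. This is the same kind of extra TABs that showed up in KWord in the original report.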
As I understand it, without an xml:space="preserve" attribute, a parser would
produce an identical tree representation of both of the examples above, i.e. the
DATA child of the text node would contain the string "yadda yadda yadda". With
xml:space="preserve", the second case would produce the string "\n yadda yadda yadda\n ".
Also, without an xml:space="preserve" attribute, "yadda  yadda  yadda" should be treated
the same as "yadda yadda yadda". This would not be desirable in a word processor, because
it is conventional to follow a "." with two spaces before starting the next sentence.
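You can check directly what a particular parser reports. The following minimal check uses Python's xml.etree.ElementTree as a stand-in. This is an assumption: KWord's reader and the Kylix parser may behave differently. XML 1.0 requires a processor to pass all character data through to the application. xml:space only signals the document's intent to that application. As a result, ElementTree returns the same text with or without the attribute, and it keeps runs of spaces.

```python
import xml.etree.ElementTree as ET

PRETTY = "<paragraph>\n<text>\nyadda yadda yadda\n</text>\n</paragraph>"
PRETTY_PRESERVE = ('<paragraph xml:space="preserve">\n<text>\n'
                   'yadda yadda yadda\n</text>\n</paragraph>')
DOUBLE_SPACED = "<paragraph><text>End.  Next</text></paragraph>"

def text_of(doc: str) -> str:
    """Return the character data of the first <text> element."""
    return ET.fromstring(doc).find("text").text

for doc in (PRETTY, PRETTY_PRESERVE, DOUBLE_SPACED):
    print(repr(text_of(doc)))
```

With this parser, at least, the application decides whether to collapse or strip whitespace, not the parser.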
Perhaps the solution would be to leave out the xml:space attribute and define entity
references for the XML whitespace characters, " " (e.g. &kwsp;), TAB (e.g. &kwtab;),
\n (&kwnl;), etc., and use these in the actual user document text. Thus, the
FormattedText example above would become:
FormattedText property
<paragraph>
<text>
yadda&kwsp;yadda&kwsp;yadda
</text>
</paragraph>
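A sketch of the writer side of this proposal follows, assuming Python's standard library. The entity names come from the proposal. The DTD declarations and the helper names encode_user_text and formatted_paragraph are hypothetical. An undeclared entity makes a document not well-formed, so the entities must be declared somewhere. Here they go in an internal DTD subset.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical declarations for the proposed entities (internal DTD subset).
DOCTYPE = (
    "<!DOCTYPE paragraph [\n"
    '  <!ENTITY kwsp " ">\n'
    '  <!ENTITY kwtab "&#9;">\n'
    '  <!ENTITY kwnl "&#10;">\n'
    "]>\n"
)

WHITESPACE_ENTITIES = {" ": "&kwsp;", "\t": "&kwtab;", "\n": "&kwnl;"}

def encode_user_text(text: str) -> str:
    """Escape &, < and >, then write user whitespace as entity references."""
    return "".join(WHITESPACE_ENTITIES.get(ch, ch) for ch in escape(text))

def formatted_paragraph(text: str) -> str:
    """FormattedText-style output: literal TABs/newlines are layout only."""
    return (DOCTYPE + "<paragraph>\n"
            "\t<text>\n"
            f"\t\t{encode_user_text(text)}\n"
            "\t</text>\n"
            "</paragraph>")

print(formatted_paragraph("yadda yadda yadda"))
```

ElementTree expands the references into ordinary characters. In the FormattedText form, it therefore reports them together with the literal formatting TABs and newlines, which is why the check below parses only the compact form.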
This would increase the size of documents somewhat. But hell, if one is using xml,
compact documents are not a prime concern :-).
Stuart
> ---- Original Message ----
> From: Nicolas Goutte
> Date: Tue 4/23/02 21:55
> To: koffice-devel@mail.kde.org
> Cc: kogs@kogsman.demon.co.uk
> Subject: Re: Fwd: KWord File Format
>
> No, please!
>
> That is one reason for the xml:space="preserve" attribute.
>
> If we want to do something, we should write the xml:space attribute.
>
> CDATA sections are bad, as you cannot write the sequence ]]> inside them.
>
> It is already bad that you cannot do it in the document information, so please,
> not again in KWord.
>
> Have a nice day/evening/night!
>
> On Tuesday 23 April 2002 18:23, David Faure wrote:
> > This sounds like a valid point...
> > although it would certainly make the "grep '<text>'" idea more difficult
> > again. Does anyone know if this would break compatibility? I think reading
> > a text node is done the same way, with and without CDATA in the XML.
> >
> > ---------- Forwarded Message ----------
> >
> > Subject: KWord File Format
> > Date: 23/04/2002 19:30
> > From: kogs@kogsman.demon.co.uk
> > To: dfaure@kde.org
> >
> > I was playing around with KWord maindoc.xml files with a view to making an
> > external mail merge processor using a Kylix XML parser of my own devising.
> > My XML document object has a text property which returns the document as a
> > string with lots of TABs to make it pretty.
> >
> > When I loaded maindoc.xml into my parser and then wrote the text property
> > back out to maindoc.xml (and then tarred maindoc.xml and documentinfo.xml)
> > and opened it in KWord, the XML formatting TABs were displayed.
> >
> > May I suggest, therefore, that the contents of <TEXT> elements of
> > maindoc.xml be bracketed by <![CDATA[...]]> so that meaningful whitespace is
> > distinguished from XML formatting whitespace, which should not show up in
> > KWord. This would greatly assist external processing of KWord documents.
> >
> > Best regards
> >
> > Stuart
>
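You can check the CDATA points raised in the thread with a short sketch, again assuming Python's xml.etree.ElementTree as the parser. It shows that a CDATA section and ordinary escaped text produce the same text. It also shows a commonly used workaround for the ]]> restriction: end the section after "]]" and open a new one before ">". The helper name wrap_cdata is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

def text_of(doc: str) -> str:
    """Return the character data of the root element."""
    return ET.fromstring(doc).text

user_text = "if (a[b[0]]>1)  two  spaces"
escaped_doc = "<text>" + escape(user_text) + "</text>"
cdata_doc = "<text>" + wrap_cdata(user_text) + "</text>"
print(escaped_doc)
print(cdata_doc)
```

With ElementTree, both documents produce identical text, including the double spaces. The sketch does not test whether KWord's own reader treats them the same way.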
_______________________________________________
koffice-devel mailing list
koffice-devel@mail.kde.org
http://mail.kde.org/mailman/listinfo/koffice-devel