'Re: Fwd: KWord File Format'


List:       koffice-devel
Subject:    Re: Fwd: KWord File Format
From:       kogs () kogsman ! demon ! co ! uk
Date:       2002-04-24 9:29:54

I have solved the problem, from my point of view, by giving my xml document object 
two text properties: Text, and FormattedText, which adds the TABS that produce a 
pretty view, e.g.

Text property
<paragraph><text>yadda yadda yadda</text></paragraph>

FormattedText property
<paragraph>
    <text>
        yadda yadda yadda
    </text>
</paragraph>

As I understand it, without an xml:space="preserve" attribute, a parser would 
produce an identical tree representation of both of the examples above, i.e. the 
DATA child of the text node would contain the string "yadda yadda yadda".  With 
xml:space="preserve", the second case would produce the string "\n        yadda 
yadda yadda\n    ".

Also, without an xml:space="preserve", "yadda yadda     yadda" should be the same 
as "yadda yadda yadda".  This would not be desirable in a word processor because it 
is conventional to follow a "." with two spaces before starting the next sentence.
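
The whitespace behaviour described above can be checked against a real parser. The sketch below uses Python's standard xml.etree.ElementTree purely for illustration (it is not anything KWord uses), and the helper names raw_text and collapse are made up for the example. As far as I can tell, this parser hands all character data to the application whether or not xml:space is present, so any trimming or collapsing has to be done by the application itself.

```python
import xml.etree.ElementTree as ET

# The two serializations from the message above.
COMPACT = "<paragraph><text>yadda yadda yadda</text></paragraph>"
PRETTY = (
    "<paragraph>\n"
    "    <text>\n"
    "        yadda yadda yadda\n"
    "    </text>\n"
    "</paragraph>"
)


def raw_text(xml_source):
    """Return the character data of the first <text> child, untouched."""
    return ET.fromstring(xml_source).find("text").text


def collapse(s):
    """Application-level normalization: trim the ends and collapse
    every run of whitespace into a single space."""
    return " ".join(s.split())


if __name__ == "__main__":
    print(repr(raw_text(COMPACT)))
    print(repr(raw_text(PRETTY)))
    print(repr(collapse(raw_text(PRETTY))))
```

Collapsing in the application makes the two forms compare equal, but it also turns a double space after a "." into a single one, which is exactly the problem for a word processor.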

Perhaps the solution would be to leave out the xml:space attribute and define entity 
references for the xml whitespace characters, " " (e.g. &kwsp;), TAB (e.g. &kwtab;), 
\n (e.g. &kwnl;) etc., and use these in the actual user document text.  Thus, the 
FormattedText example above would become:

FormattedText property
<paragraph>
    <text>
        yadda&kwsp;yadda&kwsp;yadda
    </text>
</paragraph>
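
A minimal sketch of how a writer and reader might apply this scheme, in Python. It works on the serialized character data as a plain string, before any entity expansion, and the function names encode_text and decode_text are hypothetical. The entity names are the ones proposed above; a real XML parser would also need them declared in a DTD, which this sketch does not attempt.

```python
# Entity names as proposed in the message; nothing here is an existing
# KWord or XML convention.
WHITESPACE_ENTITIES = {" ": "&kwsp;", "\t": "&kwtab;", "\n": "&kwnl;"}

# XML whitespace characters that are treated as formatting only.
FORMATTING_WHITESPACE = {ord(c): None for c in " \t\r\n"}


def encode_text(user_text):
    """Replace every significant whitespace character with its entity
    reference, so no literal whitespace carries meaning."""
    return "".join(WHITESPACE_ENTITIES.get(ch, ch) for ch in user_text)


def decode_text(raw):
    """Drop all literal (pretty-printing) whitespace, then turn the
    entity references back into the characters they stand for."""
    text = raw.translate(FORMATTING_WHITESPACE)
    for ch, entity in WHITESPACE_ENTITIES.items():
        text = text.replace(entity, ch)
    return text


if __name__ == "__main__":
    pretty = "\n        " + encode_text("End.  Next sentence.") + "\n    "
    print(pretty)
    print(repr(decode_text(pretty)))
```

Because every meaningful space is encoded, the reader can discard indentation freely and still recover the two spaces after a ".".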

This would increase the size of documents somewhat.  But hell, if one is using xml, 
compact documents are not a prime concern :-).

Stuart

> ---- Original Message ----
> From:		Nicolas Goutte
> Date:		Tue 4/23/02 21:55
> To:		koffice-devel@mail.kde.org
> Cc:		kogs@kogsman.demon.co.uk
> Subject:	Re: Fwd: KWord File Format
> 
> No, please!
> 
> That is one reason for xml:space="preserve".
> 
> If we want to do something, we should write the xml:space attribute.
> 
> CDATA sections are bad, as you cannot write the sequence ]]> 
> 
> It is already bad that you cannot do it in the document information, so please 
> not again in KWord.
> 
> Have a nice day/evening/night!
> 
> On Tuesday 23 April 2002 18:23, David Faure wrote:
> > This sounds like a valid point...
> > although it would certainly make the "grep '<text>'" idea more difficult
> > again. Does anyone know if this would break compatibility? I think reading
> > a text node is done the same way, with and without CDATA in the XML.
> >
> > ----------  Forwarded Message  ----------
> >
> > Subject: KWord File Format
> > Date: 23/04/2002 19:30
> > From: kogs@kogsman.demon.co.uk
> > To: dfaure@kde.org
> >
> > I was playing around with KWord maindoc.xml files with a view to making an
> > external mail merge processor using a Kylix XML parser of my own devising. 
> > My XML document object has a text property which returns the document as a
> > string containing the document with lots of TABS to make it pretty.
> >
> > When I loaded maindoc.xml into my parser and then wrote the text property
> > back out to maindoc.xml (and then tarred maindoc.xml and documentinfo.xml)
> > and opened it in KWord, the xml formatting TABS were displayed.
> >
> > May I suggest, therefore, that the contents of <TEXT> elements of
> > maindoc.xml be bracketed by <![CDATA[...]]> so that meaningful whitespace is
> > distinguished from xml formatting whitespace which should not show up in
> > KWord.  This would greatly assist external processing of KWord documents.
> >
> > Best regards
> >
> > Stuart
> 
_______________________________________________
koffice-devel mailing list
koffice-devel@mail.kde.org
http://mail.kde.org/mailman/listinfo/koffice-devel