List:       koffice-devel
Subject:    Re: Minimising XML output in KWord import filters
From:       Nicolas Goutte <nicog () snafu ! de>
Date:       2002-02-16 20:10:36

On Saturday 16 February 2002 11:50, Clarence Dang wrote:
> Hi,
>
> I'm trying to speed up the mswrite filter by cutting down on the XML
> output. Unfortunately, I'm not quite sure what XML I can and can't leave
> out. So, I have a few questions (someone please answer them!):
>
> 1. From the DTD, "Some special characters ('<', '>', '&') are "escaped"
> ('&lt;', '&gt;', '&amp;')":
>
> a) The DTD, does not explicitly define what special characters are escaped.
> IMHO, this is a fault in the DTD and should be fixed (I can only smile if
> asked for a patch :) because it's been a long time since I've written a
> DTD).

XML itself predefines five entities: &amp; &gt; &lt; &quot; &apos;

>
> b) Related to a): Do I have to do &apos; and &quot;?
> What about &copy; and other HTML (not XML) entities?

&lt; and &amp; are mandatory.

&gt; is not mandatory, except in the sequence ]]> (which must be written ]]&gt; )

&quot; is necessary inside double quoted attributes.

&apos; is necessary inside single quoted attributes.

Other entities are not allowed (apart from character references such as 
&#32; or &#x20; ). One reason is that QDomDocument::setContent cannot process 
them.
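
The rules above could be sketched roughly as follows. This is only an illustration in Python, not code from any KOffice filter: the function names are made up, and Python's own XML parser stands in for QDomDocument::setContent just to show that an HTML entity like &copy; is rejected while a character reference is accepted.

```python
import xml.etree.ElementTree as ET


def escape_text(s):
    """Escape character data: & and < always, > only inside ]]>."""
    s = s.replace("&", "&amp;")  # must come first, or later escapes get re-escaped
    s = s.replace("<", "&lt;")
    return s.replace("]]>", "]]&gt;")


def escape_attr(s):
    """Escape a value that will be written between double quotes."""
    return s.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")


def char_refs(s):
    """Replace non-ASCII characters by numeric character references."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in s)


def is_well_formed(xml):
    """Return True if Python's parser (a stand-in here) accepts the string."""
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False
```

Note that char_refs has to be applied after escape_text (as in char_refs(escape_text(s))), otherwise the & of each character reference would be escaped again.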

>
> 2. Can I leave out "#IMPLIED" attributes?
>
> Can I assume that the defaults for "#IMPLIED" attributes won't change any
> time soon?

No, you cannot assume anything (unfortunately.)

This was part of the problem with the export filters last year: the writers of 
KWord's saving code assumed one thing, and the writers of the filters assumed 
another!

I do not see any guarantee that it will not happen again in the future.

>
> 3. Can I leave out "#REQUIRED" attributes?
>
> I have noticed that even KWord produces XML that leaves out so-called
> "#REQUIRED" attributes (the best example is the PAGEBREAKING tag).

Same comment as for #IMPLIED.
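
One possible way to avoid depending on defaults is to let the filter always write every attribute it knows about. The sketch below is in Python and purely illustrative: the helper name and the attribute names and values in the table are hypothetical, so the real ones would have to be taken from files that KWord itself saves.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical attribute names and defaults, for illustration only.
PAGEBREAKING_DEFAULTS = {
    "linesTogether": "false",
    "keepWithNext": "false",
}


def empty_element(tag, attrs, defaults):
    """Serialise an empty element, writing every known attribute explicitly."""
    unknown = set(attrs) - set(defaults)
    if unknown:
        raise ValueError(f"unknown attributes for <{tag}>: {sorted(unknown)}")
    merged = {**defaults, **attrs}  # explicit values override the table
    rendered = "".join(
        f" {name}={quoteattr(value)}" for name, value in sorted(merged.items())
    )
    return f"<{tag}{rendered} />"
```

The output is longer than strictly necessary, but it does not change meaning if the reader's defaults ever change.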

>
> 4. Can I leave out tags that the document doesn't use -- and depend on
> KWord defaults?
>
> E.g. suppose that the entire document had normal linespacing, can I leave
> out "<LINESPACING />" in "<LAYOUT>" and hope that KWord does normal
> linespacing (even if I don't define a style with normal linespacing)?

In <LAYOUT>, it is the style that provides the default. If your style does 
not define <LINESPACING>, you could run into bugs in future KWord versions (as 
happened in the past with files that never defined <SIZE>.)

By the way, <STYLE> is really an element where you should not leave out 
anything.

>

In general, I would be careful with KWord's DTD. It does not define 
everything and it is sometimes plain wrong. (For example: KWord writes 
<VARIABLE>'s children directly as children of <FORMAT> and does not use the 
<VARIABLE> element, contrary to what the DTD describes.)

If you are ready to take the time to find and fix KWord's bugs in the future, 
you can make your own format. However, the further you move away from what 
KWord saves today or has saved in the past, the more problems you will get.

> Thanks!
> Clarence

Have a nice day/evening/night!