'RE: WinWord and *.doc'

List:       koffice-devel
Subject:    RE: WinWord and *.doc
From:       Nicolas Goutte <nicog () snafu ! de>
Date:       2001-02-27 13:37:32

Okay, I think we have similar ideas about the subject, but we look at it differently. Since you are the one who will do the programming, the choice is yours.

-----Original Message-----
From:	David Faure [SMTP:david@mandrakesoft.com]
Sent:	Monday, February 26, 2001 3:33 PM
To:	koffice-devel@max.tat.physik.uni-tuebingen.de
Subject:	Re: WinWord and *.doc

On Monday 26 February 2001 14:06, Nicolas Goutte wrote:
> Your idea with "UNTAR" is good, but be careful. If "UNTAR" has to
> (ungzip and) untar each time it appears, then you also have a big
> performance problem.

No, that's the point of using KFilterBase. It "ungzips" on demand, so
generally, checking for the mimetype of a gzipped file would be very fast.
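
To illustrate the on-demand point, here is a minimal sketch in Python. It is not KFilterBase (which is C++); it only assumes that Python's standard gzip module behaves comparably, decompressing just enough of the stream to return the bytes asked for. The helper name peek_gzip_header is made up for this example.

```python
import gzip
import io


def peek_gzip_header(raw: bytes, n: int = 64) -> bytes:
    """Return the first n decompressed bytes of a gzip stream.

    Hypothetical helper: only the start of the stream is read,
    no matter how large the compressed payload is.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(raw)) as gz:
        return gz.read(n)


# A large payload whose first line is the XML declaration KOffice writes.
XML_DECL = b'<?xml version="1.0" encoding="UTF-8"?>'
payload = XML_DECL + b"\n" + b"x" * 1_000_000
compressed = gzip.compress(payload)

# Ask only for as many bytes as the declaration is long.
head = peek_gzip_header(compressed, len(XML_DECL))
```

Only the requested prefix is handed back to the caller, which is all a mimetype check needs.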

BUT there is a problem with tar files. A given file can be
at any place inside the tar... and KTar parses the whole file to know what's
inside. We could make it stop when it finds the file we want (maindoc.xml),
but that doesn't solve the problem that in theory maindoc.xml could
be at the end of the file (doesn't happen when it's KoStore that packs up
the file though).
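
A rough sketch of the "stop when we find the file we want" idea, using Python's tarfile module in streaming mode rather than KTar. The function names find_member and make_archive are invented for this illustration. The count it returns shows the cost: one header read when maindoc.xml comes first, a scan of the whole archive when it comes last.

```python
import io
import tarfile


def find_member(archive: bytes, wanted: str, head_bytes: int = 100):
    """Scan a (possibly gzipped) tar sequentially and stop at `wanted`.

    Returns (members_scanned, first head_bytes of the member), or
    (members_scanned, None) if the member is not in the archive.
    """
    scanned = 0
    # "r|*" reads the archive as a forward-only stream, so nothing
    # after the wanted member is ever parsed.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r|*") as tar:
        for member in tar:
            scanned += 1
            if member.name == wanted:
                return scanned, tar.extractfile(member).read(head_bytes)
    return scanned, None


def make_archive(entries):
    """Build an in-memory .tar.gz from (name, data) pairs, in order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in entries:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


doc = b'<?xml version="1.0" encoding="UTF-8"?>\n<DOC editor="KWord">'
extra = [("pic1.png", b"\x00" * 2048), ("pic2.png", b"\x00" * 2048)]

first = make_archive([("maindoc.xml", doc)] + extra)  # the favourable order
last = make_archive(extra + [("maindoc.xml", doc)])   # worst case
missing = make_archive(extra)
```

With the favourable ordering the scan ends after the first member; the worst case still has to walk every header before it.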

> And if you have something like an UNTAR section, 
> then you are not far away from my idea. (Having the modules inside 
> KMimeMagic instead of outside!)

Yes, that's definitely what we need, but note that the point would be
* to have "native" gzip support in KMimeMagic - we need that
* to have a module that handles UNTAR.

> And if you do an "UNTAR", do you think we could also have an "XML"? And
> perhaps "MS-OLE" (for MS Office) would be needed too (this one will
> surely be tricky). Maybe other such "modifiers" are needed.

Good idea... Maybe it would help with the following.
We would not want to parse the whole XML though, so the Qt XML classes
don't apply.

> As for your example (corrected),
> UNTAR maindoc.xml 40 <DOC\ editor=\"KWord\" 
> mime=\"application/x-kword\"
> I personally see many problems:
> - you are not checking the <!DOCTYPE
Because there is none in the KWord file I just opened.
<?xml version="1.0" encoding="UTF-8"?>
<DOC editor="KWord" mime="application/x-kword" syntaxVersion="1">

> - you force <DOC to be at the 40th position (incompatible with XML)
Right, but we know that KOffice always starts files with
<?xml version="1.0" encoding="UTF-8"?>
This can of course be improved, but I'm looking for a solution that works fast,
not for 4 months of development :)
Maybe something like "look for this in the first 100 bytes" would help though.
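
To make the difference concrete, here is a small Python sketch comparing a fixed-offset check with a "somewhere in the first 100 bytes" check. Both function names are hypothetical, and the sample header is the KWord one quoted above; nothing here is taken from the real KMimeMagic code.

```python
def fixed_offset_match(data: bytes, needle: bytes, offset: int) -> bool:
    """True only if needle starts exactly at the given byte offset."""
    return data[offset:offset + len(needle)] == needle


def sniff_prefix(data: bytes, needle: bytes, window: int = 100) -> bool:
    """True if needle lies entirely within the first `window` bytes."""
    return needle in data[:window]


XML_DECL = b'<?xml version="1.0" encoding="UTF-8"?>\n'
ROOT = b'<DOC editor="KWord" mime="application/x-kword" syntaxVersion="1">'
NEEDLE = b'<DOC editor="KWord"'

# What KWord writes: the root element follows the declaration directly.
doc = XML_DECL + ROOT
# Still valid XML, but one extra newline shifts the root element.
shifted = XML_DECL + b"\n" + ROOT
# A long comment pushes the root element out of the window entirely.
far = XML_DECL + b"<!-- " + b"x" * 200 + b" -->\n" + ROOT

OFFSET = len(XML_DECL)  # byte offset of "<DOC" in the unshifted file
```

The windowed check survives small layout changes that break the fixed offset, but it still misses a root element that starts beyond the window, so it is a pragmatic improvement rather than a full answer to the XML objection.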

> - the attribute "editor" is defined in the DTD as implied, so you 
> cannot check for it!
But we know that KWord writes it. I'm trying to be pragmatic here...

> And KWord is not that complicated, as you (and the other KOffice 
> programmers) can control the file format. But for example think about 
> DocBook (in its original SGML format or in its simplified XML format.) 
> Here you do not control the file formats and there are many 
> applications that can write them (even with private extensions). 

Yes, it's always easy to see problems. Let's find solutions, rather :)
Whoever wants to add support for any XML or SGML file will add
the necessary bits to kmimemagic - another reason for making this
modular _and_ based on a description in a file if possible (much easier
to extend, at a much lower cost than new code each time).
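
As a sketch of what "modular and based on a description in a file" could look like, here is a toy version in Python. Everything in it is an assumption made for illustration: the rule syntax (mimetype, modifier, window, pattern), the modifier names RAW and GUNZIP, and all function names. It is not the magic file format and not KMimeMagic's API.

```python
import gzip
import io
import zlib
from typing import Callable, Dict, List, NamedTuple, Optional


class Rule(NamedTuple):
    mimetype: str
    modifier: str
    window: int
    pattern: bytes


# Registry of modifiers: each turns raw file bytes into the prefix to test.
MODIFIERS: Dict[str, Callable[[bytes, int], bytes]] = {}


def modifier(name: str):
    """Decorator that registers a modifier under a rule-file keyword."""
    def register(func):
        MODIFIERS[name] = func
        return func
    return register


@modifier("RAW")
def raw_prefix(data: bytes, window: int) -> bytes:
    return data[:window]


@modifier("GUNZIP")
def gunzip_prefix(data: bytes, window: int) -> bytes:
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as gz:
            return gz.read(window)
    except (OSError, EOFError, zlib.error):
        return b""  # not gzip data, so this rule simply does not match


def parse_rules(text: str) -> List[Rule]:
    """Parse lines of the hypothetical form: mimetype modifier window pattern."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        mimetype, mod, window, pattern = line.split(None, 3)
        rules.append(Rule(mimetype, mod, int(window), pattern.encode("utf-8")))
    return rules


def detect(data: bytes, rules: List[Rule]) -> Optional[str]:
    """Return the mimetype of the first matching rule, or None."""
    for rule in rules:
        if rule.pattern in MODIFIERS[rule.modifier](data, rule.window):
            return rule.mimetype
    return None


RULES = parse_rules('''
# mimetype            modifier window pattern
application/x-kword   RAW      100    <DOC editor="KWord"
application/x-kword   GUNZIP   100    <DOC editor="KWord"
''')

plain = b'<?xml version="1.0" encoding="UTF-8"?>\n<DOC editor="KWord" mime="application/x-kword">'
```

Supporting a new format then means adding a line to the description, and new code is only needed when a new modifier (an UNTAR, say) is introduced.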

> The problem is that our future KMimeMagic must still somehow be able to do 
> something (not necessarily related to KOffice).
See why I don't want this to be koffice specific :)

> That is why I thought that it would be good if we could have something 
> like filters to determine the exact mime type. Obviously, for 
> performance reason, it must be somehow done in KMimeMagic itself. (But 
> perhaps with some modularity!)

Agreed.

> NOTE:
> To be clear: I am neither against using KMimeMagic nor against
> extending it. I would just like to avoid having to do something as
> important as breaking the magic file format again in (let's say) one
> year because problems have appeared in the meantime. I am trying here
> to show some of the problems that you will surely run into.

Hmm, my initial idea was a separate file, in order to not "break the magic file format".
But in fact this is already NOT the same format as the one for the "file" command,
since we have mimetype names instead of vague descriptions. So "breaking"
the format doesn't break much, it's only about extending it :)

> I am also not against extending KMimeMagic step by step. We do not 
> need to have a perfect KMimeMagic at first try.
Definitely.

-- 
David FAURE, david@mandrakesoft.com, faure@kde.org
http://perso.mandrakesoft.com/~david/, http://www.konqueror.org/
KDE, Making The Future of Computing Available Today
_______________________________________________
Koffice-devel mailing list
Koffice-devel@master.kde.org
http://master.kde.org/mailman/listinfo/koffice-devel
