List:       koffice
Subject:    Re: KDE does not recognice KWord docs
From:       Nicolas Goutte <nicolasg () snafu ! de>
Date:       2003-09-29 14:19:11

On Monday 29 September 2003 12:36, Holger Schroeder wrote:
> Hi all,
>
> On Saturday 27 September 2003 23:48, you wrote:
> > On Saturday 27 September 2003 20:25, Nicolas Goutte wrote:
> > > It is part of KDE. KZip 3.1 will simply generate "fat"-based data, KZip
> > > 3.2 will give "unx" ones.
> >
> > Thanks guys for this investigation. I wasn't aware of that change.
> > (Holger: it seems that this change broke the magic-recognition of KOffice
> > files, where the uncompressed file called "mimetype" would have its
> > contents at position 38 in the ZIP file).
>
> the structure of a zip file local header in a zip archive is:
>
> local file header signature 4 bytes (0x04034b50)
(...)
>
> (from http://www.pkware.com/products/enterprise/white_papers/appnote.html)
>
> as you can see from the patch contained in thomas zander's first mail in
> this thread, the mime type recognition is done by detecting the string
> application/x-kword in the file at a fixed offset.
>
> when you look at "old" kzip files as i coded them, there was no use for an
> extra field. so in the file there is the string
> mimetypeapplication/x-kword. so the filename and the beginning of the
> content are "concatenated" when there is no extra field. in this case (we
> know how long the filename is, and we know that there is no extra field)
> the beginning of the content/mimetype string is fixed.

Yes, that is the plain old fat format without extension. That is what we
based KOffice on.
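
To make the arithmetic concrete, here is a minimal Python sketch (Python is
only my choice for illustration; KZip itself is C++). It writes a stored
"mimetype" entry with the standard zipfile module and then reads the 30-byte
local file header to compute where the content starts. The helper names are
made up for this example.

```python
import io
import struct
import zipfile

# Local file header: signature, version needed, flags, method, time, date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")  # 30 bytes in total


def make_archive(extra: bytes = b"") -> bytes:
    """Write an archive whose first entry is an uncompressed 'mimetype'."""
    buf = io.BytesIO()
    info = zipfile.ZipInfo("mimetype")
    info.compress_type = zipfile.ZIP_STORED
    info.extra = extra  # empty by default, i.e. no extra field
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(info, b"application/x-kword")
    return buf.getvalue()


def first_entry_data_offset(archive: bytes) -> int:
    """Return the offset at which the first entry's content begins."""
    fields = LOCAL_HEADER.unpack_from(archive, 0)
    signature, name_len, extra_len = fields[0], fields[9], fields[10]
    if signature != b"PK\x03\x04":  # 0x04034b50 stored little-endian
        raise ValueError("not a ZIP local file header")
    return LOCAL_HEADER.size + name_len + extra_len


plain = make_archive()
offset = first_entry_data_offset(plain)
```

With an 8-byte name and no extra field the content starts at 30 + 8 = 38, so
the bytes from position 30 on read "mimetypeapplication/x-kword".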

>
> the extra field as it is introduced now gives us the following advantages
> over the old way:
>
(...)
>
> so in the "not-koffice" case we should by default write the new fileformat,
> as these values are kind of useful there.
>
> so how to fix this for koffice ?
>
> i see two possibilities:
>
> 1.) add an option to allow writing of zip files without this extra info and
> use it in koffice, as the permissions and Xtimes are not needed in the
> files.

That is the one we want.

>
> 2.) as we only want to have "application/x-kword" at a fixed offset in the
> zip-archive, it would also be possible to not create a first file with the
> filename mimetype and the _content_ application/x-kword, but a first file
> with the _name_ mimetypeapplication/x-kword and any content after that.
> this would have the advantage, that our mimetype is _always_ at this
> offset, no matter which different extra fields with which lengths will be
> ever introduced, as the file name is saved in the zip file before the extra
> fields. as long as nobody creates a "zip format version 2", which will then
> be a whole new format, we would have solved this issue.

No! A file name containing application/ means an extra directory. Also
vnd.kde.kword has two dots and is therefore an invalid FAT name. However the
idea of the common packaging format was to be portable across file systems.
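
A quick way to see the directory problem, as a Python sketch using the
standard zipfile module (the helper name is hypothetical, and the entry is
empty because only its name matters here):

```python
import io
import os
import tempfile
import zipfile


def extracted_paths(entry_name: str) -> list:
    """Store one empty entry, extract it, and list what appears on disk."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(entry_name, b"")
    buf.seek(0)
    found = []
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(buf) as zf:
            zf.extractall(tmp)
        for root, dirs, files in os.walk(tmp):
            for d in dirs:
                rel = os.path.relpath(os.path.join(root, d), tmp)
                found.append(rel.replace(os.sep, "/") + "/")  # mark directories
            for f in files:
                rel = os.path.relpath(os.path.join(root, f), tmp)
                found.append(rel.replace(os.sep, "/"))
    return sorted(found)


paths = extracted_paths("mimetypeapplication/x-kword")
```

The slash in the entry name is taken as a path separator, so extraction
creates a directory "mimetypeapplication" containing a file "x-kword".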

As for ZIP "version 2", it exists. It is named ZIP64. (However it seems to fit 
more or less in the old format.)

>
> the only thing that should be checked is, how openoffice would handle these
> files. ok, i looked at an example file from openwriter. they have no first
> mimetype file in their format, they directly start with the file
> content.xml.

OO is still not using the common packaging format. (I have not checked in OO 
1.1 RC.)

>
> so i guess they neither care about a file named "mimetype" nor about a file
> called "mimetypeapplication/x-kword".

We do care, as we have agreed with OO's people about the new format.

>
> so somebody could change the code in koffice, that writes the mimetype to
> this and it should work. that would have the advantage, that we don't have
> to introduce a new function in kzip to not write the extra stuff, which
> would be a little bit ugly, if i understood it right with all these
> virtual_hooks. and we wouldn't have to always check that nobody breaks kzip
> in the future.

No, we cannot. This is the last version of KOffice's own file format (as we
switch to OO's format in KOffice 1.4). It would be ridiculous to make such a
change just for the last format.

>
> while we are at it, i would not only call the first file
> mimetypeapplication/x-kword, but i would suffix it with the version it was
> created with, perhaps we can use it for something in the future, and these
> few bytes do not hurt anybody. so it would be called for example
> application/x-kword-1.3.0 or so.

No, that is not what was agreed on. That is for the file format reader to
work out. We currently have the syntaxversion attribute for that. And OO's
formats allow extensions anyway.

>
> > > No, sure. I cannot remember if KOffice 1.2.x had its own KZip (named
> > > KoZip) or not. So I do not know if the change has to be done in KDE
> > > 3.1.x or in KOffice 1.2.x.
> >
> > CVS says that KoZip was part of KOffice-1.2.x indeed. But:
> > > But in any case, I am really starting to ask myself whether, for the
> > > last KOffice-own file format, it is useful to have a subtle change
> > > again. However this would mean forcing KZip 3.2 to be able to write in
> > > "fat" mode, either on request or simply for uncompressed files.
> >
> > Yes, I think we shouldn't do something that changes our 'magic'
> > recognition:
>
> * other projects/tools/etc. might use the magic we had

Yes, sure, that is why I wrote that we are stuck with it.

>
> > previously, this change will break it
>
> by using solution 2 it would be unbroken again
>
> * are we sure that the new offset is
>
> > always going to be 55? What's between position 30 and position 55? This
> > looks more fragile to me.
>
> the extra field can be of a variable length, and the file content starts
> directly after the filename and the extra field. so there is no general
> solution, when we want the detection string to be in the content and on the
> other hand allow this extra field. in the code of kzip.cpp there is already
> a possibility to parseInfoZipUnixNew and for sure at some point another
> length for the extra field will be introduced...

Yes, that is why we need a plain old fat entry without any extension. That is
what we agreed on with OO's people. (We had not thought that it would be so
hard. Sigh!)
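
As a rough illustration of why a fixed-offset magic check cannot survive an
extra field, here is a Python sketch that builds a bare local file header by
hand. The function names are hypothetical, and the 17-byte extra field is
only a filler chosen so that the content lands at offset 55 as mentioned in
this thread; it does not reproduce the fields KZip actually writes.

```python
import struct
import zlib

MAGIC = b"application/x-kword"


def stored_local_entry(name: bytes, data: bytes, extra: bytes = b"") -> bytes:
    """Build one uncompressed local file entry: header, name, extra, data."""
    header = struct.pack(
        "<4sHHHHHIIIHH",
        b"PK\x03\x04",     # local file header signature
        10,                # version needed to extract (1.0)
        0,                 # general purpose flags
        0,                 # compression method 0 = stored
        0,                 # DOS time 00:00:00
        0x21,              # DOS date 1980-01-01
        zlib.crc32(data),  # CRC-32 of the content
        len(data),         # compressed size (same as raw, since stored)
        len(data),         # uncompressed size
        len(name),
        len(extra),
    )
    return header + name + extra + data


def has_magic_at(buf: bytes, offset: int, magic: bytes = MAGIC) -> bool:
    """Fixed-offset check, in the style of a magic-number rule."""
    return buf[offset:offset + len(magic)] == magic


# Placeholder extra field: 2-byte ID, 2-byte length, 13 zero bytes = 17 bytes.
filler_extra = struct.pack("<HH", 0x7A7A, 13) + bytes(13)

plain = stored_local_entry(b"mimetype", MAGIC)
with_extra = stored_local_entry(b"mimetype", MAGIC, filler_extra)

found_plain_38 = has_magic_at(plain, 38)
found_extra_38 = has_magic_at(with_extra, 38)
found_extra_55 = has_magic_at(with_extra, 55)
```

The check at 38 only matches the entry without an extra field. Any other
extra-field length moves the content again, so no single offset can be
relied on once extra fields are allowed.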

>
> > * OpenOffice.org uses the "fat" format in ZIP files, and the whole point
> > of switching to ZIP was to use the same thing as they do, and
> > particularly having the same kind of magic mimetype recognition.
>
> i have no idea how they are doing mimetype detection. iirc their "weak"
> detection is solely based on the filename extension, and their "strong"
> detection they use when loading a file parses their manifest.xml, a kind of
> "table of contents" for their archive. but i may be wrong here...

The problem is not what OO does *now*. (It does not really need the manifest 
either.) The problem is what we agreed with OO's people.

>
> > So we need to fix KZip to give us "fat" format again. Holger: do you know
> > if that's easily doable? Is there any benefit in the "unx" stuff? Should
> > we just move "back" to fat, or should we add a method to choose the format?
> >
> > (In case you missed the rest of the thread: zipinfo <file> shows "fat" or
> > "unx"; "fat" on OOo and kdelibs-3.1-generated files, and "unx" in
> > kdelibs-cvs-generated files)
>
> unfortunately i am quite busy with university, so i can't hack on that
> right now, but i will follow this discussion, so feel free to ask if i
> did not explain something well enough.

That probably means that I have to look at it. Well, I do not mind, if 
somebody helps me to check it on KDE 3.2. (I have only KDE 3.1.4.)
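
For anybody who wants to check a file without zipinfo: the "fat"/"unx" label
comes from the host-system byte of the "version made by" field in the central
directory (0 is MS-DOS/FAT, 3 is Unix). A small Python sketch follows, using
the create_system attribute of the standard zipfile module. The helper names
are invented for this example and the test archives are generated in memory,
not taken from kdelibs.

```python
import io
import zipfile

# Only the two host systems discussed in this thread are named here.
HOST_NAMES = {0: "fat", 3: "unx"}


def make_archive(create_system: int) -> bytes:
    """Write a one-entry archive claiming the given host system."""
    buf = io.BytesIO()
    info = zipfile.ZipInfo("mimetype")
    info.create_system = create_system
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(info, b"application/x-kword")
    return buf.getvalue()


def host_systems(archive: bytes) -> list:
    """Return (entry name, host label) pairs from the central directory."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return [
            (i.filename, HOST_NAMES.get(i.create_system, str(i.create_system)))
            for i in zf.infolist()
        ]


fat_report = host_systems(make_archive(0))
unx_report = host_systems(make_archive(3))
```

To inspect a real document, read the file's bytes and pass them to
host_systems() instead of the generated archives.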

>
>
> Holger

Have a nice day!

>
> ____________________________________
> koffice mailing list
> koffice@mail.kde.org
> To unsubscribe please visit:
> http://mail.kde.org/mailman/listinfo/koffice
