'[Bug 123672] RTF - kword doesn't recognize lang/charset settings'

List:       koffice
Subject:    [Bug 123672] RTF - kword doesn't recognize lang/charset settings
From:       Nicolas Goutte <nicolasg () snafu ! de>
Date:       2006-03-17 16:06:46
Message-ID: 20060317160646.21546.qmail () ktown ! kde ! org

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

http://bugs.kde.org/show_bug.cgi?id=123672         

------- Additional Comments From nicolasg snafu de  2006-03-17 17:06 -------
On Friday 17 March 2006 16:49, Mikolaj Machowski wrote:
(...)
> > Then there is the \ansicpg keyword to set a codepage.
>
> After adding just \ansicpg1250 immediately after \ansi, KWord displays
> the previously attached document as it was intended (all Polish
> characters visible).

That is good. (The document could be even more "wrong".)
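
To illustrate why adding \ansicpg1250 changes what is displayed, here is a minimal Python sketch of how a reader might pick the codepage for \'hh escapes. The function name decode_hex_escapes and the cp1252 default are assumptions made for this example; this is not KWord's RTF filter code, and it only handles single-byte codepages.

```python
import re

def decode_hex_escapes(rtf_text: str, default_codepage: int = 1252) -> str:
    # Hypothetical helper: use the codepage named by \ansicpgN if present,
    # otherwise fall back to an assumed default (cp1252 here).
    match = re.search(r"\\ansicpg(\d+)", rtf_text)
    codepage = int(match.group(1)) if match else default_codepage
    codec = f"cp{codepage}"

    def replace(m: re.Match) -> str:
        # Each \'hh escape carries one byte in the document codepage.
        return bytes([int(m.group(1), 16)]).decode(codec, errors="replace")

    return re.sub(r"\\'([0-9a-fA-F]{2})", replace, rtf_text)

if __name__ == "__main__":
    print(decode_hex_escapes(r"{\rtf1\ansi\ansicpg1250 \'b3}"))
    print(decode_hex_escapes(r"{\rtf1\ansi \'b3}"))
```

The same byte 0xB3 comes out as a Polish letter with \ansicpg1250 and as a superscript three under the assumed cp1252 default, which is the kind of difference described above.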

>
> > > Aaahhh.  Checked RTF 1.6 doc and it looks like by design RTF doesn't
> > > support native encodings?!
> >
> > It does, as it defines the keywords that I have listed above.
>
> Sorry, misunderstood, pre-1.6 versions officially didn't support them.

It depends. \pc, \pca, \mac and \ansi have existed since WinWord 1.x (so
probably RTF 1.2).

Only \ansicpg is relatively recent.

> Even now \ansicpg is rather for proper translation of UTF than native
> encodings.

Why? The \u keyword does not need to know the encoding of the file.
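
As a rough sketch of that point: \uN carries the Unicode value itself as a signed 16-bit decimal number, so decoding it needs no codepage at all. The helper name decode_unicode_escapes is made up for this example, and it assumes the default of one fallback character after each \uN (that is, \uc1).

```python
import re

# \uN, an optional delimiter space, then one fallback item (assuming \uc1):
# either a \'hh escape or a single plain character.
_U_ESCAPE = re.compile(r"\\u(-?\d+) ?(?:\\'[0-9a-fA-F]{2}|[^\\{}])?")

def decode_unicode_escapes(rtf_text: str) -> str:
    def replace(m: re.Match) -> str:
        value = int(m.group(1))
        if value < 0:
            # Values above 32767 are written as negative numbers.
            value += 65536
        return chr(value)

    return _U_ESCAPE.sub(replace, rtf_text)

if __name__ == "__main__":
    # The fallback "?" is dropped; no codepage is consulted anywhere.
    print(decode_unicode_escapes(r"\u322?"))
```

Only the fallback character that follows \uN depends on the file's codepage, and a Unicode-aware reader skips it.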

> 8-bit characters are only a side effect:

On the contrary, I think that it is the primary goal.

> (Converters that
> communicate with Microsoft Word for Windows or Microsoft Word for the
> Macintosh should expect 8-bit characters.)

The problem is that RTF is basically a 7-bit file format, because at the time
RTF 1.0 was defined, major U.S. networks were not 8-bit clean.

Up to RTF 1.2, the format became a little less U.S.-centric, but you still had
to encode characters with \' if they were not 7-bit clean.

Nowadays it should be 8 bit clean.
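
For illustration, a minimal Python sketch of the \' encoding described above, which keeps the output 7-bit clean. The function name to_seven_bit_rtf and the cp1252 default are assumptions for this example, not part of any existing filter; a character that the chosen codepage cannot represent raises UnicodeEncodeError.

```python
def to_seven_bit_rtf(text: str, codepage: int = 1252) -> str:
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax and must be escaped.
            out.append("\\" + ch)
        elif ord(ch) < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        else:
            # Anything else is written as \'hh, one escape per byte
            # in the document codepage.
            encoded = ch.encode(f"cp{codepage}")
            out.append("".join(f"\\'{b:02x}" for b in encoded))
    return "".join(out)

if __name__ == "__main__":
    print(to_seven_bit_rtf("\u0142\u00f3d\u017a", codepage=1250))
```

The result contains only ASCII characters, so it survives a transport that is not 8-bit clean, but a reader still has to know the codepage to turn the escapes back into text.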

>
> > It worries me that \pc and \ansi would perhaps not mean a particular
> > codepage but just the locale's MS-DOS and Windows codepages,
> > respectively. If that is the case, then the RTF filter needs quite an
> > improvement.
>
> I am afraid this is the case. It is also possible that MS programs are
> just guessing the encoding depending on the locale, or perform additional
> tests to display properly. You could check OO.o code - oowriter displays
> the document without problems.

It is rather difficult to read OOo's code.

Have a nice day!
____________________________________
koffice mailing list
koffice@kde.org
To unsubscribe please visit:
https://mail.kde.org/mailman/listinfo/koffice