From kde-bugs-dist  Fri Mar 17 11:06:28 2006
From: Mikolaj Machowski <mikmach () wp ! pl>
Date: Fri, 17 Mar 2006 11:06:28 +0000
To: kde-bugs-dist
Subject: [Bug 123672] RTF - kword doesn't recognize lang/charset settings
Message-Id: <20060317110628.10194.qmail () ktown ! kde ! org>
X-MARC-Message: https://marc.info/?l=kde-bugs-dist&m=114259372301734

http://bugs.kde.org/show_bug.cgi?id=123672         




------- Additional Comments From mikmach wp pl  2006-03-17 12:06 -------
Confirmed: it still doesn't work (I waited for KOffice to recompile before testing).

In RTF the charset number isn't simply the codepage number, so passing it straight to setCodepage() is wrong::

    +void RTFImport::setCharset( RTFProperty *property )
    +{
    +    if(token.value >= 0)
    +        setCodepage(property);
    +}
    +

For example, \fcharset238 corresponds to codepage cp-1250, not 238.
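To see why the charset must be translated, take the byte 0xB9 in text set in a \fcharset238 font: decoded as cp-1250 it is "ą" (U+0105), but decoded as cp-1252 it is "¹" (U+00B9). The helper below is hypothetical and deliberately partial (ASCII plus the Polish letters only); it just shows the cp-1250 interpretation. A real importer would use a complete codec.

```cpp
#include <cstdint>

// Hypothetical, partial helper: decodes one byte (raw, or from an RTF \'hh
// escape) as Windows cp-1250. Only ASCII and the Polish letters are covered;
// any other byte maps to U+FFFD in this sketch.
char32_t decodeCp1250Byte(unsigned char b) {
    if (b < 0x80) return b;  // ASCII is the same in all Windows codepages
    switch (b) {
        case 0xA5: return U'\u0104';  // Ą
        case 0xB9: return U'\u0105';  // ą
        case 0xC6: return U'\u0106';  // Ć
        case 0xE6: return U'\u0107';  // ć
        case 0xCA: return U'\u0118';  // Ę
        case 0xEA: return U'\u0119';  // ę
        case 0xA3: return U'\u0141';  // Ł
        case 0xB3: return U'\u0142';  // ł
        case 0xD1: return U'\u0143';  // Ń
        case 0xF1: return U'\u0144';  // ń
        case 0xD3: return U'\u00D3';  // Ó (same position as in Latin-1)
        case 0xF3: return U'\u00F3';  // ó
        case 0x8C: return U'\u015A';  // Ś
        case 0x9C: return U'\u015B';  // ś
        case 0x8F: return U'\u0179';  // Ź
        case 0x9F: return U'\u017A';  // ź
        case 0xAF: return U'\u017B';  // Ż
        case 0xBF: return U'\u017C';  // ż
        default:   return U'\uFFFD';  // not covered by this sketch
    }
}
```

Usage: decodeCp1250Byte(0xB9) yields U+0105 ("ą"), whereas treating the same byte as cp-1252 would give U+00B9.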

KWord displays MS Word documents correctly only because Word writes
non-ASCII letters as Unicode escapes and special entities; it doesn't
rely on a native encoding.
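For reference, a Unicode escape in RTF is the control word \uN, where N is a signed 16-bit decimal value (code points above 32767 are written as negative numbers). It is followed by fallback characters for older readers, as many as \ucN says (default 1), which a Unicode-aware reader skips. The function names below are hypothetical; this is a minimal sketch that decodes such escapes in a plain run of text and ignores all other control words and groups.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// Converts the parameter of an RTF \uN control word to a code point.
// RTF writes N as a signed 16-bit number, so values above 32767 appear negative.
char32_t rtfUnicodeParam(long n) {
    if (n < 0) n += 65536;
    return static_cast<char32_t>(n);
}

// Hypothetical helper: decodes \uN escapes in a run of RTF text into UTF-32.
// After each escape, `uc` fallback characters (set by \ucN, default 1) are
// skipped; an \'hh hex escape counts as one fallback character. Other control
// words, groups and raw non-ASCII bytes are out of scope for this sketch.
std::u32string decodeUnicodeEscapes(const std::string& s, int uc = 1) {
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        bool isUnicodeEscape = s[i] == '\\' && i + 2 < s.size() && s[i + 1] == 'u' &&
            (s[i + 2] == '-' || std::isdigit(static_cast<unsigned char>(s[i + 2])));
        if (!isUnicodeEscape) {
            out.push_back(static_cast<unsigned char>(s[i]));  // plain ASCII text
            ++i;
            continue;
        }
        char* end = nullptr;
        long n = std::strtol(s.c_str() + i + 2, &end, 10);
        std::size_t j = static_cast<std::size_t>(end - s.c_str());
        if (j < s.size() && s[j] == ' ') ++j;  // a space delimiter belongs to the control word
        for (int k = 0; k < uc && j < s.size(); ++k) {
            if (s[j] == '\\' && j + 1 < s.size() && s[j + 1] == '\'')
                j += 4;  // skip an \'hh fallback
            else
                ++j;     // skip a single-character fallback
        }
        out.push_back(rtfUnicodeParam(n));
        i = std::min(j, s.size());
    }
    return out;
}
```

For example, the run \u380?\u243 \'f3\u322? decodes to "żół"; the "?" and \'f3 are only there for readers that don't understand \uN.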

Aaahhh. I checked the RTF 1.6 spec, and it looks like RTF by design
doesn't support native encodings?! They are supported only through
Unicode entities. So you are right: the document is invalid, but
documents of that type are really popular in Poland. I wonder about
other countries with non-Latin-1 charsets/encodings...

Table of charsets from the RTF 1.6 spec (the "<-" codepage notes are mine):

\fcharsetN
Specifies the character set of a font in the font table. Values for
N are defined by Windows header files:

0 -- ANSI       <- cp-1252 (MM)
1 -- Default
2 -- Symbol
3 -- Invalid
77 -- Mac
128 -- Shift Jis
129 -- Hangul <- cp-949
130 -- Johab <- cp-1361
134 -- GB2312 <- cp-936 (simplified Chinese)
136 -- Big5 <- cp-950 (traditional Chinese)
161 -- Greek  <- cp-1253
162 -- Turkish <- cp-1254
163 -- Vietnamese <- cp-1258
177 -- Hebrew <- cp-1255
178 -- Arabic
179 -- Arabic Traditional
180 -- Arabic user
181 -- Hebrew user
186 -- Baltic <- cp-1257
204 -- Russian <- cp-1251 (Windows Cyrillic)
222 -- Thai <- cp-874
238 -- Eastern European  <- cp-1250
254 -- PC 437
255 -- OEM
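The table above could drive a lookup like the following, so that setCharset() converts the charset to a codepage before calling setCodepage() instead of passing the value through. This is a minimal sketch, not KWord's API: the function name and the std::optional return are assumptions. The entries for 128, 178 and 254, which the table leaves unannotated, follow the usual Windows charset-to-codepage correspondence.

```cpp
#include <optional>

// Hypothetical helper (not KWord's API): maps an RTF \fcharsetN value to the
// Windows codepage used to decode that font's 8-bit text. Charsets without a
// fixed codepage (1 Default, 2 Symbol, 77 Mac, 255 OEM, ...) return nullopt,
// leaving the caller to fall back to \ansicpg or a system default.
std::optional<int> charsetToCodepage(int charset) {
    switch (charset) {
        case 0:   return 1252;  // ANSI
        case 128: return 932;   // Shift JIS
        case 129: return 949;   // Hangul
        case 130: return 1361;  // Johab
        case 134: return 936;   // GB2312 (simplified Chinese)
        case 136: return 950;   // Big5 (traditional Chinese)
        case 161: return 1253;  // Greek
        case 162: return 1254;  // Turkish
        case 163: return 1258;  // Vietnamese
        case 177: return 1255;  // Hebrew
        case 178: return 1256;  // Arabic
        case 186: return 1257;  // Baltic
        case 204: return 1251;  // Russian (Windows Cyrillic)
        case 222: return 874;   // Thai
        case 238: return 1250;  // Eastern European
        case 254: return 437;   // PC 437
        default:  return std::nullopt;
    }
}
```

Returning an empty optional rather than a sentinel codepage keeps "no fixed mapping" distinct from any real codepage number.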