From: Mikolaj Machowski
Date: Fri, 17 Mar 2006 11:06:28 +0000
To: kde-bugs-dist
Subject: [Bug 123672] RTF - kword doesn't recognize lang/charset settings
Message-Id: <20060317110628.10194.qmail () ktown ! kde ! org>
X-MARC-Message: https://marc.info/?l=kde-bugs-dist&m=114259372301734

http://bugs.kde.org/show_bug.cgi?id=123672

------- Additional Comments From mikmach wp pl 2006-03-17 12:06 -------
Confirming: it doesn't work (I waited for KOffice to recompile).

In RTF, the charset isn't simply related to the codepage, so the patch below, which treats the \fcharset value as if it were a codepage, can't work:

    +void RTFImport::setCharset( RTFProperty *property )
    +{
    +    if(token.value >= 0)
    +        setCodepage(property);
    +}
    +

For example, \fcharset238 means codepage cp-1250, not codepage 238.

KWord shows MS Word documents correctly only because Word encodes the letters there as Unicode and special entities, so KWord never has to deal with the native encoding.

Aaahhh. I checked the RTF 1.6 documentation, and it looks like RTF, by design, doesn't support native encodings?! They are supported only through Unicode entities. So you are right: the document is invalid. But documents of that type are really popular in Poland. I wonder about other countries with non-Latin-1 charsets/encodings...

Table of charsets from the RTF 1.6 docs:

    \fcharsetN
    Specifies the character set of a font in the font table. Values for N
    are defined by Windows header files (codepages after "<-" are my notes, MM):

      0 -- ANSI                  <- cp-1252
      1 -- Default
      2 -- Symbol
      3 -- Invalid
     77 -- Mac
    128 -- Shift Jis
    129 -- Hangul                <- cp-949 (?)
    130 -- Johab                 <- cp-1361
    134 -- GB2312                <- cp-936
    136 -- Big5                  <- cp-950
    161 -- Greek                 <- cp-1253
    162 -- Turkish               <- cp-1254
    163 -- Vietnamese            <- cp-1258
    177 -- Hebrew                <- cp-1255
    178 -- Arabic
    179 -- Arabic Traditional
    180 -- Arabic user
    181 -- Hebrew user
    186 -- Baltic                <- cp-1257
    204 -- Russian               <- cp-1251 (? not sure, there are several Cyrillic codepages)
    222 -- Thai                  <- cp-874
    238 -- Eastern European      <- cp-1250
    254 -- PC 437
    255 -- OEM
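To make the charset/codepage distinction concrete, here is a minimal sketch of a lookup from the \fcharsetN value to a Windows codepage, covering only the charsets annotated in the table above. The function name codepageForCharset is hypothetical and is not part of KOffice or of RTFImport; unknown or unannotated charsets return "no codepage" so a caller could keep whatever codepage is already in effect.

```cpp
#include <optional>

// Hypothetical helper (not a KOffice API): maps an RTF \fcharsetN value
// to the Windows codepage used for that font's 8-bit text.
// Only the charsets annotated in the table above are covered; anything
// else yields std::nullopt so the caller can keep its current codepage.
constexpr std::optional<int> codepageForCharset(int charset)
{
    switch (charset) {
    case 0:   return 1252; // ANSI
    case 129: return 949;  // Hangul
    case 130: return 1361; // Johab
    case 134: return 936;  // GB2312
    case 136: return 950;  // Big5
    case 161: return 1253; // Greek
    case 162: return 1254; // Turkish
    case 163: return 1258; // Vietnamese
    case 177: return 1255; // Hebrew
    case 186: return 1257; // Baltic
    case 204: return 1251; // Russian
    case 222: return 874;  // Thai
    case 238: return 1250; // Eastern European
    default:  return std::nullopt; // Default, Symbol, Mac, Shift Jis, Arabic, OEM, ...
    }
}
```

With this, codepageForCharset(238) yields 1250 rather than 238. Any real fix in RTFImport::setCharset would have to feed such a mapped value, not token.value itself, into whatever selects the codepage; how setCodepage() reads its input is not shown here, so that wiring is left as an assumption.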