
List:       kde-bugs-dist
Subject:    [Bug 123672] RTF - kword doesn't recognize lang/charset settings
From:       Nicolas Goutte <nicolasg () snafu ! de>
Date:       2006-03-17 11:59:58
Message-ID: 20060317115958.13164.qmail () ktown ! kde ! org

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
         
http://bugs.kde.org/show_bug.cgi?id=123672         




------- Additional Comments From nicolasg snafu de  2006-03-17 12:59 -------
On Friday 17 March 2006 12:06, Mikolaj Machowski wrote:
(...)
>
> KWord shows MS-Word documents properly only because letters there are
> encoded by Unicode and special entities - it doesn't use native
> encoding.


KWord's RTF import filter uses native encoding when it is correctly defined:
\pc codepage 850 (an approximation, as it should be codepage 437)
\pca codepage 850
\mac Apple Roman encoding
\ansi codepage 1252

Then there is the \ansicpg keyword to set a codepage.
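
To make the keyword handling concrete, here is a minimal Python sketch (not
the filter's actual code). The names pick_codec and decode_hex_escapes are
made up for illustration, and the sketch maps \pc to codepage 437 rather than
to the 850 approximation. It prefers \ansicpgN when present and then decodes
\'hh escapes with the chosen codepage.

```python
import re

# Keyword -> Python codec name (illustrative table, not KWord's own).
CHARSET_KEYWORDS = {
    "pc": "cp437",       # IBM PC codepage 437
    "pca": "cp850",
    "mac": "mac_roman",
    "ansi": "cp1252",
}

def pick_codec(rtf_header: str) -> str:
    r"""Return a codec name for an RTF header, preferring \ansicpgN."""
    m = re.search(r"\\ansicpg(\d+)", rtf_header)
    if m:
        # Assumption: Python ships a codec named cp<N> for this codepage.
        return "cp" + m.group(1)
    for keyword, codec in CHARSET_KEYWORDS.items():
        # The lookahead keeps \pc from matching the start of \pca.
        if re.search(r"\\" + keyword + r"(?![a-z])", rtf_header):
            return codec
    return "cp1252"  # assumption: treat a missing keyword like \ansi

def decode_hex_escapes(text: str, codec: str) -> str:
    r"""Replace each \'hh escape with the character it means in codec."""
    return re.sub(
        r"\\'([0-9a-fA-F]{2})",
        lambda m: bytes.fromhex(m.group(1)).decode(codec),
        text,
    )
```

With a Polish document, \'b9 is "ą" under codepage 1250 but "¹" under
codepage 1252, which is exactly the kind of damage a wrong default produces.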

>
> Aaahhh.  Checked RTF 1.6 doc and it looks like by design RTF doesn't
> support native encodings?! 


It does, as it defines the keywords that I have listed above.

>They are supported only through Unicode
> entities.


Using Unicode (especially the \u keyword) is only an option, even if perhaps a 
recommended one, if backward compatibility is not needed.
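
As an illustration of how the \u keyword coexists with a fallback for old
readers, here is a small Python sketch. It assumes the default \uc1 (exactly
one fallback character after each \uN), and the function name is
hypothetical. A Unicode-aware reader takes the \uN value and skips the
fallback; an old reader ignores \uN and shows the fallback.

```python
import re

def decode_unicode_escapes(text: str) -> str:
    r"""Replace \uN plus its single fallback character with chr(N)."""
    def repl(m):
        n = int(m.group(1))
        if n < 0:
            n += 65536  # values above 32767 are written as negative numbers
        return chr(n)
    # The fallback is either a \'hh escape or one plain character.
    return re.sub(r"\\u(-?\d+) ?(?:\\'[0-9a-fA-F]{2}|[^\\{}])", repl, text)
```

For example, the input w\u261?\u380\'bf decodes to "wąż" whatever the
document codepage is.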

> So you are right - document is invalid but that type of
> documents is really popular in Poland. I wonder about other countries
> with non-latin1 charsets/encodings...


Yes, I am starting to wonder too. 

It worries me that \pc and \ansi might not mean a particular codepage but
just the MS-DOS and Windows codepages, respectively, of the writer's locale.
If that is the case, then the RTF filter needs quite an improvement.

(And a hack will probably not be enough for documents that use several fonts 
with different \fcharset declarations.)

>
> Table of charsets from RTF 1.6 docs:
>
> \fcharsetN				*\fcharset*
> Specifies the character set of a font in the font table. Values for
> <i>N</i> are defined by Windows header files:


I suppose that it would be best if such a table were kept in a more central 
place in KOffice, as at least the other KWord filters for formats that come 
from Windows would need it too.
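
A rough Python sketch of what such a shared table could look like follows.
The numbers are the Windows charset constants and their usual codepages; the
names FCHARSET_TO_CODEC, codec_for_font and font_codecs are hypothetical, and
the font-table regex is a simplification that assumes \fcharset follows \fN
inside the same font entry.

```python
import re

# Windows charset value -> Python codec for the matching Windows codepage.
FCHARSET_TO_CODEC = {
    0: "cp1252",    # ANSI_CHARSET
    128: "cp932",   # SHIFTJIS_CHARSET
    129: "cp949",   # HANGUL_CHARSET
    134: "cp936",   # GB2312_CHARSET
    136: "cp950",   # CHINESEBIG5_CHARSET
    161: "cp1253",  # GREEK_CHARSET
    162: "cp1254",  # TURKISH_CHARSET
    177: "cp1255",  # HEBREW_CHARSET
    178: "cp1256",  # ARABIC_CHARSET
    186: "cp1257",  # BALTIC_CHARSET
    204: "cp1251",  # RUSSIAN_CHARSET
    222: "cp874",   # THAI_CHARSET
    238: "cp1250",  # EASTEUROPE_CHARSET
}

def codec_for_font(fcharset: int, document_codec: str = "cp1252") -> str:
    """Pick the codec for a font, falling back to the document codec."""
    return FCHARSET_TO_CODEC.get(fcharset, document_codec)

def font_codecs(fonttbl: str, document_codec: str = "cp1252") -> dict:
    r"""Map each font number in a \fonttbl group to a codec name."""
    pairs = re.findall(r"\\f(\d+)[^;]*?\\fcharset(\d+)", fonttbl)
    return {int(f): codec_for_font(int(cs), document_codec) for f, cs in pairs}
```

A filter would then decode each run of \'hh bytes with the codec of the font
that is current at that point, instead of one codec for the whole document.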

(...)

Have a nice day!