
List:       xfree-i18n
Subject:    [I18n]Re: GNU libiconv 1.6.1 is released
From:       "Dmitry Yu. Bolkhovityanov" <D.Yu.Bolkhovityanov () inp ! nsk ! su>
Date:       2001-03-30 4:12:45

On Wed, 28 Mar 2001 15:50:43 +0200, Pablo Saratxaga wrote:

[SNIP]
> Auto-detection is much harder, or even impossible, for 8-bit encodings.
> For example, autodetecting Cyrillic between cp1251 and koi8-r would
> require language-specific analysis, to recognize often-used words, as
> the Cyrillic letters are in the same range in both encodings, but
> in a different order.

    In fact, autodetecting cp1251/koi8-* is very simple: both put the
Cyrillic letters in the same range 0xC0-0xFF, but in cp1251 capitals are in
the 0xC0-0xDF subrange and lowercase letters are in 0xE0-0xFF, while in
koi8-r the situation is the opposite.
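
    The layout described above can be checked with Python's standard
codecs. This is only a quick sketch: the helper name byte_span is made up
for illustration, and the letter Ё is left out because both encodings keep
it outside 0xC0-0xFF.

```python
# The 32 Russian capitals, deliberately without Ё (it lives outside 0xC0-0xFF).
UPPER = "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
LOWER = UPPER.lower()

def byte_span(text, codec):
    """Return the lowest and highest byte value used to encode text."""
    raw = text.encode(codec)
    return min(raw), max(raw)

for codec in ("cp1251", "koi8_r"):
    lo_u, hi_u = byte_span(UPPER, codec)
    lo_l, hi_l = byte_span(LOWER, codec)
    print(f"{codec}: capitals 0x{lo_u:02X}-0x{hi_u:02X}, "
          f"lowercase 0x{lo_l:02X}-0x{hi_l:02X}")
```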

    So, the algorithm is: find a word in which the first letter is in a
different case from the others; if that first letter is in 0xC0-0xDF, then
it is cp1251, otherwise koi8-*.
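
    A minimal sketch of this heuristic in Python, taking 0xC0-0xDF as the
cp1251 capital subrange given above. The function name and the decision to
return None when no suitable word is found are my assumptions, not part of
any library; a "word" here is simply a run of two or more bytes in
0xC0-0xFF.

```python
import re

# A "word": two or more consecutive bytes in the shared letter range.
WORD = re.compile(rb"[\xC0-\xFF]{2,}")

def in_low_half(b):
    """True for 0xC0-0xDF: capitals in cp1251, lowercase in koi8-r."""
    return 0xC0 <= b <= 0xDF

def guess_cyrillic_encoding(data):
    """Guess 'cp1251' or 'koi8-r' from the first capitalized word.

    Returns None when no word has a first letter whose case differs
    from all the remaining letters (hypothetical fallback).
    """
    for match in WORD.finditer(data):
        word = match.group()
        first, rest = word[0], word[1:]
        rest_low = [in_low_half(b) for b in rest]
        if in_low_half(first) and not any(rest_low):
            return "cp1251"   # first byte 0xC0-0xDF, rest 0xE0-0xFF
        if not in_low_half(first) and all(rest_low):
            return "koi8-r"   # first byte 0xE0-0xFF, rest 0xC0-0xDF
    return None
```

    For instance, the bytes of "Привет мир" encoded as cp1251 give
"cp1251", the same text encoded as koi8-r gives "koi8-r", and an
all-lowercase or pure-ASCII input gives None.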

    This "feature" is very well known in Russia, so if you use Pine in a
koi8 environment and receive a letter in the wrong encoding, just a glance
is enough to understand that it was sent by some illiterate Microsoft mailer
(usually Hotmail or OLE) and should be recoded 1251->koi8 (and in 95% of
cases it is just spam).
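
    The 1251->koi8 recoding itself can be sketched with Python's built-in
codecs. The function name is hypothetical, and replacing unmappable bytes
or characters instead of raising an error is my assumption about what a
mail reader would want.

```python
def recode_1251_to_koi8(data):
    """Re-encode cp1251 bytes as koi8-r bytes.

    Bytes undefined in cp1251, and characters that koi8-r cannot
    represent, are replaced rather than raising an exception.
    """
    text = data.decode("cp1251", errors="replace")
    return text.encode("koi8_r", errors="replace")
```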


       ___________________________________________________________________
       Dmitry Yu. Bolkhovityanov  |  Novosibirsk, RUSSIA
       phone (383-2)-39-49-56     |  The Budker Institute of Nuclear Physics
                                  |  Lab. 5-13
_______________________________________________
I18n mailing list
I18n@XFree86.Org
http://XFree86.Org/mailman/listinfo/i18n
