List:       kde-devel
Subject:    Re: encoding autodetection
From:       "Nick Shaforostoff" <shafff () ukr ! net>
Date:       2007-02-19 9:21:47
Message-ID: op.tnzoalfx63l8gs () localhost

On Mon, 19 Feb 2007 03:20:43 +0200, Jacob R Rideout <kde@jacobrideout.net>  
wrote:

>> > so, i wanna add statistical encoding detection for cyr languages.
>>
>> As far as I got it, there is the new Sonnet engine to deal with this kind
>> of stuff in KDE4, worked on by Jacob Rideout, in kdelibs/sonnet.
>>
>
> Sonnet is the right place to put encoding detection.
> The language detection module currently only works with unicode (language
> models are stored in utf8) since it has been optimized to work with
> QStrings. Modifying the current language detection to provide encoding
> detection shouldn't be too hard, since libtextcat uses the same idea.

my opinion is that the encoding should be detected _before_ converting
char[] to QString, so we can avoid converting the same data multiple times.
(also, remember the 3-byte UTF-8 byte order mark, EF BB BF, at the
beginning of some files)
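
a rough sketch of what "detect before converting" could look like on the raw
bytes. everything here is an assumption for illustration: the names
(Guess, guessEncoding) are made up and are not part of Qt, kdelibs, Sonnet
or libtextcat, and the cyrillic part is only a crude byte-range count, not a
real language model. it relies on the fact that KOI8-R keeps lowercase
letters in 0xC0-0xDF while CP1251 keeps them in 0xE0-0xFF, and that ordinary
text is mostly lowercase.

```cpp
#include <cstddef>

// Hypothetical result type; not an existing Qt or KDE enum.
enum class Guess { Utf8Bom, Utf16LE, Utf16BE, Koi8R, Cp1251, Unknown };

// Look at the raw bytes before any conversion to a string class.
Guess guessEncoding(const unsigned char *data, std::size_t len)
{
    // Byte order marks first: cheap to check and unambiguous.
    // (UTF-32 is ignored here to keep the sketch short.)
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return Guess::Utf8Bom;
    if (len >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return Guess::Utf16LE;
    if (len >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return Guess::Utf16BE;

    // Crude statistic for 8-bit cyrillic text:
    //   KOI8-R : lowercase letters are 0xC0-0xDF, uppercase 0xE0-0xFF
    //   CP1251 : uppercase letters are 0xC0-0xDF, lowercase 0xE0-0xFF
    // Whichever range dominates is assumed to hold the lowercase letters.
    std::size_t low = 0;   // bytes in 0xC0-0xDF
    std::size_t high = 0;  // bytes in 0xE0-0xFF
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] >= 0xE0)
            ++high;
        else if (data[i] >= 0xC0)
            ++low;
    }

    if (low > high)
        return Guess::Koi8R;
    if (high > low)
        return Guess::Cp1251;
    return Guess::Unknown;  // no high bytes at all, or a tie
}
```

with a guess like this in hand, the buffer would only have to be converted to
a QString once, with the codec that matches the guess. a real detector would
need proper frequency tables and more encodings than these two.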

when i have free time i'll look into the code more deeply, and maybe i'll
consider making a new widget (a KMenu) for encoding selection, putting it
into kdeui so that it is common to all apps
