List:       kde-devel
Subject:    Re: encoding
From:       Cougar <liuspider () yahoo ! com>
Date:       2004-08-12 2:12:47
Message-ID: 20040812021247.13052.qmail () web50505 ! mail ! yahoo ! com

It seems to me that it can only recognize GB2312-encoded Chinese. Even
when you feed it Big5-encoded Chinese characters, the result it reports
is gb2312.

--- Tomasz Grobelny <grotk@poczta.onet.pl> wrote:

> On Wednesday 11 of August 2004 00:42, Thiago Macieira wrote:
> > Tomasz Grobelny wrote:
> > >On Tuesday 10 of August 2004 23:03, Brad Hards wrote:
> > >> On Wed, 11 Aug 2004 06:50 am, Tomasz Grobelny wrote:
> > >> > Is there a function somewhere in KDE classes that would show
> > >> > the encoding of a text file (or at least some approximation)?
> > >>
> > >> KCharsets?
> > >
> > >I don't see anything that would show encoding based on file content...
> >
> > There's no public function that you can readily use. However, you may
> > find the code you want in kdelibs/khtml/misc/decoder.cpp.
> >
> The code there doesn't seem to be good enough (it failed on some test
> files). However, I found something called "N-Gram-Based Text
> Categorization" with a GPLed implementation. Basically, this algorithm
> guessed all the European languages I tried, and choosing between two or
> three encodings for a known language isn't a big problem, I think (the
> current code does that). You can try TextCat at
> http://odur.let.rug.nl/~vannoord/TextCat/Demo/
> Could anybody try other languages (Chinese, Japanese, Thai and the like)
> and post the results (since I can't recognise them)?
> 
> > If you make a public function/class out of that, it might be interesting
> > to share. In particular, if you write it, make it so that you can replace
> > the code in decoder.cpp with a simple call to your function/class.
> _If_ I write something useful I'll let you know.
> 
> Tomek
> 
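
For anyone who wants to experiment, here is a minimal sketch of the
rank-order comparison usually described under the name "N-gram-based
text categorization". It is not TextCat's code and not what decoder.cpp
does. The function names (ngram_profile, out_of_place, guess), the use
of byte n-grams, the profile size and the one-line Polish training
sample are all assumptions made for illustration. A usable detector
would need much larger training texts per language and encoding.

```python
from collections import Counter


def ngram_profile(data, max_n=3, top=300):
    """Return the `top` most frequent byte n-grams (n = 1..max_n),
    ordered from most to least frequent."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(data) - n + 1):
            counts[data[i:i + n]] += 1
    # Descending count, then the n-gram itself, so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc, ref):
    """Sum of rank differences between two profiles. An n-gram missing
    from `ref` costs a fixed maximum penalty (hypothetical choice: len(ref))."""
    ref_rank = {gram: rank for rank, gram in enumerate(ref)}
    penalty = len(ref)
    total = 0
    for rank, gram in enumerate(doc):
        if gram in ref_rank:
            total += abs(rank - ref_rank[gram])
        else:
            total += penalty
    return total


def guess(data, profiles):
    """Return the label whose reference profile is closest to `data`."""
    doc = ngram_profile(data)
    return min(profiles, key=lambda label: out_of_place(doc, profiles[label]))


# Toy training data (assumption): one Polish sample, one profile per encoding.
SAMPLE = "Zażółć gęślą jaźń. Pchnąć w tę łódź jeża lub ośm skrzyń fig."
ENCODINGS = ("utf-8", "iso-8859-2", "cp1250")
profiles = {enc: ngram_profile(SAMPLE.encode(enc)) for enc in ENCODINGS}
```

Calling guess(raw_bytes, profiles) on the raw contents of a file returns
the label of the closest profile. Working on bytes rather than decoded
characters is what lets one language yield a separate profile per
encoding, which matters for pairs such as GB2312 and Big5 where the same
text produces different byte sequences.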

