
List:       kde-devel
Subject:    Re: encoding
From:       Tomasz Grobelny <grotk () poczta ! onet ! pl>
Date:       2004-08-11 22:45:47
Message-ID: 200408120045.48358.grotk () poczta ! onet ! pl

On Wednesday 11 of August 2004 00:42, Thiago Macieira wrote:
> Tomasz Grobelny wrote:
> >On Tuesday 10 of August 2004 23:03, Brad Hards wrote:
> >> On Wed, 11 Aug 2004 06:50 am, Tomasz Grobelny wrote:
> >> > Is there a function somewhere in KDE classes that would show
> >> > encoding of a text file (or at least some approximation)?
> >>
> >> KCharsets?
> >
> >I don't see anything that would show encoding based on file content...
>
> There's no public function that you can readily use. However, you may
> find the code you want in kdelibs/khtml/misc/decoder.cpp.
>
The code there doesn't seem to be good enough (it failed on some test files).
However, I found something called "N-Gram-Based Text Categorization" with a
GPLed implementation. This algorithm correctly guessed every European language
I tried, and choosing between two or three encodings for a known language
isn't a big problem, I think (the current code already does that). You can try
TextCat at http://odur.let.rug.nl/~vannoord/TextCat/Demo/. Could anybody try
other languages (Chinese, Japanese, Thai and the like) and post the results
(since I can't recognise them)?
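
To make the idea concrete, here is a minimal sketch of n-gram-based
categorization applied to the "known language, two or three candidate
encodings" case. It is written in Python for brevity and is not TextCat's
code or anything from kdelibs. The function names, the n-gram length limit,
the profile size, and the use of a single reference text as the language
profile are all assumptions made for illustration. Each candidate encoding is
used to decode the bytes, the decoded text is turned into a ranked list of its
most frequent character n-grams, and the candidate whose ranking is closest to
the reference profile wins.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top=300):
    """Return the text's character n-grams, most frequent first.

    max_n and top are illustrative defaults, not values taken from TextCat.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so the
    # ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc_profile, ref_profile):
    """Sum of rank differences; n-grams missing from ref get a fixed penalty."""
    ref_rank = {gram: rank for rank, gram in enumerate(ref_profile)}
    penalty = len(ref_profile)
    total = 0
    for rank, gram in enumerate(doc_profile):
        if gram in ref_rank:
            total += abs(rank - ref_rank[gram])
        else:
            total += penalty
    return total


def guess_encoding(data, ref_profile, candidates):
    """Return the candidate encoding whose decoded text best matches ref_profile.

    Returns None if no candidate can decode the data at all.
    """
    best = None
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # bytes are not valid in this encoding
        score = out_of_place(ngram_profile(text), ref_profile)
        if best is None or score < best[0]:
            best = (score, enc)
    return best[1] if best else None


if __name__ == "__main__":
    # Hypothetical Polish reference text; a real profile would be built
    # from a much larger sample of the language.
    reference = ngram_profile("zażółć gęślą jaźń")
    raw = "zażółć gęślą jaźń".encode("iso-8859-2")
    print(guess_encoding(raw, reference, ["cp1250", "iso-8859-2"]))
```

A wrong candidate usually still decodes without error (ISO-8859-2 and cp1250
share most byte values), which is why the comparison against a language
profile is needed and a plain "does it decode" check is not enough.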

> If you make a public function/class out of that, it might be interesting
> to share. In particular, if you write it, make it so that you can replace
> the code in decoder.cpp with a simple call to your function/class.
_If_ I write something useful I'll let you know.

Tomek
 