'Re: [PATCH] improvement KDE's auto-detection text encode feature.'

List:       kde-devel
Subject:    Re: [PATCH] improvement KDE's auto-detection text encode feature.
From:       Tomasz Grobelny <grotk () poczta ! onet ! pl>
Date:       2005-05-23 16:34:20
Message-ID: 200505231834.20946.grotk () poczta ! onet ! pl

On Monday 23 of May 2005 14:55, Takumi ASAKI wrote:
> Hi, all.
>
> I create some patches to improve KDE's auto-detection text encode.
> Please review and comment it.
>
From what you've written, it seems that each language needs a separate plugin. I
wonder whether it would be easier to use one general algorithm and feed it
language-specific data (which would certainly scale better to the 50+
languages KDE supports). Specifically, I mean n-gram based text
categorisation. It is primarily meant for language guessing, but IMO it would
also work quite well for encoding guessing. In the past I ran a few tests
using textcat on the "Polish" encodings ISO 8859-2 and CP 1250, and (as far as I
can remember) it proved to be useful.
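
To make the idea a bit more concrete, here is a rough sketch in Python of how
n-gram categorisation could be applied to raw bytes instead of characters.
This is not textcat's code and not what the patch does. The function names
(ngram_profile, out_of_place, guess_encoding), the training sentence and the
limit of 300 n-grams are all made up for illustration, and ties share a rank,
which is a simplification of the usual rank-order comparison. A real profile
would be built from a much larger sample per language and encoding.

```python
from collections import Counter

TOP = 300  # assumed profile size; also used as the penalty for unseen n-grams


def ngram_profile(data: bytes, max_n: int = 3, top: int = TOP) -> dict:
    """Map each of the most frequent byte n-grams (n = 1..max_n) to a rank.

    Rank 0 is the most frequent; n-grams with equal counts share a rank.
    """
    counts = Counter(
        data[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(data) - n + 1)
    )
    profile = {}
    rank, last_count = -1, None
    for gram, count in counts.most_common(top):
        if count != last_count:
            rank, last_count = rank + 1, count
        profile[gram] = rank
    return profile


def out_of_place(doc: dict, ref: dict, penalty: int = TOP) -> int:
    """Sum of rank differences; n-grams missing from ref cost a fixed penalty."""
    return sum(
        abs(rank - ref[gram]) if gram in ref else penalty
        for gram, rank in doc.items()
    )


def guess_encoding(data: bytes, profiles: dict) -> str:
    """Return the name of the reference profile closest to the data."""
    doc = ngram_profile(data)
    return min(profiles, key=lambda name: out_of_place(doc, profiles[name]))


# Toy training data: one Polish sentence, encoded once per candidate encoding.
SAMPLE = "Zażółć gęślą jaźń. Wąż śpi pod źdźbłem trawy, a gęś się śmieje."
PROFILES = {enc: ngram_profile(SAMPLE.encode(enc))
            for enc in ("iso8859_2", "cp1250")}

unknown = "Żółw śni o łące i źródle.".encode("cp1250")
print(guess_encoding(unknown, PROFILES))  # cp1250
```

The two encodings differ only in a handful of letters (for example ą, ś
and ź), so the guess works only if the unknown text contains at least one of
them. Adding another language or encoding means adding another profile, not
another plugin.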

Tomek
