From kde-devel  Mon May 23 16:34:20 2005
From: Tomasz Grobelny <grotk () poczta ! onet ! pl>
Date: Mon, 23 May 2005 16:34:20 +0000
To: kde-devel
Subject: Re: [PATCH] improvement KDE's auto-detection text encode feature.
Message-Id: <200505231834.20946.grotk () poczta ! onet ! pl>
X-MARC-Message: https://marc.info/?l=kde-devel&m=111687262004524

On Monday 23 of May 2005 14:55, Takumi ASAKI wrote:
> Hi, all.
>
> I create some patches to improve KDE's auto-detection text encode.
> Please review and comment it.
>
From what you've written, it seems that each language needs a separate
plugin. I wonder if it wouldn't be easier to employ one general algorithm
and feed it with language-specific data (which would certainly be easier
for the 50+ languages KDE supports). Specifically, I mean n-gram based text
categorisation. It is primarily meant for language guessing, but IMO it
would also work quite well for encoding guessing. In the past I made a few
tests using textcat for the "Polish" encodings ISO 8859-2 and CP 1250, and
(as far as I can remember) it proved to be useful.
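
To make the idea concrete, here is a rough sketch in Python of how n-gram
profiles could be used to tell ISO 8859-2 from CP 1250. It is not textcat's
code and not part of the proposed patch. The function names, the tiny
training sentence, the profile size of 300 and the choice to let n-grams
with equal counts share a rank are all my own assumptions for illustration.
A real detector would train on much larger per-language corpora.

```python
from collections import Counter

TOP = 300  # assumed profile size; also the penalty for an unseen n-gram


def ngram_profile(data: bytes, max_n: int = 3, top: int = TOP) -> dict:
    """Map the most frequent byte n-grams (length 1..max_n) to a rank."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(data) - n + 1):
            counts[data[i:i + n]] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
    profile, rank, prev = {}, -1, None
    for gram, count in ranked:
        # Assumption: equal counts share a rank, so tie order is irrelevant.
        if count != prev:
            rank, prev = rank + 1, count
        profile[gram] = rank
    return profile


def out_of_place(doc: dict, ref: dict, penalty: int = TOP) -> int:
    """Sum of rank differences; n-grams missing from ref cost `penalty`."""
    return sum(abs(r - ref[g]) if g in ref else penalty
               for g, r in doc.items())


def guess_encoding(data: bytes, profiles: dict) -> str:
    """Return the name of the profile closest to the profile of `data`."""
    doc = ngram_profile(data)
    return min(profiles, key=lambda name: out_of_place(doc, profiles[name]))


# Toy training data: one Polish sentence encoded in both candidates.
SAMPLE = "Zażółć gęślą jaźń. Śnieg, źrebię, łąka, świt, żuraw, ćma, pięć."
profiles = {enc: ngram_profile(SAMPLE.encode(enc))
            for enc in ("iso8859_2", "cp1250")}

unknown = "Wąż ślizga się ścieżką.".encode("cp1250")
print(guess_encoding(unknown, profiles))
```

The two encodings differ only in a few letters (for Polish: ą, ś, ź and
their capitals), so the n-grams containing those bytes are what separates
the profiles. Everything else ranks the same in both.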

Tomek
 