From kde-devel Mon May 23 16:34:20 2005
From: Tomasz Grobelny
Date: Mon, 23 May 2005 16:34:20 +0000
To: kde-devel
Subject: Re: [PATCH] improvement KDE's auto-detection text encode feature.
Message-Id: <200505231834.20946.grotk () poczta ! onet ! pl>
X-MARC-Message: https://marc.info/?l=kde-devel&m=111687262004524

On Monday 23 of May 2005 14:55, Takumi ASAKI wrote:
> Hi, all.
>
> I create some patches to improve KDE's auto-detection text encode.
> Please review and comment it.
>
From what you've written it seems that each language needs a separate
plugin. I wonder if it wouldn't be easier to employ some general kind of
algorithm and feed it with language-specific data (which would certainly be
easier for the 50+ languages KDE supports). Specifically, I mean n-gram
based text categorisation. It is primarily meant for language guessing, but
IMO it would also work quite well for encoding guessing. In the past I made
a few tests using textcat for the "Polish" encodings ISO 8859-2 and CP 1250
and (as far as I can remember) it proved to be useful.

Tomek
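A minimal sketch of the idea, assuming Python 3.9+ and the rank-order ("out-of-place") n-gram profile method of Cavnar and Trenkle, which is the method textcat implements. This is not textcat's code: it uses raw byte n-grams of length 1 to 3 without padding and treats each encoding as its own category, so profiles trained on the same Polish text encoded as ISO 8859-2 and as CP 1250 are told apart by the bytes the two encodings assign differently (ą, ś, ź and their capitals). The names ngram_profile, out_of_place, guess_encoding, TOP_K, PROFILES and the toy TRAINING string are all hypothetical.

```python
from collections import Counter

TOP_K = 300  # profile length; also the penalty for an n-gram missing from a profile


def ngram_profile(data: bytes, n_max: int = 3, top_k: int = TOP_K) -> list[bytes]:
    """Rank byte n-grams (lengths 1..n_max) by frequency, most frequent first."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(data) - n + 1):
            counts[data[i:i + n]] += 1
    # sorted() is stable, so n-grams with equal counts keep first-seen order.
    # That order does not depend on which byte value a single-byte encoding
    # assigns to a letter, so profiles of the same text in two encodings line up.
    ranked = sorted(counts, key=lambda g: -counts[g])
    return ranked[:top_k]


def out_of_place(doc: list[bytes], category: list[bytes], penalty: int = TOP_K) -> int:
    """Out-of-place distance: sum of rank differences, `penalty` if absent.

    `penalty` should be >= the profile length, so that a present n-gram
    never costs more than a missing one.
    """
    rank = {g: r for r, g in enumerate(category)}
    return sum(abs(r - rank[g]) if g in rank else penalty
               for r, g in enumerate(doc))


def guess_encoding(sample: bytes, profiles: dict[str, list[bytes]]) -> str:
    """Return the name of the category profile closest to the sample's profile."""
    doc = ngram_profile(sample)
    return min(profiles, key=lambda name: out_of_place(doc, profiles[name]))


# Toy training text (hypothetical, for illustration only). A real setup would
# train one profile per language/encoding pair on a much larger corpus.
TRAINING = (
    "Zażółć gęślą jaźń. Pchnąć w tę łódź jeża lub ośm skrzyń fig. "
    "Jeśli wiadomość jest źle zakodowana, trudno ją przeczytać. "
    "Świat się zmienia, a my wciąż piszemy listy po polsku."
)

# Python codec names for ISO 8859-2 and CP 1250.
PROFILES = {enc: ngram_profile(TRAINING.encode(enc)) for enc in ("iso8859_2", "cp1250")}

if __name__ == "__main__":
    sample = "Proszę, prześlij mi tę wiadomość jeszcze raz."
    for enc in ("iso8859_2", "cp1250"):
        print(enc, "->", guess_encoding(sample.encode(enc), PROFILES))
```

Because both encodings are single-byte and differ only in a handful of Polish letters, the two trained profiles are relabelings of each other; n-grams without those letters contribute equally to both distances, and only the differing bytes decide the result. Supporting another language would then mean adding training data, not writing a new plugin.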