
List:       kde-core-devel
Subject:    Re: Textfile classification (encoding, languages etc.)
From:       Malte Starostik <malte () kde ! org>
Date:       2003-09-25 19:54:31

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thursday 25 September 2003 21:42, Zack Rusin wrote:
> On Thursday 25 September 2003 15:06, Malte Starostik wrote:
> > PS: any comments on making KSpell use libaspell or pspell instead of
> > an external process if available?
>
> Oh, yeah, I'll be rewriting it once I get some more time. Laurent
> wrote kospell which kind of does this but keeps the KSpell api and
> makes creating new backends rather a pain. I like Enchant, but I'm
> still not too keen on the Glib dependency. I like how instead of using
> the ispell process they simply wrote it as a library and are using it.
> We should do the same so that instead of using kprocess we use the
> libraries directly.
> So, we might meet on irc or start a discussion at some point and decide
> whether we want to write a completely new implementation - we have
> enough use cases, and after spending too much time with kspell and
> other spell checkers I know what's needed, so I'd vote for that. We can
> also use Enchant. The problem with that is that we would have to write
> our frontend to it anyway, which would pretty much end up with #1 but
> with Enchant as the only backend.
> But anyway, what algorithm are you using to detect the languages? Is it
> regexp based or is it something more fun? You definitely got my full
> attention.

I didn't know Enchant; it looks interesting, provided our frontend to the
frontend would stay reasonably small.
I've based the implementation on the Lingua::Ident Perl module, which uses
tri- and bigrams. "Based on" means a bit more than a plain Perl-to-C++
translation and a bit less than a complete rewrite. It's damn small but
reliable.
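
To give a rough idea of how bigram/trigram language detection works, here is
a minimal Python sketch. It is not the C++ code discussed here, nor
Lingua::Ident's actual algorithm; the names (ngrams, train, score, identify),
the tiny training samples and the add-one smoothing are all assumptions chosen
only to keep the illustration short.

```python
import math
from collections import Counter


def ngrams(text, n):
    """Return all character n-grams of text, space-padded at both ends."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(sample):
    """Count bigrams and trigrams in one training sample (hypothetical model format)."""
    return {n: Counter(ngrams(sample, n)) for n in (2, 3)}


def score(model, text):
    """Sum of log-probabilities of the text's n-grams, with add-one smoothing."""
    total = 0.0
    for n, counts in model.items():
        # +1 reserves probability mass for n-grams never seen in training.
        denom = sum(counts.values()) + len(counts) + 1
        for gram in ngrams(text, n):
            total += math.log((counts[gram] + 1) / denom)
    return total


def identify(models, text):
    """Return the language whose model gives the text the highest score."""
    return max(models, key=lambda lang: score(models[lang], text))


# Toy training data; a real model would be trained on far larger corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is a small "
          "sample of english text with the usual words that are common",
    "de": "der schnelle braune fuchs springt über den faulen hund und dies "
          "ist ein kleines beispiel für deutschen text mit den üblichen wörtern",
}
MODELS = {lang: train(text) for lang, text in SAMPLES.items()}

if __name__ == "__main__":
    for phrase in ("the lazy dog jumps", "der faule hund springt"):
        print(phrase, "->", identify(MODELS, phrase))
```

With training data this small the result is only meaningful for text that
resembles the samples; the point is just that a few n-gram tables and a
scoring loop are enough, which is why such a detector can stay tiny.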

- -Malte
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/c0f6VDF3RdLzx4cRAgq4AJ923CAnhc2Yke13iUXdiEWXLrwtzwCghPRg
lXMjryIthxJ3CQikmznFEyI=
=g1BP
-----END PGP SIGNATURE-----
