List:       kde-core-devel
Subject:    Re: KHelpCenter + htdig
From:       Stephan Kulow <coolo () kde ! org>
Date:       2001-05-16 22:02:38

On Wednesday 16 May 2001 22:29, Holger Schurig wrote:
> > > ... but I'd
> > > like to have something like fuzzy search. The problem is to find
> > > an algorithm that decides whether two words are "about the same"
> > > _independent of the language_ and that works on all words. So if
> > > you do a fuzzy search on "play CD" I'd expect it to find "CD
> > > player", but that would require going through all words in our
> > > documents (indexed, of course) and checking whether each one is
> > > similar to "play" or to "CD".
>
> A language-independent similarity approach is the Levenshtein
> algorithm, also known as edit distance. It measures the number of
> letter additions, deletions and replacements needed to get from one
> word to another. Because it works on letters, it doesn't care about
> the language --- it's therefore much better than the often referenced
> Soundex algorithm, which is really crap for languages other than
> English.
>
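To make the edit-distance idea above concrete, here is a minimal Python sketch. It is not code from KHelpCenter or htdig; the function names `levenshtein` and `fuzzy_lookup`, the sample word list and the `max_distance` threshold of 2 are all assumptions chosen for illustration. It counts letter additions, deletions and replacements as described above, then scans a list of indexed words for each search term.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-letter additions, deletions and replacements
    needed to turn word a into word b."""
    # prev[j] is the distance between the letters of a seen so far
    # (initially none) and the first j letters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]  # turning i letters of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # add cb
                prev[j - 1] + (ca != cb),  # replace ca by cb, or keep it
            ))
        prev = cur
    return prev[-1]


def fuzzy_lookup(query, index_words, max_distance=2):
    """Return indexed words within max_distance of any search term.

    Hypothetical helper: the threshold is an assumption, and a real
    index would avoid comparing every term against every word.
    """
    hits = []
    for term in query.lower().split():
        for word in index_words:
            if levenshtein(term, word.lower()) <= max_distance:
                hits.append(word)
    return hits


if __name__ == "__main__":
    print(levenshtein("play", "player"))
    print(fuzzy_lookup("play CD", ["CD", "player", "kernel"]))
```

With this threshold, "play" reaches "player" through two additions. A fixed threshold is crude, though: for very short terms such as "CD" it would probably need to scale with word length, since any one- or two-letter word is within distance 2 of any other.
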
OK, I got a crash course in text searching today (thanks to everyone who
wrote to me with input :) and I'm quite sure what I want now. I just need
the time to implement it :)

Greetings, Stephan

-- 
People in cars cause accidents. Accidents in cars cause people.