> > ... but I'd like to have something like fuzzy search. Problem is to
> > find an algorithm that decides whether two words are "about the same"
> > _independent of the language_ and that would most likely work on all
> > words. So if you do a fuzzy search on "play CD" I'd expect it to find
> > "CD player", but that would require going through all words in our
> > documents (indexed, of course) and checking whether each is similar to
> > "play" or to "CD".

A language-independent similarity measure is the Levenshtein distance, also known as edit distance. It counts the minimum number of single-letter insertions, deletions and substitutions needed to turn one word into another. Because it works on letters, it doesn't care about the language, which makes it much better than the often-cited Soundex algorithm; Soundex is tuned to English pronunciation and works poorly for other languages.

I once used that algorithm to build a "fuzzy receiver matching" tool for sendmail (http://home.nikocity.de/hschurig/similarreceiver.html).

However, this really does require checking the entered word against all existing words in the database (not necessarily against all the documents, since most search engines keep a word index anyway).
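To make the idea concrete, here is a minimal sketch in Python, not the code from the sendmail tool above. `levenshtein` is the standard dynamic-programming edit distance, keeping only two rows of the table. `fuzzy_lookup`, the plain list standing in for the search engine's word index, the case-folding, and the default threshold of 2 edits are all hypothetical choices made for illustration. Note that it does a linear scan, i.e. exactly the "check against all existing words" cost mentioned above.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # Edit distance is symmetric, so keep the shorter string in b
    # to make each DP row as small as possible.
    if len(a) < len(b):
        a, b = b, a
    # Row for i = 0: turning "" into b[:j] takes j insertions.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_lookup(query: str, word_index: list[str], max_distance: int = 2) -> list[str]:
    """Hypothetical helper: return words from word_index within
    max_distance edits of query (case-insensitive), closest first."""
    q = query.lower()
    hits = []
    for word in word_index:  # linear scan over every indexed word
        d = levenshtein(q, word.lower())
        if d <= max_distance:
            hits.append((d, word))
    hits.sort()  # by distance, then alphabetically
    return [word for _, word in hits]


if __name__ == "__main__":
    # Made-up word index, for illustration only.
    index = ["player", "play", "CD", "clay", "display", "document"]
    # A multi-word query like "play CD" is looked up word by word.
    for term in "play CD".split():
        print(term, fuzzy_lookup(term, index, max_distance=2 if len(term) > 3 else 1))
    # play ['play', 'clay', 'player']
    # CD ['CD']
```

A fixed threshold behaves badly for very short words (two edits turn "CD" into almost anything), which is why the usage example loosens it only for longer terms; scaling the threshold with word length is a common variation.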