List: lucene-dev
Subject: Re: New "Did you mean" feature: How to approach?
From: Dave Spencer <dave-lucene-dev () tropo ! com>
Date: 2005-03-29 16:31:20
Message-ID: 424982D8.50205 () tropo ! com
Dave Spencer wrote:
> Otis Gospodnetic wrote:
>
>> Maybe the spellchecker at the bottom of the following URL will help:
>>
>> http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/
>>
>>
>
> Yeah, I did this, the "ngram based spelling corrector".
>
> You build a normal Lucene index as you always do,
> then run NGramSpeller, which analyzes your index to determine which
> ngrams are used and saves this in a separate Lucene index.
> Then you call NGramSpeller.suggestUsingNGrams() if a user's query
> doesn't return many results.
>
> weblog entry here w/ more info and a test page:
>
> http://www.searchmorph.com/weblog/index.php?id=23
Oh, and if it's not obvious from the above, the code is in live use.
Searchmorph has a search engine over javadoc pages.
Here I search for "hashmep" (intending 'hashmap'):
http://www.searchmorph.com/kat/search.jsp?s=hashmep
See the suggestions after the text "I cannot find hashmep anywhere.
Instead try these variations..." and note that it read my mind :) and
hashmap is the first suggestion.
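
For anyone who wants the gist of the quoted steps without reading the
sandbox code, here is a rough sketch. It is not the NGramSpeller
implementation: the class NGramSuggester, its methods, and the Dice-style
scoring are made up for illustration, and a plain in-memory map stands in
for the separate Lucene index of ngrams.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in; NOT the NGramSpeller code from the sandbox.
final class NGramSuggester {
    private final int n;
    // gram -> known words containing it (plays the role of the separate ngram index)
    private final Map<String, Set<String>> gramToWords = new HashMap<>();

    NGramSuggester(Collection<String> vocabulary, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        this.n = n;
        for (String word : vocabulary) {
            String w = word.toLowerCase(Locale.ROOT);
            for (String gram : ngrams(w, n)) {
                gramToWords.computeIfAbsent(gram, g -> new HashSet<>()).add(w);
            }
        }
    }

    // Distinct character n-grams of a word; a word shorter than n is its own single gram.
    static Set<String> ngrams(String word, int n) {
        Set<String> grams = new HashSet<>();
        if (word.length() < n) {
            if (!word.isEmpty()) grams.add(word);
            return grams;
        }
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Known words sharing at least one gram with the query, best match first.
    List<String> suggest(String query, int maxSuggestions) {
        String q = query.toLowerCase(Locale.ROOT);
        Set<String> queryGrams = ngrams(q, n);
        Map<String, Integer> shared = new HashMap<>();
        for (String gram : queryGrams) {
            for (String word : gramToWords.getOrDefault(gram, Collections.<String>emptySet())) {
                if (!word.equals(q)) shared.merge(word, 1, Integer::sum);
            }
        }
        // Dice coefficient: 2 * shared grams / (grams in query + grams in candidate).
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : shared.entrySet()) {
            int total = queryGrams.size() + ngrams(e.getKey(), n).size();
            score.put(e.getKey(), 2.0 * e.getValue() / total);
        }
        List<String> candidates = new ArrayList<>(score.keySet());
        candidates.sort((a, b) -> {
            int byScore = Double.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : a.compareTo(b); // ties: alphabetical
        });
        return new ArrayList<>(candidates.subList(0, Math.min(maxSuggestions, candidates.size())));
    }

    public static void main(String[] args) {
        NGramSuggester s = new NGramSuggester(
                Arrays.asList("hashmap", "hashtable", "treemap", "string"), 3);
        System.out.println(s.suggest("hashmep", 5)); // prints [hashmap, hashtable]
    }
}
```

With trigrams, "hashmep" shares has/ash/shm with "hashmap" but only
has/ash with "hashtable", so "hashmap" ranks first. In a real setup the
vocabulary would come from the terms of your main index, and you would
only ask for suggestions when the original query returns few hits.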
>
> --
>
> Some chance you'll be interested in the "more like this" similarity
> query generator - see the "similar" tree in the sandbox
>
> -- Dave
>
>> Otis
>>
>>
>> --- "Stefan F. Keller" <sfkeller@gmail.com> wrote:
>>
>>
>>> We would like to add "Did you mean..." to our Lucene-based search
>>> engine www.geometa.info. Doug mentioned in his recent interview that
>>> this feature would not be too complicated to implement.
>>>
>>> First I considered integrating a spelling checker (through JADT-API)
>>> but one would rather expect "nearby" words which really exist in the
>>> document pool. Some people have mentioned this feature here (or on
>>> the java-user-list).
>>>
>>> => Is anyone aware of any real developments in this area?
>>> Ideally, one would combine the data already maintained by the
>>> IndexReader class with an existing similarity search algorithm (like
>>> trigram)...
>>>
>>> => Any ideas?
>>>
>>> Stefan
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>>
>>>
>>
>>
>
>