
List:       lucene-dev
Subject:    Re: New "Did you mean" feature: How to approach?
From:       Dave Spencer <dave-lucene-dev () tropo ! com>
Date:       2005-03-29 16:31:20
Message-ID: 424982D8.50205 () tropo ! com

Dave Spencer wrote:

> Otis Gospodnetic wrote:
>
>> Maybe the spellchecker at the bottom of the following URL will help:
>>
>>  http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/
>>  
>>
>
> Yeah, I did this, the "ngram based spelling corrector".
>
> You build a normal Lucene index as you always do,
> then run NGramSpeller, which analyzes your index to determine which
> ngrams are used and saves this in a separate Lucene index.
> Then you call NGramSpeller.suggestUsingNGrams() if a user's query
> doesn't return too many results.
>
> weblog entry here w/ more info and a test page:
>
> http://www.searchmorph.com/weblog/index.php?id=23


Oh, and if it's not obvious from the above, the code is in use live.
Searchmorph has a search engine over javadoc pages.
Here I search for "hashmep" (intending "hashmap"):

http://www.searchmorph.com/kat/search.jsp?s=hashmep

See the suggestions after the text "I cannot find hashmep anywhere. 
Instead try these variations..." and note that it read my mind :) and 
hashmap is the first suggestion.
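
To make the idea concrete, here is a minimal sketch of n-gram based
suggestion in plain Java, with no Lucene dependency. It is not the
NGramSpeller code: the class name NGramSuggestSketch, its methods, the
in-memory map standing in for the separate n-gram index, and the
Jaccard-overlap ranking are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the NGramSpeller implementation.
class NGramSuggestSketch {
    private final int n;
    // Stand-in for the separate n-gram index: gram -> words containing it.
    private final Map<String, Set<String>> gramToWords = new HashMap<>();

    NGramSuggestSketch(int n, Collection<String> vocabulary) {
        this.n = n;
        for (String word : vocabulary) {
            String w = word.toLowerCase(Locale.ROOT);
            for (String gram : grams(w)) {
                gramToWords.computeIfAbsent(gram, g -> new HashSet<>()).add(w);
            }
        }
    }

    // All distinct character n-grams of a word (empty if the word is shorter than n).
    Set<String> grams(String word) {
        Set<String> out = new HashSet<>();
        String w = word.toLowerCase(Locale.ROOT);
        for (int i = 0; i + n <= w.length(); i++) {
            out.add(w.substring(i, i + n));
        }
        return out;
    }

    // Rank indexed words by Jaccard overlap of their n-grams with the query's n-grams.
    List<String> suggest(String query, int max) {
        Set<String> queryGrams = grams(query);
        Map<String, Integer> shared = new HashMap<>();
        for (String gram : queryGrams) {
            for (String word : gramToWords.getOrDefault(gram, Collections.<String>emptySet())) {
                shared.merge(word, 1, Integer::sum);
            }
        }
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : shared.entrySet()) {
            int union = queryGrams.size() + grams(e.getKey()).size() - e.getValue();
            score.put(e.getKey(), (double) e.getValue() / union);
        }
        List<String> ranked = new ArrayList<>(score.keySet());
        // Highest score first; ties broken alphabetically for a stable order.
        ranked.sort(Comparator.comparingDouble((String w) -> -score.get(w))
                .thenComparing(Comparator.naturalOrder()));
        return new ArrayList<>(ranked.subList(0, Math.min(max, ranked.size())));
    }
}
```

With trigrams and the vocabulary hashmap, hashtable, hashset, treemap,
calling suggest("hashmep", 3) ranks hashmap first, since it shares three
trigrams (has, ash, shm) with the query. In a search front end you would
only call this when the original query returns few or no hits.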


>
> -- 
>
> There's some chance you'll be interested in the "more like this"
> similarity query generator - see the "similar" tree in the sandbox.
>
> -- Dave
>
>> Otis
>>
>>
>> --- "Stefan F. Keller" <sfkeller@gmail.com> wrote:
>>  
>>
>>> We would like to add "Did you mean..." to our Lucene-based search
>>> engine www.geometa.info. Doug mentioned in his recent interview that
>>> this feature would be not too complicated to implement.
>>>
>>> First I considered integrating a spelling checker (through JADT-API)
>>> but one would rather expect "nearby" words which really exist in the
>>> document pool. Some people have mentioned this feature here (or on
>>> the java-user-list).
>>>
>>> => Is anyone aware of any real developments in this area?
>>> Ideally, one would combine the data already maintained by the
>>> IndexReader class with an existing similarity search algorithm (like
>>> trigram)...
>>>
>>> => Any ideas?
>>>
>>> Stefan
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>>
>>>   
>>
>>
>
>


