
List:       lucene-dev
Subject:    [jira] Commented: (LUCENE-626) Adaptive, user query session
From:       "Karl Wettin (JIRA)" <jira () apache ! org>
Date:       2007-01-30 23:40:34
Message-ID: 28012372.1170200434010.JavaMail.jira () brutus


    [ https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468825 ]

Karl Wettin commented on LUCENE-626:
------------------------------------




I've been running this version live today, and it suggests great
stuff all the time. It is, however, a bit of a RAM hog, just like
everything else I do. I think I'll add some sort of external
persistence to handle that (probably BDB), combined with a
soft-referenced cache.
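
To make the soft-referenced cache idea concrete, here is a minimal
sketch. It is not code from the patch: SoftCache and its loader
function are hypothetical names, and the loader merely stands in for
whatever external store (for example a BDB lookup) ends up being used.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical read-through cache. Values are held through
 * SoftReference, so the garbage collector may reclaim them
 * when memory runs low.
 */
final class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> cache = new ConcurrentHashMap<>();
    // Assumption: stands in for the external persistent store.
    private final Function<K, V> loader;

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = cache.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Never cached, or the referent was reclaimed: reload from the store.
            value = loader.apply(key);
            if (value != null) {
                cache.put(key, new SoftReference<>(value));
            }
        }
        return value;
    }
}
```

The point of the design is that the heap only holds what fits: an
entry that has been collected is simply reloaded on the next lookup.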

There is a problem: the adaptive layer does not adapt to (correct)
suggestions with a large edit distance that are supplied by the
multi-word/term position vector layer on top of the ngram spell
checker, e.g. "magic might heros" -> "heroes might magic".
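
To show why such a suggestion counts as "large edit distance", the
sketch below computes the plain character-level Levenshtein distance.
This is an assumption for illustration only: the class name is
hypothetical, and the patch may well measure distance differently.

```java
/** Hypothetical helper: standard Levenshtein distance using two rolling rows. */
final class EditDistanceDemo {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Taken word by word, "heros" -> "heroes" is a single insertion, but
because the words also swap places, the distance between the whole
strings comes out much larger than that.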


> Adaptive, user query session analyzing spell checker.
> -----------------------------------------------------
> 
> Key: LUCENE-626
> URL: https://issues.apache.org/jira/browse/LUCENE-626
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Search
> Reporter: Karl Wettin
> Priority: Minor
> Attachments: spellchecker.diff
> 
> 
> From javadocs:
> This is an adaptive, user query session analyzing spell checker. In plain words, a
> word and phrase dictionary that will learn from how users act while searching. Be
> aware, this is a beta version. It is not finished, but yields great results if you
> have enough user activity, RAM and a fairly narrow document corpus. The RAM problem
> can be fixed if you implement your own subclass of SpellChecker, as the abstract
> methods of this class are the CRUD methods. This will most probably change to a
> strategy class in a future version.
>
> TODO:
> 1. Gram up results to detect composite words that should not be composite words, and
> vice versa.
> 2. Train a grammed token (Markov) chain with output from an expectation
> maximization algorithm (weka clusters?) parallel to a closest path (A* or breadth
> first?) to allow contextual suggestions on queries that were never placed.
>
> Usage:
>
> Training
> At user query time, create an instance of QueryResults containing the query string,
> number of hits and a time stamp. Add it to a chronologically ordered list in the
> user session (LinkedList makes sense) that you pass on to train(sessionQueries) as
> the session times out. You also want to call the bootstrap() method every 100000
> queries or so.
>
> Spell checking
> Call getSuggestions(query) and look at the results. Don't modify them! This method
> call will be hidden in a facade in a future version. Note that the spell checker is
> case sensitive, so you want to clean up the query the same way when you train as when
> you request the suggestions. I recommend something like
> query = query.toLowerCase().replaceAll(" +", " ").trim()
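
The usage notes above can be sketched roughly as follows. This is not
code from the attached patch: SessionSketch, QueryRecord, record() and
drainOnTimeout() are hypothetical stand-ins, and the real QueryResults
class and train(sessionQueries) signature may differ. The sketch only
shows applying the same clean-up to every query and keeping the
session's queries in chronological order.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Locale;

final class SessionSketch {
    /** Hypothetical stand-in for QueryResults: query string, hit count, time stamp. */
    static final class QueryRecord {
        final String query;
        final int hits;
        final long timestamp;

        QueryRecord(String query, int hits, long timestamp) {
            this.query = query;
            this.hits = hits;
            this.timestamp = timestamp;
        }
    }

    /** Lower-case, collapse runs of spaces to one space, trim the ends. */
    static String normalize(String query) {
        return query.toLowerCase(Locale.ROOT).replaceAll(" +", " ").trim();
    }

    // Chronologically ordered: entries are appended as the user searches.
    private final List<QueryRecord> queries = new LinkedList<>();

    void record(String rawQuery, int hits, long timestamp) {
        queries.add(new QueryRecord(normalize(rawQuery), hits, timestamp));
    }

    /** Call when the session times out; the result is what would be handed to training. */
    List<QueryRecord> drainOnTimeout() {
        List<QueryRecord> snapshot = new ArrayList<>(queries);
        queries.clear();
        return snapshot;
    }
}
```

Running the same normalize() on the string passed to
getSuggestions(query) keeps training and lookup consistent, which
matters because the spell checker is case sensitive.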

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

