List: lucene-dev
Subject: [jira] Commented: (LUCENE-626) Adaptive, user query session
From: "Karl Wettin (JIRA)" <jira () apache ! org>
Date: 2007-01-30 23:40:34
Message-ID: 28012372.1170200434010.JavaMail.jira () brutus
[ https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468825 ]
Karl Wettin commented on LUCENE-626:
------------------------------------
I've been running this version live today. It suggests great stuff all
the time. It is, however, a bit of a RAM hog, just like everything else I
do. I think I'll add some sort of external persistence to handle that
(probably BDB), backed by a soft-referenced cache.
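
A minimal sketch of what a soft-referenced cache in front of an external
store could look like. SoftCache and its loader function are hypothetical
names, not part of the patch; the loader stands in for whatever lookup the
persistent store (BDB or otherwise) would provide.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Read-through cache whose values the JVM may reclaim under memory pressure. */
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> cache = new HashMap<>();
    // Hypothetical stand-in for a lookup against the external store.
    private final Function<K, V> loader;

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    synchronized V get(K key) {
        SoftReference<V> ref = cache.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Either never cached or cleared by the garbage collector: reload.
            value = loader.apply(key);
            if (value != null) {
                cache.put(key, new SoftReference<>(value));
            }
        }
        return value;
    }
}
```

Entries that the collector clears are simply reloaded on the next get(), so
the heap only holds as much of the dictionary as memory allows.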

There is a problem with the adaptive layer not adapting to (correct)
suggestions with a large edit distance supplied by the multi-word/term
position vector layer on top of the n-gram spell checker. E.g. "magic
might heros" -> "heroes might magic".
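
To illustrate why such a suggestion looks distant at the character level,
here is a sketch that assumes plain Levenshtein distance as the measure. The
message does not say which distance the adaptive layer actually uses, and
EditDistance is a hypothetical helper, not a class from the patch.

```java
/** Character-level Levenshtein distance using two rolling rows. */
class EditDistance {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

The single-word correction "heros" -> "heroes" costs one insertion, while
the reordered phrase differs in many character positions even though only
one word is misspelled.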
> Adaptive, user query session analyzing spell checker.
> -----------------------------------------------------
>
> Key: LUCENE-626
> URL: https://issues.apache.org/jira/browse/LUCENE-626
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Search
> Reporter: Karl Wettin
> Priority: Minor
> Attachments: spellchecker.diff
>
>
> From javadocs:
> This is an adaptive, user query session analyzing spell checker. In plain words, a
> word and phrase dictionary that will learn from how users act while searching. Be
> aware, this is a beta version. It is not finished, but yields great results if you
> have enough user activity, RAM and a fairly narrow document corpus. The RAM problem
> can be fixed if you implement your own subclass of SpellChecker, as the abstract
> methods of this class are the CRUD methods. This will most probably change to a
> strategy class in a future version.
>
> TODO:
> 1. Gram up results to detect composite words that should not be composite words, and
>    vice versa.
> 2. Train a grammed token (Markov) chain with output from an expectation
>    maximization algorithm (Weka clusters?) parallel to a closest path (A* or breadth
>    first?) to allow contextual suggestions on queries that were never placed.
>
> Usage:
>
> Training
> At user query time, create an instance of QueryResults containing the query string,
> number of hits and a time stamp. Add it to a chronologically ordered list in the
> user session (LinkedList makes sense) that you pass on to train(sessionQueries) as
> the session times out. You also want to call the bootstrap() method every 100000
> queries or so.
>
> Spell checking
> Call getSuggestions(query) and look at the results. Don't modify them! This method
> call will be hidden in a facade in a future version. Note that the spell checker is
> case sensitive, so you want to clean up the query the same way when you train as when
> you request the suggestions. I recommend something like query =
> query.toLowerCase().replaceAll(" +", " ").trim()
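
The training flow quoted above could be wired up roughly as follows. This is
a sketch under assumptions: SessionQuery and SessionRecorder are hypothetical
stand-ins (the real QueryResults constructor and the train() signature are
not shown in this message), and the normalizer assumes the intent is to
collapse runs of whitespace, here with "\\s+" so that tabs are covered too.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Locale;

class QueryNormalizer {
    /** Apply the same clean-up when training and when requesting suggestions. */
    static String normalize(String query) {
        return query.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
    }
}

/** Hypothetical stand-in for QueryResults: query string, hit count, time stamp. */
class SessionQuery {
    final String query;
    final int hits;
    final long timestampMillis;

    SessionQuery(String query, int hits, long timestampMillis) {
        this.query = query;
        this.hits = hits;
        this.timestampMillis = timestampMillis;
    }
}

/** Collects one user's queries in chronological order until the session ends. */
class SessionRecorder {
    private final List<SessionQuery> queries = new LinkedList<>();

    void record(String rawQuery, int hits, long timestampMillis) {
        queries.add(new SessionQuery(QueryNormalizer.normalize(rawQuery), hits, timestampMillis));
    }

    /** Called on session timeout; the returned list is what would go to train(...). */
    List<SessionQuery> drain() {
        List<SessionQuery> copy = new ArrayList<>(queries);
        queries.clear();
        return copy;
    }
}
```

On session timeout the drained list would be handed to train(sessionQueries),
with a separate counter deciding when to call bootstrap().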
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org