List: lucene-user
Subject: How to make word-N-gram based query and interpolate each N-gram score to obtain final Lucene score
From: Rajen Chatterjee <rajen.k.chatterjee () gmail ! com>
Date: 2016-01-11 8:43:42
Message-ID: CAC4-+NxFLTZqh59wpk3xFvo5g8LWjjNKKJ01o3SYua=tupKkKg () mail ! gmail ! com
Hello Everyone,
I am looking for a way to build *word-N-gram*-based
queries.
After some searching, I think I need to define an analyzer as
follows:
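import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;     // Lucene 3.x core package
import org.apache.lucene.analysis.shingle.ShingleFilter;   // lucene-analyzers module

// Lucene 3.x-style analyzer (tokenStream(String, Reader) is overridden directly).
// ShingleFilter requires minShingle >= 2; unigrams are still emitted by default.
public static Analyzer wordNgramAnalyzer(final int minShingle, final int maxShingle) {
    return new Analyzer() {
        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
            // split on whitespace, then join adjacent words into shingles
            return new ShingleFilter(new WhitespaceTokenizer(reader),
                    minShingle, maxShingle);
        }
    };
}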
This analyzer should give me unigram, bigram, trigram, ... tokens
(ShingleFilter also emits the single words by default), which I can use
at indexing time as well as at query time.
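To make concrete which token strings such an analyzer produces, here is a plain-Java sketch with no Lucene dependency. The class and the `shingles` helper are hypothetical illustrations, not Lucene code: they only approximate the strings that WhitespaceTokenizer plus ShingleFilter(min, max) emit with default settings (unigrams kept, words joined by a single space), and they ignore token positions and offsets, which the real filter also sets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShingleSketch {

    // Hypothetical helper (not Lucene code): lists the unigrams plus every
    // word shingle of length minShingle..maxShingle, grouped by start position.
    static List<String> shingles(String text, int minShingle, int maxShingle) {
        // mirror ShingleFilter's constraint that shingles have at least 2 words
        if (minShingle < 2 || maxShingle < minShingle) {
            throw new IllegalArgumentException("need 2 <= minShingle <= maxShingle");
        }
        String[] words = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(words[i]); // the unigram at position i
            // each shingle starting at i, shortest first, stopping at the end of the text
            for (int n = minShingle; n <= maxShingle && i + n <= words.length; n++) {
                out.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [please, please divide, please divide this, divide, divide this,
        //         divide this sentence, this, this sentence, sentence]
        System.out.println(shingles("please divide this sentence", 2, 3));
    }
}
```

With min 2 and max 3, a four-word input yields nine tokens: four unigrams, three bigrams and two trigrams.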
So, could anyone please tell me:
1) Is this the right approach for indexing and querying word N-grams?
2) Is there any way to assign weights to the N-grams, so that at query time
trigram tokens carry a higher weight than unigram tokens? (Something like:
the final Lucene score should be an interpolation of the unigram score,
bigram score, trigram score, and so on.)
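Question 2 describes linear interpolation: final score = λ1·s1 + λ2·s2 + λ3·s3 + ..., where s_n is the score contributed by the n-gram tokens and the weights λ_n sum to 1, with larger weights for higher orders. The sketch below shows only that arithmetic in plain Java; the class, method and the per-order scores are hypothetical, made-up inputs, not Lucene output. How to obtain per-order scores inside Lucene is a separate question. One possible (unconfirmed) approach is to index each N-gram order in its own field and combine per-field queries as boosted SHOULD clauses of a BooleanQuery, which sums the scores of matching clauses. Lucene versions of this era also multiply that sum by a coordination factor, and per-field scores are not normalized to a common range, so clause boosts only approximate a true interpolation.

```java
public class InterpolationSketch {

    // Linear interpolation of per-order scores:
    //   finalScore = lambdas[0]*s1 + lambdas[1]*s2 + lambdas[2]*s3 + ...
    // where s1 is the unigram score, s2 the bigram score, and so on.
    // The lambdas are hypothetical tuning weights that must sum to 1.
    static double interpolate(double[] orderScores, double[] lambdas) {
        if (orderScores.length != lambdas.length) {
            throw new IllegalArgumentException("one weight per N-gram order is required");
        }
        double score = 0.0;
        double weightSum = 0.0;
        for (int n = 0; n < orderScores.length; n++) {
            score += lambdas[n] * orderScores[n];
            weightSum += lambdas[n];
        }
        if (Math.abs(weightSum - 1.0) > 1e-9) {
            throw new IllegalArgumentException("weights must sum to 1");
        }
        return score;
    }

    public static void main(String[] args) {
        double[] scores = {0.8, 0.5, 0.2};  // made-up unigram, bigram, trigram scores
        double[] lambdas = {0.2, 0.3, 0.5}; // trigrams weighted highest
        // prints approximately 0.41 (0.16 + 0.15 + 0.10)
        System.out.println(interpolate(scores, lambdas));
    }
}
```

Requiring the weights to sum to 1 keeps the result on the same scale as the inputs; dropping that check turns it into a plain weighted sum, which is closer to what boosted BooleanQuery clauses compute.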
Any help is much appreciated.
Thanks
--
-Regards,
Rajen Chatterjee.