List: lucene-dev
Subject: [jira] Created: (LUCENE-1320) ShingleMatrixFilter, a three
From: "Karl Wettin (JIRA)" <jira () apache ! org>
Date: 2008-06-30 2:01:45
Message-ID: 1103006763.1214791305011.JavaMail.jira () brutus
ShingleMatrixFilter, a three dimensional permutating shingle filter
-------------------------------------------------------------------
Key: LUCENE-1320
URL: https://issues.apache.org/jira/browse/LUCENE-1320
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Affects Versions: 2.3.2
Reporter: Karl Wettin
Assignee: Karl Wettin
Backed by a column-focused matrix that creates all permutations of shingle tokens in three dimensions; that is, it handles multi-token synonyms.
In some cases it could, for instance, be used to replace 0-slop phrase queries with something faster, such as a single term query on a shingle.
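The reason shingles can stand in for exact phrases is that every run of 2 to maximumShingleSize consecutive tokens is indexed as one term, so an exact (0-slop) phrase of that length needs one term lookup instead of a positional match. The toy sketch below illustrates only that idea: ShingleIndexSketch, add and exactPhrase are hypothetical names, it is a plain in-memory map rather than Lucene's API, and it assumes tokens are joined with a separator such as "_".

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy inverted index whose terms are shingles; not Lucene's API.
final class ShingleIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final int maxShingleSize;
    private final String separator;

    ShingleIndexSketch(int maxShingleSize, String separator) {
        this.maxShingleSize = maxShingleSize;
        this.separator = separator;
    }

    // Index every run of 2..maxShingleSize consecutive tokens as a single term.
    void add(int docId, List<String> tokens) {
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = 2; size <= maxShingleSize && start + size <= tokens.size(); size++) {
                String term = String.join(separator, tokens.subList(start, start + size));
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // An exact phrase of 2..maxShingleSize words is answered by one term lookup;
    // no per-document position checks are needed.
    Set<Integer> exactPhrase(List<String> words) {
        if (words.size() < 2 || words.size() > maxShingleSize) {
            throw new IllegalArgumentException("phrase length outside shingle size range");
        }
        Set<Integer> docs = postings.getOrDefault(String.join(separator, words), Set.of());
        return Collections.unmodifiableSet(docs);
    }
}
{code}

The trade-off in this sketch is index size: every shingle becomes an additional term, and phrases longer than maxShingleSize cannot be answered by a single lookup.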
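{code:java}
// Dimensions: columns (positions) -> alternative rows (synonyms) -> tokens in a row.
// hello, greetings, and, ... stand for Token instances.
new Token[][][]{
  {{hello}, {greetings, and, salutations}},
  {{world}, {earth}, {tellus}}
}
{code}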
With shingle sizes of 2 to 3 tokens, this matrix passes the following test:
{code:java}
assertNext(ts, "hello_world");
assertNext(ts, "greetings_and");
assertNext(ts, "greetings_and_salutations");
assertNext(ts, "and_salutations");
assertNext(ts, "and_salutations_world");
assertNext(ts, "salutations_world");
assertNext(ts, "hello_earth");
assertNext(ts, "and_salutations_earth");
assertNext(ts, "salutations_earth");
assertNext(ts, "hello_tellus");
assertNext(ts, "and_salutations_tellus");
assertNext(ts, "salutations_tellus");
{code}
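To make the permutation behaviour concrete, here is a brute-force sketch, assuming a simplified model where plain Strings stand in for Token and the matrix is a list of columns, each a list of alternative rows, each a list of tokens. ShingleMatrixSketch and shingles are hypothetical names. Unlike the filter described in this issue, which reads the stream with bounded look-ahead, this sketch materializes every row permutation, flattens it into one token sequence, and emits each shingle of size min..max; the LinkedHashSet is this sketch's own way of skipping shingles (such as greetings_and) already produced by an earlier permutation, not necessarily how the filter deduplicates.

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical brute-force model of permutating shingles; not the filter's code.
final class ShingleMatrixSketch {

    static List<String> shingles(List<List<List<String>>> matrix, int min, int max, String sep) {
        // Keeps first-seen order and drops shingles repeated by later permutations.
        Set<String> out = new LinkedHashSet<>();
        int columns = matrix.size();
        int[] choice = new int[columns]; // selected row index for each column
        while (true) {
            // Flatten the currently selected row of every column into one sequence.
            List<String> seq = new ArrayList<>();
            for (int col = 0; col < columns; col++) {
                seq.addAll(matrix.get(col).get(choice[col]));
            }
            // Emit every shingle, ordered by start position, then by size.
            for (int start = 0; start < seq.size(); start++) {
                for (int size = min; size <= max && start + size <= seq.size(); size++) {
                    out.add(String.join(sep, seq.subList(start, start + size)));
                }
            }
            // Advance to the next permutation like an odometer, first column fastest.
            int c = 0;
            while (c < columns && ++choice[c] == matrix.get(c).size()) {
                choice[c] = 0;
                c++;
            }
            if (c == columns) {
                break; // every permutation has been visited
            }
        }
        return new ArrayList<>(out);
    }
}
{code}

Applied to the example matrix with min 2, max 3 and separator "_", this sketch returns the same twelve shingles, in the same order, as the assertions above.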
It includes tests, ranging from simple to complex, that demonstrate offsets, position increments (posincr), payload boost calculation, and the construction of a matrix from a token stream.
The matrix tries to use as little memory as possible by reading no more than maximumShingleSize columns ahead in the stream and releasing resources it no longer needs (columns and unique token sets). It can still be optimized quite a bit, though.
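The bounded look-ahead idea can be sketched as a sliding window, assuming, for simplicity, that each column holds a single token (no synonym rows) and that tokens are Strings joined with "_". LookAheadWindowSketch and shingles are hypothetical names and this is not the filter's implementation; it only shows how reading at most maximumShingleSize columns ahead, and dropping the head column once no further shingle can start there, keeps the buffer bounded.

{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sliding-window model of bounded look-ahead; not the filter's code.
final class LookAheadWindowSketch {

    static List<String> shingles(Iterator<String> columns, int min, int maxShingleSize) {
        List<String> out = new ArrayList<>();
        ArrayDeque<String> window = new ArrayDeque<>();
        while (true) {
            // Read ahead until the window holds maxShingleSize columns, never further.
            while (window.size() < maxShingleSize && columns.hasNext()) {
                window.addLast(columns.next());
            }
            if (window.isEmpty()) {
                break; // input exhausted and every buffered column released
            }
            // Emit all shingles that start at the head column (iteration is head to tail).
            StringBuilder sb = new StringBuilder();
            int size = 0;
            for (String token : window) {
                if (size > 0) {
                    sb.append('_');
                }
                sb.append(token);
                size++;
                if (size >= min) {
                    out.add(sb.toString());
                }
            }
            // No later shingle can start at the head column, so release it.
            window.removeFirst();
        }
        return out;
    }
}
{code}

For the columns a, b, c, d with sizes 2 to 3, the window never holds more than three columns at a time.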
--
This message is automatically generated by JIRA.