
List:       lucene-dev
Subject:    [jira] Updated: (LUCENE-1320) ShingleMatrixFilter, a three
From:       "Karl Wettin (JIRA)" <jira () apache ! org>
Date:       2008-06-30 2:11:45
Message-ID: 652227864.1214791905029.JavaMail.jira () brutus


     [ https://issues.apache.org/jira/browse/LUCENE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wettin updated LUCENE-1320:
--------------------------------

    Attachment: LUCENE-1320.txt

Documentation will have to come later. Until then, see the test cases.

> ShingleMatrixFilter, a three dimensional permutating shingle filter
> -------------------------------------------------------------------
> 
> Key: LUCENE-1320
> URL: https://issues.apache.org/jira/browse/LUCENE-1320
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/analyzers
> Affects Versions: 2.3.2
> Reporter: Karl Wettin
> Assignee: Karl Wettin
> Attachments: LUCENE-1320.txt
> 
> 
> Backed by a column-focused matrix that creates all permutations of shingle tokens
> in three dimensions, i.e. it handles multi-token synonyms. In some cases it could,
> for instance, be used to replace 0-slop phrase queries with something speedier.
> {code:java}
> Token[][][]{
> {{hello}, {greetings, and, salutations}},
> {{world}, {earth}, {tellus}}
> }
> {code}
> passes the following test with 2- to 3-gram shingles:
> {code:java}
> assertNext(ts, "hello_world");
> assertNext(ts, "greetings_and");
> assertNext(ts, "greetings_and_salutations");
> assertNext(ts, "and_salutations");
> assertNext(ts, "and_salutations_world");
> assertNext(ts, "salutations_world");
> assertNext(ts, "hello_earth");
> assertNext(ts, "and_salutations_earth");
> assertNext(ts, "salutations_earth");
> assertNext(ts, "hello_tellus");
> assertNext(ts, "and_salutations_tellus");
> assertNext(ts, "salutations_tellus");
> {code}
> The patch contains tests of varying complexity that demonstrate offsets, position
> increments, payload boost calculation, and construction of a matrix from a token
> stream. The matrix attempts to use as little memory as possible by seeking no more
> than maximumShingleSize columns forward in the stream and clearing unused resources
> (columns and unique token sets). It can still be optimized quite a bit, though.
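
For readers without the attachment at hand, the permutation step described above can be approximated in a few lines of plain Java. This is only a minimal sketch and not the code in LUCENE-1320.txt: the class name ShingleMatrixSketch, the method shingles and the use of plain strings instead of Token objects are assumptions made for illustration, and offsets, position increments and payloads are ignored. It assumes that one row is picked per column, that the first column varies fastest, that each permutation is flattened into a token sequence, and that every not yet seen n-gram within the size range is emitted. It requires Java 8 or later for String.join.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ShingleMatrixSketch {

    /** matrix[column][row][token]; returns unique shingles in emission order. */
    static List<String> shingles(String[][][] matrix, int minSize, int maxSize) {
        Set<String> seen = new LinkedHashSet<>();  // keeps first-seen order, drops duplicates
        int columns = matrix.length;
        int[] rowIndex = new int[columns];         // currently selected row in each column

        while (true) {
            // Flatten the selected row of every column into one token sequence.
            List<String> tokens = new ArrayList<>();
            for (int c = 0; c < columns; c++) {
                for (String token : matrix[c][rowIndex[c]]) {
                    tokens.add(token);
                }
            }

            // Emit every n-gram of the sequence, shortest first at each start position.
            for (int start = 0; start < tokens.size(); start++) {
                for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                    seen.add(String.join("_", tokens.subList(start, start + size)));
                }
            }

            // Advance to the next permutation like an odometer, first column fastest.
            int c = 0;
            while (c < columns) {
                rowIndex[c]++;
                if (rowIndex[c] < matrix[c].length) {
                    break;
                }
                rowIndex[c] = 0;
                c++;
            }
            if (c == columns) {
                break;  // every combination of rows has been visited
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        String[][][] matrix = {
            {{"hello"}, {"greetings", "and", "salutations"}},
            {{"world"}, {"earth"}, {"tellus"}}
        };
        for (String shingle : shingles(matrix, 2, 3)) {
            System.out.println(shingle);
        }
    }
}
```

Run against the example matrix with sizes 2 to 3, this sketch prints the twelve shingles in the same order as the assertNext calls quoted above. Unlike the filter described in the issue, it holds the whole matrix in memory and makes no attempt to limit lookahead to maximumShingleSize columns.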

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

