List:       lucene-dev
Subject:    [jira] Commented: (LUCENE-725) NovelAnalyzer - wraps your choice of
From:       "Karl Wettin (JIRA)" <jira () apache ! org>
Date:       2008-05-31 10:37:45
Message-ID: 173858065.1212230265081.JavaMail.jira () brutus


    [ https://issues.apache.org/jira/browse/LUCENE-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601373#action_12601373 ]

Karl Wettin commented on LUCENE-725:
------------------------------------

If you hang on for a week, I too will be taking a closer look at this code.

http://www.nabble.com/Clustering-Demo-tt17127240.html#a17449440


> NovelAnalyzer - wraps your choice of Lucene Analyzer and filters out all "boilerplate" text
> -------------------------------------------------------------------------------------------
>  
> Key: LUCENE-725
> URL: https://issues.apache.org/jira/browse/LUCENE-725
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Analysis
> Reporter: Mark Harwood
> Assignee: Otis Gospodnetic
> Priority: Minor
> Attachments: NovelAnalyzer.java, NovelAnalyzer.java
> 
> 
> This is a class I have found useful for analyzing small collections of documents
> (in the hundreds) and removing any duplicate content, such as standard
> disclaimers or repeated text in an exchange of emails. This has applications in
> sampling query results to identify key phrases, improving speed-reading of results
> with similar content (e.g. email threads or forum messages), or just removing
> duplicated noise from a search index. To be more generally useful it needs to scale
> to millions of documents, in which case an alternative implementation is required.
> See the notes in the Javadocs for this class for more discussion on this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

