
List:       lucene-dev
Subject:    analyzer refactoring
From:       "Rasik Pandey" <rasik.pandey () ajlsm ! com>
Date:       2004-06-21 17:48:58
Message-ID: 000301c457b8$0ede1190$ac7ba8c0 () diderot

Hello,

As mentioned in previous exchanges, notably with Grant Ingersoll, I added some new
classes to the "analysis" package to meet the requirements of the feature request in
Bugzilla (http://issues.apache.org/bugzilla/show_bug.cgi?id=28182) and did some
refactoring while I was under the hood. Here is an overview of the class hierarchies
after my changes:

-Analyzer
--CustomAnalyzer (new abstract class largely based on Grant's BaseAnalyzer)
--AbstractAnalyzer (new abstract class)
---RussianAnalyzer
---GermanAnalyzer
---etc.

-Tokenizer
--CloneableTokenizer (new abstract class)
---StandardTokenizer
---CharTokenizer
---CJKTokenizer
---etc.

-TokenFilter
--CloneableTokenFilter (new abstract class)
---AbstractStemFilter (new abstract class)
----RussianStemFilter
----GermanStemFilter
----etc.

-Stemmer (very simple new interface used in AbstractStemFilter)
--PorterStemmer
--RussianStemmer
--etc.
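To make the Stemmer/AbstractStemFilter split concrete, here is a small, self-contained sketch of the idea: the shared filtering loop lives in the abstract filter, and each language-specific filter only supplies a Stemmer. It deliberately avoids Lucene's classes so it runs on its own. WordStream is a hypothetical stand-in for TokenStream (it returns plain strings and uses null to mark end of stream, as Lucene 1.4's next() does). The stem(String) method name, PluralStemmer, PluralStemFilter, and ListStream are illustrative assumptions, not names taken from the patch.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical shape of the "very simple" Stemmer interface: one method that
// maps a word to its stem. The method name in the actual patch may differ.
interface Stemmer {
    String stem(String word);
}

// Hypothetical stand-in for a token stream, simplified to plain strings.
interface WordStream {
    String next(); // returns null when the stream is exhausted
}

// Sketch of AbstractStemFilter: every stemming filter shares this loop, and
// subclasses only decide which Stemmer to plug in.
abstract class AbstractStemFilter implements WordStream {
    private final WordStream input;
    private final Stemmer stemmer;

    protected AbstractStemFilter(WordStream input, Stemmer stemmer) {
        this.input = input;
        this.stemmer = stemmer;
    }

    public String next() {
        String word = input.next();
        return word == null ? null : stemmer.stem(word);
    }
}

// Toy stemmer for illustration only (NOT Porter): strips a trailing "s"
// from words longer than three characters.
class PluralStemmer implements Stemmer {
    public String stem(String word) {
        return word.length() > 3 && word.endsWith("s")
                ? word.substring(0, word.length() - 1)
                : word;
    }
}

// A concrete filter shrinks to a constructor, analogous to GermanStemFilter
// or RussianStemFilter in the hierarchy above.
class PluralStemFilter extends AbstractStemFilter {
    PluralStemFilter(WordStream input) {
        super(input, new PluralStemmer());
    }
}

// Word source over a fixed list, standing in for a Tokenizer.
class ListStream implements WordStream {
    private final Iterator<String> it;

    ListStream(List<String> words) {
        this.it = words.iterator();
    }

    public String next() {
        return it.hasNext() ? it.next() : null;
    }
}
```

For example, wrapping a stream of "analyzers", "is", "filters" in a PluralStemFilter yields "analyzer", "is", "filter", then null. Adding a language then means writing a Stemmer and a one-constructor filter subclass.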

The attached zip file contains 3 diff files (core.analysis, sandbox.analysis, and
sandbox.analysis.snowball) and a zip containing the new classes for
org.apache.lucene.analysis in the Lucene core. I tried to minimize irrelevant code
changes (e.g. style, whitespace) in the diffs while conforming to the code formatting
guidelines outlined by Otis. However, I think a number of classes in the "analysis"
package didn't conform to those guidelines, and I reformatted them with my IDE, so
these diffs may contain a lot of noise, sorry :( . If the diffs are too painful, let
me know and I'll try to prune them.

If there is a TODO list specific to Analyzers, are the below items on that list?

1) Move the German and Russian packages to the sandbox (I think this is already on
   the Lucene TODO list).
2) Rename the Analyzer classes so that dynamic configuration could return classes
   like Analyzer_ru, Analyzer_de, Analyzer_fr, etc., based on the class naming scheme
   "Analyzer_{Locale.toString}".
3) Documentation.
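A rough sketch of how dynamic configuration might resolve the proposed "Analyzer_{Locale.toString}" naming scheme: build candidate class names from a java.util.Locale and load the first one that exists via Class.forName. The helper class LocaleAnalyzerNames, the package argument, and the fallback from "de_DE" to "de" are all assumptions for illustration; the proposal above only names the scheme itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for the proposed "Analyzer_{Locale.toString}" scheme.
// The package, class names, and fallback order are illustrative assumptions.
final class LocaleAnalyzerNames {
    private LocaleAnalyzerNames() {}

    // Candidate class names, most specific first; e.g. for Locale.GERMANY and
    // package "p": "p.Analyzer_de_DE", then "p.Analyzer_de".
    static List<String> candidates(String pkg, Locale locale) {
        Set<String> suffixes = new LinkedHashSet<>(); // keeps order, drops duplicates
        suffixes.add(locale.toString());              // the scheme as proposed
        if (!locale.getCountry().isEmpty()) {
            suffixes.add(locale.getLanguage() + "_" + locale.getCountry());
        }
        suffixes.add(locale.getLanguage());           // language-only fallback
        String prefix = pkg.isEmpty() ? "Analyzer_" : pkg + ".Analyzer_";
        List<String> names = new ArrayList<>();
        for (String s : suffixes) {
            if (!s.isEmpty()) {
                names.add(prefix + s);
            }
        }
        return names;
    }

    // Returns the first candidate class that can be loaded, or null if none exists.
    static Class<?> resolve(String pkg, Locale locale) {
        for (String name : candidates(pkg, locale)) {
            try {
                return Class.forName(name);
            } catch (ClassNotFoundException e) {
                // not present: fall through to the next, less specific name
            }
        }
        return null;
    }
}
```

Trying the most specific name first lets a country-specific analyzer (say, Analyzer_de_DE) override a language-wide one without any extra configuration, while a locale with no matching class simply yields null so the caller can fall back to a default analyzer.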

Questions, comments, feedback, and criticism are all welcome.

Regards,
RBP

PS - Thanks Grant!


["analysis.zip" (application/x-zip-compressed)]

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

