List: lucene-dev
Subject: analyzer refactoring
From: "Rasik Pandey" <rasik.pandey () ajlsm ! com>
Date: 2004-06-21 17:48:58
Message-ID: 000301c457b8$0ede1190$ac7ba8c0 () diderot
Hello,
As mentioned in previous exchanges, notably with Grant Ingersoll, I added some new
classes to the "analysis" package to meet the requirements of the feature request in
Bugzilla (http://issues.apache.org/bugzilla/show_bug.cgi?id=28182) and did some
refactoring while I was under the hood. Here is an overview of the class hierarchies
after my changes:
-Analyzer
--CustomAnalyzer (new abstract class largely based on Grant's BaseAnalyzer)
--AbstractAnalyzer (new abstract class)
---RussianAnalyzer
---GermanAnalyzer
---etc.
-Tokenizer
--CloneableTokenizer (new abstract class)
---StandardTokenizer
---CharTokenizer
---CJKTokenizer
---etc.
-TokenFilter
--CloneableTokenFilter (new abstract class)
---AbstractStemFilter (new abstract class)
----RussianStemFilter
----GermanStemFilter
----etc.
-Stemmer (very simple new interface used in AbstractStemFilter)
--PorterStemmer
--RussianStemmer
--etc.
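To illustrate how the Stemmer interface and AbstractStemFilter fit together, here is a minimal sketch. It is not the code from the attached zip: the method name stem(String), the class names ending in "Sketch" or starting with "Toy", and the use of plain strings in place of Lucene Tokens are all assumptions made so the example compiles without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the "very simple" Stemmer interface; the real
// method name and signature in the attached zip are not shown in this message.
interface Stemmer {
    String stem(String term);
}

// Hypothetical stand-in for AbstractStemFilter. Plain strings replace Lucene
// Tokens (an assumption) so the sketch is self-contained.
abstract class AbstractStemFilterSketch implements Iterator<String> {
    private final Iterator<String> input;
    private final Stemmer stemmer;

    protected AbstractStemFilterSketch(Iterator<String> input, Stemmer stemmer) {
        this.input = input;
        this.stemmer = stemmer;
    }

    @Override
    public boolean hasNext() {
        return input.hasNext();
    }

    // The shared logic lives here: every term is passed through the stemmer.
    @Override
    public String next() {
        return stemmer.stem(input.next());
    }
}

// Toy stemmer for illustration only: strips one trailing "s".
class ToyPluralStemmer implements Stemmer {
    @Override
    public String stem(String term) {
        boolean plural = term.length() > 1 && term.endsWith("s");
        return plural ? term.substring(0, term.length() - 1) : term;
    }
}

// A language-specific filter then only has to choose its stemmer.
class ToyStemFilter extends AbstractStemFilterSketch {
    ToyStemFilter(Iterator<String> input) {
        super(input, new ToyPluralStemmer());
    }
}

class StemFilterDemo {
    static List<String> stemAll(List<String> terms) {
        List<String> out = new ArrayList<>();
        Iterator<String> filter = new ToyStemFilter(terms.iterator());
        while (filter.hasNext()) {
            out.add(filter.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [analyzer, token, filter]
        System.out.println(stemAll(Arrays.asList("analyzers", "token", "filters")));
    }
}
```

The point of the split is that classes such as RussianStemFilter and GermanStemFilter would differ only in which Stemmer they hand to the shared base class.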
In the attached zip file there are 3 diff files (core.analysis, sandbox.analysis, and
sandbox.analysis.snowball) and a zip containing the new classes for
org.apache.lucene.analysis in the Lucene core. I tried to minimize irrelevant code
changes (e.g. style, spaces, etc.) in the diffs while conforming to the code
formatting guidelines outlined by Otis. I think a number of classes in the "analysis"
package didn't conform, so these diffs may have a lot of noise because I reformatted
those classes with my IDE, sorry :( . If the diffs are too painful, let me know and
I'll try to prune them.
If there is a TODO list specific to Analyzers, are the below items on that list?
1) Move the German and Russian packages to the sandbox (I think this is on the Lucene
   TODO list)
2) Rename the Analyzer classes so that dynamic configuration could return classes
   like Analyzer_ru, Analyzer_de, Analyzer_fr, etc. based on the class naming scheme
   "Analyzer_{Locale.toString}"
3) Documentation
Questions, comments, feedback, and criticism are all welcome.
Regards,
RBP
PS - Thanks Grant!
["analysis.zip" (application/x-zip-compressed)]