List: lucene-dev
Subject: Re: Korean character set in analysis
From: "Yiyi Sun" <yiyisun () yahoo ! com>
Date: 2001-11-29 12:47:41
Hi,
I have written three classes for Simplified Chinese: ChineseAnalyzer,
ChineseFilter and ChineseTokenizer. If you are interested in them, I can
upload them.
For the compound words, you can use a dictionary to extract the nouns from
the sentences.
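
To make the dictionary idea a bit more concrete, here is a minimal sketch. It
is not the code of my Chinese classes and it does not use the Lucene API:
WordStream, CompoundFilter and split are hypothetical names, and WordStream
only imitates the "one token per next() call, null at the end" contract. The
filter splits each word into dictionary nouns by greedy longest match and
queues the parts, so a stream with only next() can still return several nouns
for one compound. The example uses English words to keep the source ASCII; the
same logic should apply to Hangul strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.Set;

public class CompoundNounSketch {

    /** Hypothetical stand-in for a token stream: one token per call, null at the end. */
    interface WordStream {
        String next();
    }

    /** Splits a word into dictionary nouns by greedy longest match from the left. */
    static List<String> split(String word, Set<String> nouns) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (start < word.length()) {
            int end = word.length();
            // Shrink the candidate until it is a known noun or becomes empty.
            while (end > start && !nouns.contains(word.substring(start, end))) {
                end--;
            }
            if (end == start) {
                // No dictionary noun starts here: keep the word whole.
                return Collections.singletonList(word);
            }
            parts.add(word.substring(start, end));
            start = end;
        }
        return parts;
    }

    /** Wraps another stream and hands out the nouns of each compound one by one. */
    static class CompoundFilter implements WordStream {
        private final WordStream input;
        private final Set<String> nouns;
        private final Queue<String> pending = new ArrayDeque<>();

        CompoundFilter(WordStream input, Set<String> nouns) {
            this.input = input;
            this.nouns = nouns;
        }

        @Override
        public String next() {
            // Refill the queue from the wrapped stream when it runs dry.
            while (pending.isEmpty()) {
                String word = input.next();
                if (word == null) {
                    return null; // end of input
                }
                pending.addAll(split(word, nouns));
            }
            return pending.poll();
        }
    }

    /** Helper for the demo: a stream over a fixed list of words. */
    static WordStream of(String... words) {
        Iterator<String> it = Arrays.asList(words).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        Set<String> nouns = new HashSet<>(Arrays.asList("search", "engine"));
        WordStream stream = new CompoundFilter(of("lucene", "searchengine", "index"), nouns);
        for (String t = stream.next(); t != null; t = stream.next()) {
            System.out.println(t); // lucene, search, engine, index (one per line)
        }
    }
}
```

Greedy matching is the simplest choice here; it can fail on words where a
shorter first noun would have allowed a complete split, so a real analyzer
would probably need backtracking or a better dictionary lookup.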
Cheers!
Yiyi Sun
----- Original Message -----
From: "Junshik, Jeon" <locus@nextel.co.kr>
To: <lucene-dev@jakarta.apache.org>
Sent: Thursday, November 29, 2001 12:15 AM
Subject: Korean character set in analysis
> Hello,
>
> I've been testing Lucene indexing and searching for Korean-language documents.
> However, the Korean character set is currently not supported.
>
> So I've changed some code to make it work with the Korean character set.
>
>
> In the "jakarta-lucene\src\java\org\apache\lucene\analysis\standard\StandardTokenizer.jj" file:
>
> JavaCC option part..
> -----------------------------------------------------------------
> options {
> STATIC = false;
> //IGNORE_CASE = true;
> //BUILD_PARSER = false;
> UNICODE_INPUT = true; // <== change: uncommented for the Korean character set
> USER_CHAR_STREAM = true;
> OPTIMIZE_TOKEN_MANAGER = true;
> //DEBUG_TOKEN_MANAGER = true;
> }
>
> in TOKEN
> -----------------------------------------------------------------
> | < #LETTER: // unicode letters
> [
> "\u0041"-"\u005a",
> "\u0061"-"\u007a",
> "\u00c0"-"\u00d6",
> "\u00d8"-"\u00f6",
> "\u00f8"-"\u00ff",
> "\u0100"-"\u1fff",
> "\u3040"-"\u318f",
> "\u3300"-"\u337f",
> "\u3400"-"\u3d2d",
> "\u4e00"-"\u9fff",
> "\uac00"-"\ud7a3", // <== change: added (Korean character set in Unicode)
> "\uf900"-"\ufaff"
> ]
> >
>
> I hope these changes can be added to the CVS repository.
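
A quick way to sanity-check the added range, sketched in plain Java and
independent of the JavaCC grammar: U+AC00 through U+D7A3 should be exactly the
precomposed Hangul syllables. HangulRangeCheck and its method names are
hypothetical, chosen only for this illustration.

```java
public class HangulRangeCheck {

    // Bounds of the range added to #LETTER in the quoted grammar (U+AC00..U+D7A3).
    static final char FIRST = '\uAC00';
    static final char LAST = '\uD7A3';

    /** True if the character falls inside the added range. */
    static boolean inAddedRange(char c) {
        return c >= FIRST && c <= LAST;
    }

    /** Number of code points covered by the added range. */
    static int rangeSize() {
        return LAST - FIRST + 1;
    }

    public static void main(String[] args) {
        System.out.println("size: " + rangeSize());
        System.out.println("first block: " + Character.UnicodeBlock.of(FIRST));
        System.out.println("last block: " + Character.UnicodeBlock.of(LAST));
        System.out.println("last is letter: " + Character.isLetter(LAST));
    }
}
```

The individual Hangul jamo (U+1100 to U+11FF) and the compatibility jamo
(U+3130 to U+318F) already fall inside the existing "\u0100"-"\u1fff" and
"\u3040"-"\u318f" ranges, so only the syllable range needs to be added.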
>
>
> Another question is how to analyze compound words.
>
> A compound word consists of several nouns. I want to index every noun in a
> compound word after analysis,
> but the current TokenStream class has only the "public Token next()" method.
>
> Could you let me know how to solve this?
>
> Regards,
>
> Junshik, Jeon (locus@nextel.co.kr)
>