List: lucene-user
Subject: Re: Query text Tokenize issue
From: Erik Hatcher <erik () ehatchersolutions ! com>
Date: 2005-07-28 9:02:44
Message-ID: 8502390A-59AC-4461-A800-FC2DCEA21073 () ehatchersolutions ! com
On Jul 27, 2005, at 7:26 PM, Indu Abeyaratna wrote:
>
> I have a field indexed as a keyword, with two records:
> "J400-C-V1-S10-T1" and "J400-C-V-S10-T1".
>
> When I search for "J400-C-V1-S10-T1", it returns the matching record,
> but when I search for "J400-C-V-S10-T1" it doesn't return the matching
> one.
>
> Further, I found that "J400-C-V-S10-T1" is incorrectly tokenised into
> "J400-C" and "V-S10-T1", but nothing like that happens to
> "J400-C-V1-S10-T1".
>
> This happens when there is a combination like "?-?-", which gets
> tokenised into "?" and "?-".
>
> I have attached a test case for further clarification.
>
> I am using StandardAnalyzer and QueryParser.
>
> Is this a bug in Lucene or JavaCC? Or am I missing something here? Any
> suggestions for getting around this?

It's not a "bug" per se, but rather just how StandardAnalyzer works.
StandardAnalyzer is a general-purpose text analyzer; it cannot
reasonably keep identifiers like yours intact and also handle the much
more common case of "hyphenated-text" that should be split into
separate tokens.

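
To make the behaviour concrete, here is a rough plain-Java sketch (not
Lucene code) of the one tokenizer rule I believe is at play: segments
joined by punctuation stay in one token only while every other segment
contains a digit. The class and method names are made up for
illustration, and the sketch ignores lowercasing, stop words and every
other token type, so treat it as a model of the symptom rather than of
StandardAnalyzer as a whole.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model, NOT Lucene code. It approximates only the rule that
// keeps hyphen-joined segments in one token while every other segment
// contains at least one digit. All names here are hypothetical.
class HyphenRuleSketch {

    static boolean hasDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isDigit(s.charAt(i))) return true;
        }
        return false;
    }

    // Longest run of segments beginning at 'start' in which every segment
    // whose offset has the given parity (0 = even, 1 = odd) has a digit.
    static int runLength(String[] segs, int start, int parity) {
        int n = 0;
        while (start + n < segs.length) {
            if (n % 2 == parity && !hasDigit(segs[start + n])) break;
            n++;
        }
        return n;
    }

    // Greedily emits the longest token allowed by the rule, then continues
    // with the remaining segments. A lone segment is always its own token.
    static List<String> tokenize(String text) {
        String[] segs = text.split("-");
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < segs.length) {
            int n = Math.max(1, Math.max(runLength(segs, i, 0),
                                         runLength(segs, i, 1)));
            tokens.add(String.join("-", Arrays.copyOfRange(segs, i, i + n)));
            i += n;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("J400-C-V1-S10-T1")); // [J400-C-V1-S10-T1]
        System.out.println(tokenize("J400-C-V-S10-T1"));  // [J400-C, V-S10-T1]
        System.out.println(tokenize("hyphenated-text"));  // [hyphenated, text]
    }
}
```

Under this model "J400-C-V1-S10-T1" survives as one token because
J400, V1 and T1 all carry digits, while in "J400-C-V-S10-T1" the
segment "V" has no digit and breaks the alternation, which gives the
"J400-C" / "V-S10-T1" split you observed.
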
As Otis said, this is really the job for a custom analyzer.
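
As a rough idea of what "custom" could mean here, the sketch below uses
plain Java stand-ins (not Lucene's Analyzer API) to show per-field
analysis: the keyword field passes its text through as a single
untouched token, and every other field gets general-purpose splitting.
The field names "partNumber" and "description" and all class and method
names are hypothetical. In Lucene itself, PerFieldAnalyzerWrapper plays
a similar dispatching role, though the details of wiring it up for your
index are not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Illustration only: plain Java stand-ins, not Lucene classes.
class PerFieldAnalysisSketch {

    // Keyword-style analysis: the whole input is one token, unchanged.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // Stand-in for general-purpose analysis: lowercases the text and
    // splits it on anything that is not an ASCII letter or digit.
    static List<String> general(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    private final Map<String, Function<String, List<String>>> perField =
            new HashMap<>();
    private final Function<String, List<String>> fallback;

    PerFieldAnalysisSketch(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    void register(String field, Function<String, List<String>> analysis) {
        perField.put(field, analysis);
    }

    // Uses the field's registered analysis, or the fallback if none.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, fallback).apply(text);
    }

    public static void main(String[] args) {
        PerFieldAnalysisSketch s =
                new PerFieldAnalysisSketch(PerFieldAnalysisSketch::general);
        s.register("partNumber", PerFieldAnalysisSketch::keyword);
        System.out.println(s.analyze("partNumber", "J400-C-V-S10-T1"));
        System.out.println(s.analyze("description", "hyphenated-text"));
    }
}
```

The point of the design is that the query text for the keyword field is
analyzed the same way the field was indexed, i.e. not split at all, so
the term you search for is exactly the term that was stored.
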
Erik