List: lucene-dev
Subject: [jira] Commented: (LUCENE-2183) Supplementary Character Handling in
From: "Robert Muir (JIRA)" <jira () apache ! org>
Date: 2009-12-30 4:59:29
Message-ID: 103583935.1262149169441.JavaMail.jira () brutus ! apache ! org
[ https://issues.apache.org/jira/browse/LUCENE-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795235#action_12795235 ]
Robert Muir commented on LUCENE-2183:
-------------------------------------
Simon, I don't think your example is a problem.
I am proposing my original design, with no reflection, driven by Version only.
There is only one exception where reflection is used: in the constructor, to
determine whether all of the following hold:
* you subclass a tokenizer that implements the int-based methods
* you have only implemented the char-based methods
* you request Version >= 3.1
In this case, reflection is used only in the constructor, to throw an
UnsupportedOperationException (UOE).
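A minimal sketch of what such a constructor check could look like, not the actual patch. All names here (SketchVersion, SketchCharTokenizer, isTokenChar overloads, the three subclasses) are hypothetical stand-ins for illustration. The check walks up the class hierarchy to find which class most recently declared the char-based and the int-based method, and throws only when a char-based override sits below the int-based implementation and Version >= 3.1 is requested.

```java
// Hypothetical stand-in for Lucene's Version constants; declaration order matters.
enum SketchVersion { LUCENE_30, LUCENE_31 }

// Hypothetical base class with a char-based and an int-based (code point) hook.
abstract class SketchCharTokenizer {
    SketchCharTokenizer(SketchVersion matchVersion) {
        // Find the most-derived class that declares each variant.
        Class<?> charImpl = declaringClass(getClass(), char.class);
        Class<?> intImpl = declaringClass(getClass(), int.class);
        // True when the char-based override is declared in a proper subclass
        // of the class providing the int-based method.
        boolean charOnlyOverride = intImpl != charImpl && intImpl.isAssignableFrom(charImpl);
        if (matchVersion.compareTo(SketchVersion.LUCENE_31) >= 0 && charOnlyOverride) {
            throw new UnsupportedOperationException(
                "Version >= 3.1 requested, but only char-based methods are implemented");
        }
    }

    protected boolean isTokenChar(char c) {
        throw new UnsupportedOperationException("char-based method not implemented");
    }

    protected boolean isTokenChar(int codePoint) {
        throw new UnsupportedOperationException("int-based method not implemented");
    }

    // Walks from the concrete class up to the base class and returns the first
    // class that declares isTokenChar with the given parameter type.
    private static Class<?> declaringClass(Class<?> start, Class<?> paramType) {
        for (Class<?> c = start; c != null; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod("isTokenChar", paramType);
                return c;
            } catch (NoSuchMethodException e) {
                // Not declared here; keep walking up.
            }
        }
        throw new IllegalStateException("isTokenChar not found");
    }
}

// Implements only the char-based method.
class CharOnlyTokenizer extends SketchCharTokenizer {
    CharOnlyTokenizer(SketchVersion v) { super(v); }
    @Override protected boolean isTokenChar(char c) { return Character.isLetter(c); }
}

// Implements the int-based method.
class IntBasedTokenizer extends SketchCharTokenizer {
    IntBasedTokenizer(SketchVersion v) { super(v); }
    @Override protected boolean isTokenChar(int codePoint) { return Character.isLetter(codePoint); }
}

// Subclasses an int-based tokenizer but overrides only the char-based method.
class CharOverrideOfIntTokenizer extends IntBasedTokenizer {
    CharOverrideOfIntTokenizer(SketchVersion v) { super(v); }
    @Override protected boolean isTokenChar(char c) { return Character.isDigit(c); }
}
```

In this sketch, constructing CharOnlyTokenizer or CharOverrideOfIntTokenizer with LUCENE_31 throws the UOE, while the same classes constructed with LUCENE_30 are accepted. Reflection runs once per instance, in the constructor only.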
If someone wants to support Version 3.1 in their app, they simply implement the
int-based methods. To support lower versions they do nothing: they do not need to
implement the char-based methods, and they get backwards compatibility automatically,
as long as they supply the correct version. This is guaranteed by CharacterUtils.
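To illustrate the idea of version-driven behaviour with no reflection, here is a rough sketch of a helper that picks code point aware or legacy char semantics from the requested version. DemoVersion and DemoCharUtils are hypothetical names for this example and are not meant to reproduce the real CharacterUtils API.

```java
// Hypothetical stand-in for Lucene's Version constants; declaration order matters.
enum DemoVersion { LUCENE_30, LUCENE_31 }

// Hypothetical helper: the version alone selects the behaviour.
abstract class DemoCharUtils {

    // Returns code point aware semantics for 3.1 and later, legacy semantics otherwise.
    static DemoCharUtils forVersion(DemoVersion matchVersion) {
        return matchVersion.compareTo(DemoVersion.LUCENE_31) >= 0
            ? new CodePointAware()
            : new LegacyChars();
    }

    abstract int codePointAt(char[] chars, int offset);

    // Combines a surrogate pair into one supplementary code point.
    private static final class CodePointAware extends DemoCharUtils {
        @Override int codePointAt(char[] chars, int offset) {
            return Character.codePointAt(chars, offset);
        }
    }

    // Treats every char as its own "code point", as pre-3.1 code did.
    private static final class LegacyChars extends DemoCharUtils {
        @Override int codePointAt(char[] chars, int offset) {
            return chars[offset];
        }
    }
}
```

For the supplementary character U+10400, encoded as the surrogate pair \uD801\uDC00, the LUCENE_31 instance returns 0x10400 at offset 0, while the LUCENE_30 instance returns only the high surrogate 0xD801. A tokenizer written against the int-based methods would therefore behave like the old char-based code whenever a lower version is supplied.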
I am only proposing to use reflection to enforce throwing the UOE in the case where
someone requests Version 3.1 but has not implemented the int-based methods.
If they want to support Version < 3.1, that is fine; it will work with their
char-based code automatically.
I think it would be easiest if I modified your patch to illustrate this, so I'll do
it in a few days.
> Supplementary Character Handling in CharTokenizer
> -------------------------------------------------
>
> Key: LUCENE-2183
> URL: https://issues.apache.org/jira/browse/LUCENE-2183
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Reporter: Simon Willnauer
> Fix For: 3.1
>
> Attachments: LUCENE-2183.patch
>
>
> CharTokenizer is an abstract base class for all Tokenizers operating on a character
> level. Yet, those tokenizers still use char primitives instead of int codepoints.
> CharTokenizer should operate on codepoints and preserve backwards compatibility.