'[jira] Commented: (LUCENE-2183) Supplementary Character Handling in'

List:       lucene-dev
Subject:    [jira] Commented: (LUCENE-2183) Supplementary Character Handling in
From:       "Robert Muir (JIRA)" <jira () apache ! org>
Date:       2009-12-30 4:59:29
Message-ID: 103583935.1262149169441.JavaMail.jira () brutus ! apache ! org


    [ https://issues.apache.org/jira/browse/LUCENE-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795235#action_12795235 ]

Robert Muir commented on LUCENE-2183:
-------------------------------------

Simon, I don't think your example is a problem.

I am proposing my original design, with no reflection, driven by Version only.

There is only one exception where reflection is used: in the ctor, to
determine whether all of the following hold:
* you subclass a tokenizer that implements int-based methods
* you have only implemented char-based methods
* you request VERSION >= 3.1

In this case, the reflection is only used in the ctor to throw UOE (UnsupportedOperationException)!
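
To make the ctor check concrete, here is a minimal sketch of what I mean. It is not the patch: SketchCharTokenizer, the two-value Version enum, and the overrides() helper are all hypothetical stand-ins, and I am assuming isTokenChar is the method that gets a char-based and an int-based variant.

```java
public class CtorCheckSketch {
    // Hypothetical stand-in for Lucene's Version; only the ordering matters here.
    public enum Version { LUCENE_30, LUCENE_31 }

    // Hypothetical stand-in for CharTokenizer (assumption, not the real class).
    public abstract static class SketchCharTokenizer {
        protected SketchCharTokenizer(Version matchVersion) {
            boolean wantsCodePoints = matchVersion.compareTo(Version.LUCENE_31) >= 0;
            // Reflection is used here and nowhere else: fail fast when the
            // subclass is char-only but the caller asked for 3.1 semantics.
            if (wantsCodePoints
                    && overrides("isTokenChar", char.class)
                    && !overrides("isTokenChar", int.class)) {
                throw new UnsupportedOperationException(
                    "Version >= 3.1 requires the int-based isTokenChar");
            }
        }

        // Defaults exist only so that a subclass may implement either variant.
        protected boolean isTokenChar(char c) {
            throw new UnsupportedOperationException();
        }

        protected boolean isTokenChar(int codePoint) {
            throw new UnsupportedOperationException();
        }

        // True if some class between the runtime class and this base class
        // declares a method with the given name and parameter type.
        private boolean overrides(String name, Class<?> paramType) {
            for (Class<?> c = getClass(); c != SketchCharTokenizer.class; c = c.getSuperclass()) {
                try {
                    c.getDeclaredMethod(name, paramType);
                    return true;
                } catch (NoSuchMethodException e) {
                    // Not declared at this level; keep walking up.
                }
            }
            return false;
        }
    }

    // Legacy subclass: char-based only.
    public static class CharOnlyTokenizer extends SketchCharTokenizer {
        public CharOnlyTokenizer(Version v) { super(v); }
        @Override protected boolean isTokenChar(char c) { return Character.isLetter(c); }
    }

    // Updated subclass: int-based (code point) only.
    public static class CodePointTokenizer extends SketchCharTokenizer {
        public CodePointTokenizer(Version v) { super(v); }
        @Override protected boolean isTokenChar(int codePoint) { return Character.isLetter(codePoint); }
    }

    public static void main(String[] args) {
        new CharOnlyTokenizer(Version.LUCENE_30);   // fine: old version, char-based
        new CodePointTokenizer(Version.LUCENE_31);  // fine: int-based implemented
        try {
            new CharOnlyTokenizer(Version.LUCENE_31);
        } catch (UnsupportedOperationException e) {
            System.out.println("UOE: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that reflection runs once per instance, in the ctor, and never on the tokenization path.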

If someone wants to support VERSION 3.1 in their app, they simply
implement the int-based methods. To support lower versions, they do
nothing: they do not need to implement char-based methods, and they get
the backwards compat automatically, as long as they supply the correct
version. This is guaranteed by CharacterUtils.
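
As a rough illustration of how the version alone can drive this, here is a sketch of a version-selected code point reader. The names (CodePointReader, forVersion, the two-value Version enum) are made up for the example and are not the actual CharacterUtils API; only Character.codePointAt is a real JDK call.

```java
public class VersionedCodePointSketch {
    // Hypothetical stand-in for Lucene's Version; only the ordering matters here.
    public enum Version { LUCENE_30, LUCENE_31 }

    // Hypothetical abstraction: how to read "the character" at an offset.
    public interface CodePointReader {
        int codePointAt(char[] chars, int offset);
    }

    // Picks the behavior from the version alone, with no reflection involved.
    public static CodePointReader forVersion(Version matchVersion) {
        if (matchVersion.compareTo(Version.LUCENE_31) >= 0) {
            // 3.1 and later: combine surrogate pairs into one code point.
            return (chars, offset) -> Character.codePointAt(chars, offset);
        }
        // Before 3.1: old behavior, every char is treated on its own.
        return (chars, offset) -> chars[offset];
    }

    public static void main(String[] args) {
        char[] text = Character.toChars(0x10400); // one supplementary character, two chars
        System.out.println(Integer.toHexString(forVersion(Version.LUCENE_31).codePointAt(text, 0)));
        System.out.println(Integer.toHexString(forVersion(Version.LUCENE_30).codePointAt(text, 0)));
    }
}
```

With the 3.1 reader the pair comes back as the single code point U+10400; with the older reader the caller sees only the high surrogate, which is the existing char-based behavior.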

I am only proposing using reflection to enforce the throwing of UOE, in
the case that someone requests VERSION 3.1 but has not implemented the
int-based methods.

If they want to support Version < 3.1, this is fine; it will work with
their char-based stuff automatically.

I think it would be easiest if I modified your patch to illustrate this,
so I'll do it in a few days.


> Supplementary Character Handling in CharTokenizer
> -------------------------------------------------
> 
> Key: LUCENE-2183
> URL: https://issues.apache.org/jira/browse/LUCENE-2183
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Reporter: Simon Willnauer
> Fix For: 3.1
> 
> Attachments: LUCENE-2183.patch
> 
> 
> CharTokenizer is an abstract base class for all Tokenizers operating
> on a character level. Yet, those tokenizers still use char primitives
> instead of int codepoints. CharTokenizer should operate on codepoints
> and preserve bw compatibility.

