
List:       lucene-dev
Subject:    [jira] Issue Comment Edited: (LUCENE-2943) ICU collator
From:       "Uwe Schindler (JIRA)" <jira () apache ! org>
Date:       2011-02-28 21:08:37
Message-ID: 1292166866.2862.1298927317541.JavaMail.tomcat () hel ! zones ! apache ! org


    [ https://issues.apache.org/jira/browse/LUCENE-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000511#comment-13000511 ]

Uwe Schindler edited comment on LUCENE-2943 at 2/28/11 9:07 PM:
----------------------------------------------------------------

I changed my mind a little bit:

The Collator should be cloned in the Analyzer, not in the Filter. The same
applies to the AttributeImpl: the cloning should not be done in the ctor. The
problem is not that the TokenStream or the Attribute instance may reuse the
attribute in different threads; the problem is that the factory class (the
Analyzer) reuses the Collator in different threads when it produces multiple
TokenStreams, or when the AttributeFactory (AF) produces multiple attributes.

This is a slight difference, because the following code is always safe:
    new CollationFilter(Collator.newInstance(lang))
Here the Collator is used by this one filter only, so cloning would be wrong.

The reason for the whole thing: TokenStream and Attribute instances themselves
are single-threaded only, but the factory or the Analyzer is not.
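
To make the distinction concrete, here is a minimal sketch of the idea, not
the actual patch. All class names are hypothetical, and java.text.Collator
stands in for the ICU Collator only so that the example runs without extra
jars. The filter never clones; the shared factory hands each new filter its
own copy.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Small demo driver (hypothetical, for illustration only).
final class CollatorCloneSketch {
    public static void main(String[] args) {
        // Always safe without a clone: the collator is created for this one filter.
        SketchCollationFilter direct =
            new SketchCollationFilter(Collator.getInstance(Locale.GERMAN));

        // The analyzer is the shared object, so it clones per filter.
        SketchCollationAnalyzer analyzer =
            new SketchCollationAnalyzer(Collator.getInstance(Locale.GERMAN));
        SketchCollationFilter first = analyzer.newFilter();
        SketchCollationFilter second = analyzer.newFilter();

        System.out.println("direct key length: " + direct.keyFor("Tone").length);
        System.out.println("distinct collators: "
            + (first.collator() != second.collator()));
        System.out.println("same keys: "
            + Arrays.equals(first.keyFor("Tone"), second.keyFor("Tone")));
    }
}

// Hypothetical stand-in for the Filter: single-threaded, uses the collator
// it is given as-is and never clones it.
final class SketchCollationFilter {
    private final Collator collator;

    SketchCollationFilter(Collator collator) {
        this.collator = collator; // no clone here; the caller decides about sharing
    }

    Collator collator() {
        return collator;
    }

    byte[] keyFor(String term) {
        return collator.getCollationKey(term).toByteArray();
    }
}

// Hypothetical stand-in for the Analyzer: the factory that may be called
// from several threads, so it is the place that clones.
final class SketchCollationAnalyzer {
    private final Collator prototype;

    SketchCollationAnalyzer(Collator prototype) {
        this.prototype = prototype;
    }

    SketchCollationFilter newFilter() {
        // Each filter gets a private copy of the shared prototype.
        return new SketchCollationFilter((Collator) prototype.clone());
    }
}
```

The sketch assumes the caller does not keep using the prototype collator
after handing it to the analyzer.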

      was (Author: thetaphi):
    I changed my mind a little bit:

The cloning of the Filter should be done in the Analyzer not in the Filter. The same
applies to the AttributeImpl, the cloning should be done in the ctor. The problem is
not that the TokenStream or the Attribute instance may reuse the attribute in
different threads, the problem is that the factory class (the Analyzer) does reuse
the Collator in different threads when it produces multiple tokenstreams or the AF
multiple attributes.

This is a slight difference, because the following code is always safe:
new CollationFilter(Collator.newInstance(lang)), cloning would be wrong.

The reason for the whole thing: TokenStream and Attribute instances themselves are
single-threaded only, but not the factory or the analyzer.

> ICU collator thread-safety issues
> ---------------------------------
> 
> Key: LUCENE-2943
> URL: https://issues.apache.org/jira/browse/LUCENE-2943
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis
> Reporter: Robert Muir
> Fix For: 3.1, 4.0
> 
> Attachments: LUCENE-2943.patch
> 
> 
> The ICU Collators (unlike the JDK ones) aren't thread-safe:
> http://userguide.icu-project.org/collation/architecture . This is a little
> non-obvious, since it's not mentioned in the javadocs and it's not clear
> whether the docs apply only to the C code, but I looked at the source and
> there is all kinds of internal state. So in my opinion, we should clone the
> ICU collators (which are passed in from the outside) when creating a new
> TokenStream/AttributeImpl to prevent problems. This shouldn't be a big deal,
> since everything uses reusableTokenStream anyway.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

