
List:       lucene-user
Subject:    Re: "Umlaute" getting lost
From:       Clemens Wyss <clemensdev () mysign ! ch>
Date:       2011-04-26 6:15:46
Message-ID: E594BA962D832C49A3CF858DAA3A696C1135A51784 () Exchange2007 ! mysigndomain ! corp

> Out of curiosity, what is the problem you are trying to solve?
I am trying to provide suggestions for search terms/words, as Google does. When
the user starts typing the search term, I look up my termIndex to provide possible
search terms which fit the characters typed so far...
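
To make the idea concrete, here is a minimal sketch of that lookup in plain Java,
without Lucene. It is only an illustration, not my actual code: the class
TermSuggester and its methods add() and suggest() are made-up names, and a sorted
in-memory set stands in for the termIndex. Terms are lower-cased with the locale
and otherwise stored unchanged, so the Umlaute survive. The umlauts in the sample
data are written as \u00fc escapes so that the source file encoding cannot
interfere.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: prefix lookup over a sorted term set.
class TermSuggester {
    private final TreeSet<String> terms = new TreeSet<String>();
    private final Locale locale;

    TermSuggester(Locale locale) {
        this.locale = locale;
    }

    // Store the term lower-cased; no folding is applied, so umlauts are kept.
    void add(String term) {
        terms.add(term.toLowerCase(locale));
    }

    // Return up to max stored terms that start with the typed characters.
    List<String> suggest(String typed, int max) {
        String prefix = typed.toLowerCase(locale);
        List<String> result = new ArrayList<String>();
        // tailSet starts at the first term >= prefix; all terms sharing the
        // prefix are contiguous, so stop at the first one that does not match.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix) || result.size() >= max) {
                break;
            }
            result.add(term);
        }
        return result;
    }

    public static void main(String[] args) {
        TermSuggester suggester = new TermSuggester(Locale.GERMAN);
        suggester.add("M\u00fcnchen"); // \u00fc is the u-umlaut
        suggester.add("M\u00fcller");
        suggester.add("Muster");
        // The user has typed "m" followed by a u-umlaut.
        System.out.println(suggester.suggest("m\u00fc", 10));
    }
}
```

This only covers the "starts with" case; the fuzzy and "*term*" matching is what
I use the Lucene queries for.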

Thx
Clemens

> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org]
> Sent: Sunday, 24 April 2011 08:30
> To: java-user@lucene.apache.org
> Subject: Re: "Umlaute" getting lost
> 
> 
> On Apr 21, 2011, at 5:02 PM, Clemens Wyss wrote:
> 
> > I keep my search terms in a dedicated RAMDirectory (the termIndex).
> > In there I place all the terms of my real index. When putting the terms
> > into the termIndex I can still see [using the debugger] the Umlaute
> > (äöü). Unfortunately, when searching the termIndex, the documents no
> > longer contain these Umlaute.
> > 
> > Populating the termIndex:
> > termIndex = new RAMDirectory();
> > IndexWriterConfig config = new IndexWriterConfig( Version.LUCENE_31,
> > 	new TermAnalyzer( locale ) );
> > termIndexWriter = new IndexWriter( termIndex, config );
> > TermEnum tEnum = realIndexReader.terms();
> > while ( tEnum.next() ) {
> > 	Term t = tEnum.term();
> > 	String termText = t.text();
> > 	Document termDocument = new Document();
> > 	Field field = new Field( FIELDNAME_TERM, termText, Field.Store.YES,
> > 		Field.Index.ANALYZED );
> > 	termDocument.add( field );
> > 	// and add term into the index
> > 	termIndexWriter.addDocument( termDocument );
> > }
> > termIndexWriter.commit();
> > termIndexWriter.optimize();
> > termIndexWriter.close();
> > 
> > termIndexReader = IndexReader.open( termIndex, true );
> > ---------- searching terms
> > Query q = fuzzy
> > 	? new FuzzyQuery( new Term( FIELDNAME_TERM, termFilter.toLowerCase() ) )
> > 	: new WildcardQuery( new Term( FIELDNAME_TERM,
> > 		"*" + termFilter.toLowerCase() + "*" ) );
> > TopDocs topDocs = new IndexSearcher( getTermIndexReader() ).search( q, 100 );
> > for ( ScoreDoc hit : topDocs.scoreDocs ) {
> > 	Document doc = getTermIndexReader().document( hit.doc );
> > 	String indexTerm = doc.get( FIELDNAME_TERM );
> > 	if ( !returnValue.contains( indexTerm  ) )
> > 	{
> > 		returnValue.add( indexTerm );
> > 	}
> > }
> > ----------
> > The TermAnalyzer is the same analyzer as the main index analyzer, with
> > the exception that a LowerCaseFilter is applied.
> 
> What is the Analyzer for the Main Index?  What is the tokenizer and token
> filters used?
> 
> Out of curiosity, what is the problem you are trying to solve?
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



