List: lucene-user
Subject: Re: "Umlaute" getting lost
From: Clemens Wyss <clemensdev () mysign ! ch>
Date: 2011-04-26 6:15:46
Message-ID: E594BA962D832C49A3CF858DAA3A696C1135A51784 () Exchange2007 ! mysigndomain ! corp
> Out of curiosity, what is the problem you are trying to solve?
I am trying to provide suggestions for search terms/words, as Google does. When
the user starts typing a search term, I look up my termIndex to provide the
possible search terms that match the characters typed so far.
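
To make the intended behaviour concrete, here is a minimal plain-Java sketch of such a lookup. It does not use Lucene and is not the code from this thread: the class `TermSuggestions` and the method `suggest` are hypothetical names chosen for illustration. It mirrors the "*typed*" wildcard match and the duplicate check from the quoted code below, and it returns the terms exactly as they were put in, so a term such as "Müller" should come back with its umlaut intact.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Plain-Java illustration of a term lookup; hypothetical, no Lucene involved. */
final class TermSuggestions {

    /**
     * Returns up to maxResults distinct terms that contain the typed
     * characters, compared case-insensitively for the given locale.
     * The terms are returned unmodified, so non-ASCII letters survive.
     */
    static List<String> suggest(List<String> terms, String typed,
                                Locale locale, int maxResults) {
        String needle = typed.toLowerCase(locale);
        List<String> result = new ArrayList<>();
        for (String term : terms) {
            if (result.size() >= maxResults) {
                break; // enough suggestions collected
            }
            // Lowercase only for the comparison; keep the original term.
            if (term.toLowerCase(locale).contains(needle)
                    && !result.contains(term)) {
                result.add(term);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // \u00fc is u-umlaut, \u00e4 is a-umlaut (escaped to keep the
        // source file ASCII-only regardless of compiler encoding).
        List<String> terms = Arrays.asList(
                "M\u00fcller", "M\u00fchle", "Muster", "K\u00e4se");
        System.out.println(suggest(terms, "M\u00fc", Locale.GERMAN, 10));
    }
}
```

Comparing the output of a baseline like this with what the termIndex search returns may help narrow down whether the characters change during analysis or when the stored field is read back.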
Thx
Clemens
> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org]
> Sent: Sunday, 24 April 2011 08:30
> To: java-user@lucene.apache.org
> Subject: Re: "Umlaute" getting lost
>
>
> On Apr 21, 2011, at 5:02 PM, Clemens Wyss wrote:
>
> > I keep my search terms in a dedicated RAMDirectory (the termIndex).
> > In there I place all the terms of my real index. When putting the terms
> > into the termIndex I can still see [using the debugger] the Umlaute
> > (äöü). Unfortunately, when searching the termIndex, the documents no
> > longer contain these Umlaute.
> >
> > Populating the termIndex:
> > termIndex = new RAMDirectory();
> > IndexWriterConfig config = new IndexWriterConfig( Version.LUCENE_31,
> >     new TermAnalyzer( locale ) );
> > termIndexWriter = new IndexWriter( termIndex, config );
> > TermEnum tEnum = realIndexReader.terms();
> > while ( tEnum.next() ) {
> >   Term t = tEnum.term();
> >   String termText = t.text();
> >   Document termDocument = new Document();
> >   Field field = new Field( FIELDNAME_TERM, termText, Field.Store.YES,
> >       Field.Index.ANALYZED );
> >   termDocument.add( field );
> >   // and add term into the index
> >   termIndexWriter.addDocument( termDocument );
> > }
> > termIndexWriter.commit();
> > termIndexWriter.optimize();
> > termIndexWriter.close();
> >
> > termIndexReader = IndexReader.open( termIndex, true );
> > ---------- searching terms
> > Query q = fuzzy
> >     ? new FuzzyQuery( new Term( FIELDNAME_TERM,
> >         termFilter.toLowerCase() ) )
> >     : new WildcardQuery( new Term( FIELDNAME_TERM,
> >         "*" + termFilter.toLowerCase() + "*" ) );
> > TopDocs topDocs = new IndexSearcher( getTermIndexReader() ).search( q,
> >     100 );
> > for ( ScoreDoc hit : topDocs.scoreDocs ) {
> >   Document doc = getTermIndexReader().document( hit.doc );
> >   String indexTerm = doc.get( FIELDNAME_TERM );
> >   if ( !returnValue.contains( indexTerm ) )
> >   {
> >     returnValue.add( indexTerm );
> >   }
> > }
> > ----------
> > The TermAnalyzer is the same analyzer as the main index analyzer, with
> > the exception that a LowerCaseFilter is applied.
>
> What is the Analyzer for the Main Index? Which tokenizer and token
> filters are used?
>
> Out of curiosity, what is the problem you are trying to solve?
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org