List:       lucene-dev
Subject:    Re: Highlighter API
From:       markharw00d <markharw00d () yahoo ! co ! uk>
Date:       2005-02-18 23:35:01
Message-ID: 42167BA5.3090204 () yahoo ! co ! uk

> > the Highlighter's getBestFragment method takes a TokenStream and a text. 
> > Wouldn't it be easier to give it just the text and an analyzer

That's how it was originally coded. The move to TokenStream was a deliberate
choice, made to decouple the highlighter from the source of tokens and enable
alternatives. Re-analyzing document text with an Analyzer is one (potentially
costly) way of getting Tokens. Another is to use the new TermVector support
(see TokenSources.java in the highlighter package). In my apps I have query
processing stages which use TokenStreams to extract themes from result sets,
and the output of the TokenStreams produced in this stage can usefully be
cached and reused in the highlighting stage.

If ease of use is your concern, I would suggest wrapping the highlighter
functionality with a simpler (Analyzer-based) interface rather than changing
the internals of the highlighter implementation. That way more experienced
users still have the option to use optimized alternatives in the underlying
code.
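
As a rough sketch of that wrapper idea: the SimpleHighlighter class name is
made up for illustration, and the three small interfaces are only stand-ins
that mirror the shape of the Lucene types so the snippet compiles on its own.
In a real application you would drop the stand-ins and import the Lucene
classes instead.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Stand-ins (assumptions) mirroring the shape of the Lucene types.
interface TokenStream {}

interface Analyzer {
    TokenStream tokenStream(String fieldName, Reader reader);
}

interface Highlighter {
    String getBestFragment(TokenStream tokenStream, String text) throws IOException;
}

// Hypothetical convenience wrapper: Analyzer-based on the outside,
// TokenStream-based on the inside. The highlighter itself is unchanged.
final class SimpleHighlighter {
    private final Highlighter highlighter;
    private final Analyzer analyzer;

    SimpleHighlighter(Highlighter highlighter, Analyzer analyzer) {
        this.highlighter = highlighter;
        this.analyzer = analyzer;
    }

    // Easy path: re-analyze the text to obtain the tokens.
    String getBestFragment(String fieldName, String text) throws IOException {
        TokenStream tokenStream = analyzer.tokenStream(fieldName, new StringReader(text));
        return highlighter.getBestFragment(tokenStream, text);
    }

    // Optimized path: the caller supplies a TokenStream obtained elsewhere
    // (for example from term vectors or a cache), so no re-analysis happens.
    String getBestFragment(TokenStream tokenStream, String text) throws IOException {
        return highlighter.getBestFragment(tokenStream, text);
    }
}
```

Callers who only want convenience use the first method; callers who already
hold a TokenStream use the second and skip the analysis cost.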

Cheers,
Mark



Daniel Naber wrote:

> Hi,
> 
> the Highlighter's getBestFragment method takes a TokenStream and a text. 
> Wouldn't it be easier to give it just the text and an analyzer so the user 
> doesn't have to care about building a TokenStream? Like this:
> 
> public final String getBestFragment(Analyzer analyzer, String text)
> throws IOException
> {
> TokenStream tokenStream = analyzer.tokenStream("field", new StringReader(text));
> return getBestFragment(tokenStream, text);
> }
> 
> The old method could then be deprecated. Or am I missing something? This 
> would also avoid problems in case the stream doesn't match the text.
> 
> Regards
> Daniel
> 
> 
> 


