[prev in list] [next in list] [prev in thread] [next in thread] 

List:       lucene-user
Subject:    Re: Lucene "cuts" the search results ?
From:       markharw00d <markharw00d () yahoo ! co ! uk>
Date:       2005-02-15 15:27:53
Message-ID: 421214F9.5030008 () yahoo ! co ! uk
[Download RAW message or body]

Hi Pierre,
Here's the response I gave the last time this question was raised::

The highlighter uses a number of "pluggable" services, one of which is the
choice of "Fragmenter" implementation. This interface is for classes which
decide the boundaries where to cut the original text into snippets. The 
default
implementation used simply breaks up text into evenly sized chunks. A more
intelligent implementation could be made to detect sentence boundaries.
What you are asking for requires that the Fragmenter would know where the
upcoming query matches are and decides on fragment boundaries with this in
mind. To have this foresight would require a preliminary pass over the
TokenStream to identify the match points before calling the highlighter.

This Fragmenter implementation does not exist but it does not sound
unachievable. I would suggest that some knowledge of sentence boundaries
probably would probably help here too. I dont have any plans to write such a
Fragmenter now but this is how it could be done.

Hope this helps,
Cheers,
Mark



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic