
List:       solr-user
Subject:    Re: Problem with PatternReplaceCharFilter
From:       jasimop <stricker.ma () gmail ! com>
Date:       2013-05-31 9:27:59
Message-ID: 1369992479368-4067265.post () n3 ! nabble ! com

Thanks again for your input.

In fact I already preprocess the data (concatenation of only the content
values) and index it into another field.

But my general problem is the following: my data has a rather cryptic format
and I have to search only within the content values. Therefore I preprocess
it and put the result into a separate field. There everything works fine
(highlighting etc.).
The problem is that when I get a hit in that field, I need to know which
<TextLine> it appeared in so that I can read its attribute values. They define
some rules for processing the search result, but it should not be possible to
search in them. Therefore I cannot just use the HTMLStripCharFilter.

So my idea was the following: index both my cleaned version and the raw
format, and make sure that both fields generate the same tokens (this is the
hard part). If I need to know the surrounding attribute values, I search in
the raw version and highlight the matching term. The highlight then tells me
which attribute values to use.
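
To make the "same tokens" requirement concrete, here is a minimal Python
sketch, outside of Solr, that checks it on a toy document. The markup (a
CONTENT attribute on <String> elements inside <TextLine>) is only an
assumption, since the real format is not shown in this thread, and the
lowercase/whitespace tokenizer is a stand-in for whatever analyzer the two
fields actually use. The raw text is masked with spaces instead of being
shortened, so character offsets still point into the raw format.

```python
import re

# Hypothetical raw format; the real markup is not shown in this thread.
RAW = (
    '<TextLine ID="l1" STYLE="bold">'
    '<String CONTENT="Hello"/><String CONTENT="world"/>'
    '</TextLine>'
    '<TextLine ID="l2" STYLE="plain">'
    '<String CONTENT="second"/><String CONTENT="line"/>'
    '</TextLine>'
)

# Assumption: the searchable values live in CONTENT="..." attributes.
CONTENT_RE = re.compile(r'CONTENT="([^"]*)"')


def cleaned_text(raw):
    """Concatenate only the content values (the preprocessing step)."""
    return " ".join(CONTENT_RE.findall(raw))


def mask_markup(raw):
    """Replace everything except the content values with spaces.

    The result has the same length as the input, so an offset into the
    masked text is also a valid offset into the raw text.
    """
    out = [" "] * len(raw)
    for match in CONTENT_RE.finditer(raw):
        start, end = match.span(1)
        out[start:end] = raw[start:end]
    return "".join(out)


def tokenize(text):
    """Stand-in analyzer: lowercase and split on whitespace."""
    return text.lower().split()


clean_tokens = tokenize(cleaned_text(RAW))
raw_tokens = tokenize(mask_markup(RAW))
print(clean_tokens == raw_tokens)
```

Whether a char filter in the raw field's analysis chain can reproduce this
masking for the real data is exactly the hard part mentioned above.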

Another option would be to search in the cleaned version and then, after the
search, have my application map the hit position to the corresponding position
in the raw format based on the highlighted term. But this is very error-prone.
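
A rough Python sketch of this second option, assuming the offset of each
line is recorded while the cleaned text is built instead of being matched up
after the search. The element and attribute names (Page, String, CONTENT, ID,
STYLE) and the helper names are hypothetical, and how the character offset of
a hit is obtained from Solr is left out here.

```python
import bisect
import xml.etree.ElementTree as ET

# Hypothetical raw format; the real markup is not shown in this thread.
RAW = """<Page>
  <TextLine ID="l1" STYLE="bold">
    <String CONTENT="hello"/><String CONTENT="world"/>
  </TextLine>
  <TextLine ID="l2" STYLE="plain">
    <String CONTENT="second"/><String CONTENT="line"/>
  </TextLine>
</Page>"""


def build_index(raw_xml):
    """Return the cleaned text plus, per TextLine, its start offset and attributes."""
    root = ET.fromstring(raw_xml)
    parts, starts, lines = [], [], []
    pos = 0
    for line in root.iter("TextLine"):
        text = " ".join(s.get("CONTENT", "") for s in line.iter("String"))
        starts.append(pos)               # where this line begins in the cleaned text
        lines.append(dict(line.attrib))  # attribute values needed after a hit
        parts.append(text)
        pos += len(text) + 1             # +1 for the separating space
    return " ".join(parts), starts, lines


def line_for_offset(starts, lines, offset):
    """Find the TextLine whose span contains a non-negative offset into the cleaned text."""
    i = bisect.bisect_right(starts, offset) - 1
    return lines[i]


cleaned, starts, lines = build_index(RAW)
hit = cleaned.index("second")  # stand-in for an offset reported by the search
print(line_for_offset(starts, lines, hit))
```

The start offsets would have to be stored next to the cleaned field, for
example in a separate stored field, which is an extra assumption on top of
the setup described above.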

Both solutions do not seem elegant to me.


Any suggestions?





