List: solr-user
Subject: Re: Fuzzy searching, tildes and solr
From: "Yonik Seeley" <yonik () apache ! org>
Date: 2007-01-26 15:58:10
Message-ID: c68e39170701260758g31586182qf0a9f8e896a0b65a () mail ! gmail ! com
On 1/26/07, Walter Lewis <lewisw@hhpl.on.ca> wrote:
> Yonik Seeley wrote:
> > +(+text:jame +text:sutherland) +searchSet:testSet
> >> +(+text:james~0.75 +text:sutherland~0.75) +searchSet:testSet
> >
> > I can tell from the first that this is a stemmed field... "james" is
> > transformed to "jame"
> "James" being the plural of "Jame" according to the stemmer. I guess my
> mind hadn't run in that direction. :)
>
> I guess I wasn't expecting the fuzzy query logic to bypass the
> stemming.
I would expect there to be at least as many problems trying to do
stemming on partial or misspelled words.
For a simpler example, consider prefix queries...
If you tried title:a* or title:an* to find titles including anaconda,
and you did full "analysis" of the terms first, they would be removed
as stop words and you would find nothing.
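To make the prefix case concrete, here is a toy sketch in Python. It is not Solr's analysis code: the stop word list and the analyze() and prefix_search() helpers are made up for illustration. It only shows that running a prefix like "an" through the same analysis as indexed text can drop it before any matching happens.

```python
# Toy illustration only; this is not Solr's analysis code.
STOP_WORDS = {"a", "an", "the"}  # assumed stop word list


def analyze(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def prefix_search(prefix, indexed_terms, analyze_prefix):
    """Return the indexed terms that start with the given prefix."""
    if analyze_prefix:
        tokens = analyze(prefix)
        if not tokens:  # the prefix itself was removed as a stop word
            return []
        prefix = tokens[0]
    else:
        prefix = prefix.lower()
    return [t for t in indexed_terms if t.startswith(prefix)]


# Index a hypothetical title; "the" is dropped at index time.
indexed = analyze("The Anaconda")

raw_a = prefix_search("a", indexed, analyze_prefix=False)
raw_an = prefix_search("an", indexed, analyze_prefix=False)
analyzed_a = prefix_search("a", indexed, analyze_prefix=True)
analyzed_an = prefix_search("an", indexed, analyze_prefix=True)
```

With the prefix left unanalyzed, both "a" and "an" find "anaconda"; with full analysis applied to the prefix, both come back empty.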
> Would it be correct that if I were to add "james" to the
> protwords.txt file that this *specific* problem would go away?
Yes, it should.
> Obviously
> there are a significant quantity of proper names where this would have
> an impact, so a more generic solution is preferable.
> > So, you could
> > - index the field twice using copyField, and then do fuzzy queries on
> > the non-stemmed version. [plus two other good suggestions]
> As I look at the field types in the example schema would you recommend
> something like text_lu without the EnglishPorterFilterFactory, or are
> there other issues I'm overlooking.
text_lu also has stemming.
The text field types are examples, and you should be customizing your own.
It depends on how you want to "normalize" text.
You could make a new field type by starting with your current
text type and removing the stemmer.
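As a rough picture of why the non-stemmed copy helps fuzzy queries, here is a toy Python sketch. Everything in it is a stand-in: toy_stem() just strips a trailing "s" (it is not the Porter stemmer), the field names text_stemmed and text_exact are made up, and the "within one edit" rule is a simplification, not how Lucene scores fuzzy matches.

```python
# Toy illustration only; the stemmer and matching rule are stand-ins.
def toy_stem(token):
    """Crude stand-in for a stemmer: strip one trailing 's'."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def index_field(text, stem):
    """Lowercase and split; optionally stem each token."""
    tokens = text.lower().split()
    return [toy_stem(t) for t in tokens] if stem else tokens


def fuzzy_match(query, terms, max_edits=1):
    """Return indexed terms within max_edits of the raw query term."""
    return [t for t in terms if edit_distance(query, t) <= max_edits]


doc = "James Sutherland"
text_stemmed = index_field(doc, stem=True)   # holds "jame", "sutherland"
text_exact = index_field(doc, stem=False)    # holds "james", "sutherland"

# A misspelled query term, compared as typed (no stemming applied).
hits_stemmed = fuzzy_match("janes", text_stemmed)
hits_exact = fuzzy_match("janes", text_exact)
```

In this toy, "janes" is one edit from "james" in the non-stemmed field but two edits from "jame" in the stemmed one, so only the non-stemmed copy matches.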
-Yonik