
List:       solr-user
Subject:    Re: Fuzzy searching, tildes and solr
From:       "Yonik Seeley" <yonik () apache ! org>
Date:       2007-01-26 15:58:10
Message-ID: c68e39170701260758g31586182qf0a9f8e896a0b65a () mail ! gmail ! com

On 1/26/07, Walter Lewis <lewisw@hhpl.on.ca> wrote:
> Yonik Seeley wrote:
> > +(+text:jame +text:sutherland) +searchSet:testSet
> >> +(+text:james~0.75 +text:sutherland~0.75) +searchSet:testSet
> >
> > I can tell from the first that this is a stemmed field... "james" is
> > transformed to "jame"
> "James" being the plural of "Jame" according to the stemmer.  I guess my
> mind hadn't run in that direction. :)
>
> I guess I wasn't expecting the fuzzy query logic to bypass the
> stemming.

I would expect there to be at least as many problems trying to do
stemming on partial or misspelled words.

For a simpler example, consider prefix queries...
If you tried title:a* or title:an* to find titles including anaconda,
and you did full "analysis" of the terms first, they would be removed
as stop words and you would find nothing.
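
A minimal sketch of that prefix example, in Python rather than Lucene.
The stop-word list, analyze() and prefix_search() are hypothetical
stand-ins, not Solr's or Lucene's actual code; the point is only that
analyzing the prefix throws it away, while using it as typed still
matches.

```python
# Toy illustration only: a hypothetical stop-word list and analyzer,
# not the real Solr/Lucene analysis chain.
STOP_WORDS = {"a", "an", "the"}


def analyze(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def prefix_search(indexed_terms, prefix):
    """Return the indexed terms that start with the raw, un-analyzed prefix."""
    return sorted(term for term in indexed_terms if term.startswith(prefix))


# Index-time analysis of a title: "An" and "the" are dropped as stop words.
indexed_terms = set(analyze("An anaconda ate the antelope"))

# Running the prefix through full analysis leaves nothing to search for.
analyzed_a = analyze("a")
analyzed_an = analyze("an")

# Using the prefix exactly as typed still finds the indexed terms.
raw_a = prefix_search(indexed_terms, "a")
raw_an = prefix_search(indexed_terms, "an")
```

Here analyzed_a and analyzed_an come back empty, while raw_an still
contains "anaconda".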

>  Would it be correct that if I were to add "james" to the
> protwords.txt file that this *specific* problem would go away?

Yes, it should.

> Obviously
> there are a significant quantity of proper names where this would have
> an impact, so a more generic solution is preferable.
> > So, you could
> > - index the field twice using copyField, and then do fuzzy queries on
> > the non-stemmed version. [plus two other good suggestions]
> As I look at the field types in the example schema would you recommend
> something like text_lu without the EnglishPorterFilterFactory, or are
> there other issues I'm overlooking.

text_lu also has stemming.

The text field types are examples, and you should be customizing your own.
It depends on how you want to "normalize" text.

You could make a new field type by starting with your current
text type and removing the stemmer.
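
As a rough sketch of the copyField idea (a Python toy, not a schema):
the same source text goes into two fields, one stemmed and one not.
crude_stem() is a hypothetical stand-in for a real stemmer and the
field name "text_exact" is made up for illustration.

```python
# Toy illustration only: crude_stem is a hypothetical stand-in for a real
# stemmer such as Porter; it just strips one trailing "s" from longer words.
def crude_stem(token):
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def analyze_stemmed(text):
    """Lowercase, split on whitespace, then stem each token."""
    return [crude_stem(tok) for tok in text.lower().split()]


def analyze_unstemmed(text):
    """Same chain with the stemmer removed."""
    return text.lower().split()


def index_document(text):
    """Mimic copyField: one source text, two differently analyzed fields."""
    return {
        "text": set(analyze_stemmed(text)),          # stemmed field
        "text_exact": set(analyze_unstemmed(text)),  # hypothetical unstemmed copy
    }


doc = index_document("James Sutherland")

# A fuzzy (or prefix) query term is compared as typed, without stemming.
query_term = "james"
in_stemmed = query_term in doc["text"]
in_unstemmed = query_term in doc["text_exact"]
```

The stemmed field holds "jame", so the raw term "james" is only found
in the unstemmed copy, which is the field you would point fuzzy
queries at.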

-Yonik