
List:       kinosearch
Subject:    [KinoSearch] multiple analyzers
From:       marvin () rectangular ! com (Marvin Humphrey)
Date:       2006-11-14 16:51:57
Message-ID: 902C4F99-C233-4902-88CE-FFE076B2298F () rectangular ! com


On Nov 14, 2006, at 8:33 AM, Peter Sinnott wrote:

> The KinoSearch::Searcher docs say
>
> "analyzer - An object which subclasses KinoSearch::Analysis::Analyzer,
> such as a PolyAnalyzer. This must be identical to the Analyzer used at
> index-time, or the results won't match up."
>
> Does this mean if I use different analyzers for different fields
> when creating the index then I can not search it properly?

Depends.  You definitely *can* search it properly, but you may need  
to get sophisticated about how you build your queries.

I'll give an example that doesn't use multiple Analyzers, but illustrates
the principle: one field is analyzed and the other is not.

    my $polyanalyzer = KinoSearch::Analysis::PolyAnalyzer->new(
         language => 'en' );

    my $invindexer = KinoSearch::InvIndexer->new(
        analyzer => $polyanalyzer,
        invindex => '/path/to/invindex',
    );

    $invindexer->spec_field( name => 'body' );
    $invindexer->spec_field(
        name     => 'category',
        analyzed => 0,
    );

Now, say we add a document with the category of 'books'.  Because the  
category field doesn't get analyzed, the string 'books' makes it  
intact into the index.  However, if the word 'books' ever appears  
anywhere in the body, it will get stemmed down to 'book' by the  
PolyAnalyzer.

Because the following search runs the query string through the English
PolyAnalyzer, it will return only matches on 'book' -- NOT 'books'...

    my $searcher = KinoSearch::Searcher->new(
        analyzer => $polyanalyzer,
        invindex => '/path/to/invindex',
    );
    my $hits = $searcher->search('books');

... so it will never match a document where the category is 'books'.
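
To make the mismatch concrete, here is a toy sketch in plain Perl. It does not use KinoSearch at all: toy_stem(), add_doc(), lookup() and the %index layout are hypothetical stand-ins, and the "stemmer" merely strips a trailing 's'. It only illustrates why an analyzed query term misses an unanalyzed field.

```perl
use strict;
use warnings;

# Toy stand-in for the English stemmer inside PolyAnalyzer (hypothetical;
# a real stemmer is far more thorough): lowercase, strip a trailing 's'.
sub toy_stem {
    my ($token) = @_;
    $token = lc $token;
    $token =~ s/s\z// if length $token > 3;
    return $token;
}

# Toy inverted index: field => term => list of doc ids.
my %index;

sub add_doc {
    my ( $doc_id, %fields ) = @_;

    # 'body' is analyzed: split into words and stem each one.
    for my $word ( split /\W+/, $fields{body} ) {
        next unless length $word;
        push @{ $index{body}{ toy_stem($word) } }, $doc_id;
    }

    # 'category' is not analyzed: stored verbatim as a single term.
    push @{ $index{category}{ $fields{category} } }, $doc_id;
}

sub lookup {
    my ( $field, $term ) = @_;
    return @{ $index{$field}{$term} || [] };
}

add_doc( 1, body => 'A story about old books', category => 'fiction' );
add_doc( 2, body => 'Shelving advice',         category => 'books' );

# The query string goes through the same stemmer the body field used.
my $analyzed      = toy_stem('books');                  # 'book'
my @body_hits     = lookup( 'body',     $analyzed );    # doc 1
my @category_miss = lookup( 'category', $analyzed );    # nothing
my @category_hits = lookup( 'category', 'books' );      # doc 2
```

The stemmed term 'book' finds doc 1 through its body, but doc 2 is only reachable by looking up the literal 'books' in the category field.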

However, there are a number of ways to construct your query so that  
you match the category 'books'.  Here's one:

    my $category_query_parser = KinoSearch::QueryParser::QueryParser->new(
        analyzer => KinoSearch::Analysis::Analyzer->new, # no-op
        fields   => [ 'category' ],
    );
    my $main_query_parser = KinoSearch::QueryParser::QueryParser->new(
        analyzer => $polyanalyzer,
        fields   => [ 'body' ],
    );

    my $bool_query = KinoSearch::Search::BooleanQuery->new;

    # search category field for the unstemmed 'books'
    my $cat_query = $category_query_parser->parse('books');
    $bool_query->add_clause( query => $cat_query, occur => 'SHOULD' );

    # search body field for the stemmed 'book'
    my $main_query = $main_query_parser->parse('books');
    $bool_query->add_clause( query => $main_query, occur => 'SHOULD' );

    my $hits = $searcher->search( query => $bool_query );
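
The idea behind that query can be sketched without KinoSearch, as a rough toy model only: analyze the same query string once per field, each with the treatment that field got at index time, then OR the results. The names %analyzer_for and search_all_fields() and the contents of %index are assumptions made up for illustration, not part of the KinoSearch API.

```perl
use strict;
use warnings;

# Hypothetical per-field analyzers: 'body' gets a toy stemmer,
# 'category' gets a no-op, mirroring the two QueryParsers above.
my %analyzer_for = (
    body     => sub { my ($t) = @_; $t = lc $t; $t =~ s/s\z//; return $t },
    category => sub { return $_[0] },
);

# Toy index contents (assumed for illustration): field => term => doc ids.
my %index = (
    body     => { book  => [1] },
    category => { books => [2] },
);

# Analyze the query string once per field, then take the union of the
# matches, which is roughly what two SHOULD clauses amount to.
sub search_all_fields {
    my ($query_string) = @_;
    my %seen;
    for my $field ( sort keys %analyzer_for ) {
        my $term = $analyzer_for{$field}->($query_string);
        $seen{$_} = 1 for @{ $index{$field}{$term} || [] };
    }
    return sort { $a <=> $b } keys %seen;
}

my @hits = search_all_fields('books');    # docs 1 and 2
```

Each field sees the term in the form it was indexed with, so both the stemmed body match and the verbatim category match come back.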

Snoop the _prepare_simple_search() method in KinoSearch::Searcher to  
see what KS is doing behind the scenes to build a query when you  
supply only a query string.

HTH,

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


