
List:       solr-user
Subject:    Re: how to write an efficient query with a subquery to restrict the search space?
From:       svante karlsson <saka () csi ! se>
Date:       2014-01-31 12:10:46
Message-ID: CAHkETj_NQmp5CGqGTAiJaZHN1ZagyY6mjkZt+Y-b01iSnqzy6A () mail ! gmail ! com


It seems to be faster to first restrict the search space and then do the
scoring than to just use the full query and let Solr handle everything.

For example, in my application one of the scoring fields effectively hits
1/12 of the database (a month field), and if we have 100'' items in the
database then this matters.
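
One way to sketch "restrict first, then score" is to send the rare-term
clauses as a Solr filter query (fq) and keep the full OR query in q, since fq
narrows the result set without contributing to the score. This is only an
illustration under assumptions, not necessarily what I run: the helper name
build_solr_params is made up, the field/value names are the placeholders from
the original question, and the values are assumed to contain no characters
that need query-parser escaping. Whether this is actually faster depends on
the index and on filter caching.

```python
from urllib.parse import urlencode


def build_solr_params(all_terms, rare_fields, rows=100):
    """Build Solr request parameters (hypothetical helper).

    all_terms   -- dict mapping field name -> requested value (all scored)
    rare_fields -- field names whose values are rare enough to restrict on
    """
    def clause(field):
        # Assumes values need no escaping for the Lucene query parser.
        return f"{field}:{all_terms[field]}"

    # Every term contributes to the score.
    q = " OR ".join(clause(f) for f in all_terms)
    # Only the rare terms restrict the search space; fq does not affect scoring.
    fq = " OR ".join(clause(f) for f in rare_fields)
    return {"q": q, "fq": fq, "rows": rows, "fl": "*"}


params = build_solr_params(
    {"field1": "val1", "field2": "val2", "field3": "val3", "field4": "val4"},
    ["field2", "field4"],
)
query_string = urlencode(params)  # append to the core's /select URL
print(query_string)
```

The choice of which fields go into rare_fields is left to the caller, for
example based on known term frequencies per field.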

/svante


2014-01-30 Jack Krupansky <jack@basetechnology.com>:

> Lucene's default scoring should give you much of what you want - ranking
> hits of low-frequency terms higher - without any special query syntax -
> just list out your terms and use "OR" as your default operator.
>
> -- Jack Krupansky
>
> -----Original Message----- From: svante karlsson
> Sent: Thursday, January 23, 2014 6:42 AM
> To: solr-user@lucene.apache.org
> Subject: how to write an efficient query with a subquery to restrict the
> search space?
>
>
> I have a solr db containing 1 billion records that I'm trying to use in a
> NoSQL fashion.
>
> What I want to do is find the best matches using all search terms but
> restrict the search space to the most unique terms
>
> In this example I know that val2 and val4 are rare terms and val1 and val3
> are more common. In my real scenario I'll have 20 fields that I want to
> include or exclude in the inner query depending on the uniqueness of the
> requested value.
>
>
> my first approach was:
> q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4 AND (field2:val2
> OR field4:val4)&rows=100&fl=*
>
> but what I think I get is
> .....  field4:val4 AND (field2:val2 OR field4:val4)   this result is then
> OR'ed with the rest
>
> if I write
> q=(field1:val1 OR field2:val2 OR field3:val3 OR field4:val4) AND
> (field2:val2 OR field4:val4)&rows=100&fl=*
>
> then what I think I get is two sub-queries that are evaluated separately and
> then joined - performance-wise this is bad.
>
> What's the best way to write these types of queries?
>
>
> Are there any performance issues when running it on several SolrCloud nodes
> vs. a single instance, or should it scale?
>
>
>
> /svante
>

