[prev in list] [next in list] [prev in thread] [next in thread] 

List:       lucene-user
Subject:    RE: About search performance
From:       "Russell M. Allen" <Russell.Allen () aebn ! net>
Date:       2006-07-31 13:24:11
Message-ID: E9BF7E0FD95D1143B399B129908A58FF01582DB8 () datachange01 ! dataconversions ! biz
[Download RAW message or body]

You should build your own performance test cases to see what works for your data.  \
That being said, here are some numbers from a similar test I ran:

I did the following:
1) run a single term query which resulted in about half of the total set of documents \
being returned.  (~36,000) 2) built a Boolean query (tree of them actually) with a \
Boolean clause for each of the 36,000 documents. 3) run this new secondary query to \
get 'final' results.

The time it took to run each step was:
1) 31 ms
2) 1610 ms
3) 1047 ms

Without the giant Boolean query attached to the final query, stem three takes about \
15 ms.  So... Yes, a large query structure will increase query execution time.  In \
researching how to reduce this, I found filters to be the fastest.  Using filters:

Steps required:
1) run the initial single term query.
2) Build a query object model based on results from step 1.
3) Build bitset filter using query from step 2.  CACHE THE RESULTS!
4) Execute main query, with cached bitset filter.

Execution time:
1) 16 ms
2) 1609 ms
3) 969 ms
4) 15 ms

As you can see, the total time is longer, but, in our specific case, the initial \
query rarely changes.  Thus, by caching the filter results, the vast majority of \
queries are done in 15 ms.  This only works because we are not updating our indexes \
that often, and the initial query is fairly static too.

This is a classic trade.  I am trading space for time.  Each cached filter costs \
around 2k (reflection of index size) of memory to keep around.  In turn, that saves \
me a little over a second from the final query.

I hope these number help, and perhaps spark an idea.

-Russell

-----Original Message-----
From: zhongyi yuan [mailto:yzy1980@gmail.com] 
Sent: Friday, July 28, 2006 9:47 PM
To: java-user@lucene.apache.org
Subject: About search performance

Hi,How about implement multi-key search use lucene, for example use boolean search \
exceed 1000 clauses,it will affect the performance greatly. If use filter or custom \
sorter to select the result, because the result is extremely large in amount,so the \
performance is lower. Please give me some advice,Thanks.


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic