'Re: [Nutch-dev] Search Performance'

List:       nutch-developers
Subject:    Re: [Nutch-dev] Search Performance
From:       Daniel Naber <daniel.naber () t-online ! de>
Date:       2004-12-19 11:56:21
Message-ID: 200412191256.21898 () danielnaber ! de

On Friday 17 December 2004 17:05, Luke Baker wrote:

> (3) Given an index of 2 million URLs and those major index structures in
> RAM what is limiting us to only 20 queries/second?

I can only speak for Lucene, not sure if everything also applies to Nutch. 
There are two things that might be "slow":

First, the term lookup itself scales very well, but a search takes 
longer if there are many matches. That's because the documents are not 
ordered by their score in the index, so Lucene needs to look at every 
document id that is attached to a term (this implementation is very fast, 
but its cost grows with the number of matches, so it's not O(1)).
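
To illustrate the point, here is a minimal sketch in Python (not Lucene's actual code). It assumes a toy postings list of (doc_id, score) pairs stored in doc id order, and the function name top_k is made up for this example. Because the list is not sorted by score, every entry has to be visited even if only the best k hits are wanted:

```python
import heapq

def top_k(postings, k):
    """Return the k best (doc_id, score) pairs and the number of postings visited.

    postings is a hypothetical list of (doc_id, score) pairs in doc id order,
    i.e. NOT ordered by score, so no entry can be skipped.
    """
    heap = []      # min-heap of (score, doc_id), holds at most k entries
    visited = 0
    for doc_id, score in postings:
        visited += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Better than the weakest hit kept so far: replace it.
            heapq.heapreplace(heap, (score, doc_id))
    best = sorted(heap, reverse=True)  # highest score first
    return [(doc_id, score) for score, doc_id in best], visited

# Toy data: four matching documents, only the two best are wanted.
postings = [(1, 0.2), (4, 0.9), (7, 0.5), (9, 0.7)]
hits, visited = top_k(postings, 2)
print(hits, visited)
```

Only two hits are returned, but all four postings are visited. With millions of matches for a common term, that scan dominates the query time even when the index is in RAM.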

Second, for every match that is displayed the document needs to be fetched 
to display its title etc. If you only keep .tis, .frq, .prx in RAM then 
one disk access is needed for every document that is displayed.
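
As a rough back-of-the-envelope sketch of what that means for throughput: the numbers below (10 hits displayed per query, 10 ms per disk access, a single disk, nothing cached) are assumptions for illustration only, not measurements from Nutch or Lucene, and the function name is made up.

```python
def max_queries_per_second(hits_displayed, disk_access_ms):
    """Upper bound on queries/second if each displayed hit costs one
    uncached disk access and the accesses happen one after another."""
    ms_per_query = hits_displayed * disk_access_ms
    return 1000.0 / ms_per_query

# Assumed values, for illustration only: 10 hits per page, 10 ms per access.
print(max_queries_per_second(10, 10.0))
```

Under these assumptions the document fetches alone would cap a single disk at about 10 queries/second, regardless of how fast the term lookup is. In practice the OS file cache will change the picture considerably.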

Regards
 Daniel

-- 
http://www.danielnaber.de

