'[jira] Commented: (LUCENE-1195) Performance improvement for'


List:       lucene-dev
Subject:    [jira] Commented: (LUCENE-1195) Performance improvement for
From:       "Yonik Seeley (JIRA)" <jira () apache ! org>
Date:       2008-02-27 21:27:51
Message-ID: 626993443.1204147671540.JavaMail.jira () brutus


    [ https://issues.apache.org/jira/browse/LUCENE-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573073#action_12573073 ]

Yonik Seeley commented on LUCENE-1195:
--------------------------------------

There's higher-level synchronization too (ensuring that two different threads don't \
generate the same cache entry at the same time), and I agree that should not be done \
in this case.

Just use Collections.synchronizedMap(): it will be the same speed, more readable, and \
can be easily replaced later anyway.
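
For illustration only, here is a minimal sketch of what that suggestion could look like in present-day Java. It is not the code from lucene-1195.patch: the class name SynchronizedLruCache, the loader callback, and the helper methods are hypothetical. It assumes the LRU behaviour comes from an access-ordered LinkedHashMap, and it deliberately has no higher-level lock, so two threads may occasionally compute the same entry.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical small LRU cache; a sketch, not the actual TermInfosReader patch. */
class SynchronizedLruCache<K, V> {
    private final Map<K, V> map;

    SynchronizedLruCache(final int maxSize) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        LinkedHashMap<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the cache is over capacity.
                return size() > maxSize;
            }
        };
        // In access order even get() modifies the map, so reads must be synchronized too.
        this.map = Collections.synchronizedMap(lru);
    }

    /** Returns the cached value, computing and storing it on a miss. */
    V get(K key, Function<K, V> loader) {
        V value = map.get(key);
        if (value == null) {
            // No higher-level lock here: two threads can both miss and both compute.
            // The duplicate work is harmless and the later put() simply overwrites.
            value = loader.apply(key);
            map.put(key, value);
        }
        return value;
    }

    boolean contains(K key) {
        return map.containsKey(key);
    }

    int size() {
        return map.size();
    }
}
```

Because each get() and put() is individually synchronized by the wrapper, the map itself stays consistent. Only the check-then-compute sequence is unguarded, which is the higher-level synchronization being skipped.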

> Performance improvement for TermInfosReader
> -------------------------------------------
> 
> Key: LUCENE-1195
> URL: https://issues.apache.org/jira/browse/LUCENE-1195
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael Busch
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.4
> 
> Attachments: lucene-1195.patch
> 
> 
> Currently we have a bottleneck for multi-term queries: the dictionary lookup is \
> being done twice for each term. The first time is in Similarity.idf(), where \
> searcher.docFreq() is called. The second time is when the posting list is opened \
> (TermDocs or TermPositions). The dictionary lookup is not cheap, so a \
> significant performance improvement is possible here if we avoid the second lookup. \
> An easy way to do this is to add a small LRU cache to TermInfosReader. 
> I ran some performance experiments with an LRU cache size of 20 and a mid-size \
> index of 500,000 documents from Wikipedia. Here are some test results:
> 50,000 AND queries with 3 terms each:
> old:                  152 secs
> new (with LRU cache): 112 secs (26% faster)
> 50,000 OR queries with 3 terms each:
> old:                  175 secs
> new (with LRU cache): 133 secs (24% faster)
> For bigger indexes this patch will probably have less impact, for smaller ones \
> more. I will attach a patch soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



