'RE: Reusing Query instances'


List:       lucene-user
Subject:    RE: Reusing Query instances
From:       "Uwe Schindler" <uwe () thetaphi ! de>
Date:       2011-04-30 7:45:24
Message-ID: 009701cc070a$912bdca0$b38395e0$ () thetaphi ! de

Hi Otis,

> Is there any reason why one would *not* want to reuse Query instances?

Definitely not!
 
> I'm using MemoryIndex with a fixed set of queries and I'm executing them all
> on each new document that comes in.  Because each document needs to
> have many tens of thousands of queries executed against it, I thought I'd just
> run all queries through QueryParser once at the beginning, and then just
> reuse Query instances on each incoming document.  What I've noticed is that
> my fixed set of queries takes longer and longer to execute as time passes
> (more and more time is spent inside memoryIndex.search(....) somewhere).
> The problem is not heap/memory - there is no crazy GCing and the heap is
> not full, but the CPU is 100% busy.

You should still generate some dumps when it gets slow.
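
One way to capture such dumps from inside the JVM, while a slow batch is
still running, is a small watchdog thread. The following is only a sketch
using the standard java.lang.management API; the class name
SlowBatchDumper, the method runWithWatchdog and the threshold values are
hypothetical and not part of Lucene or of the original setup.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class SlowBatchDumper {

    /** Renders the name, state and stack of every live thread as text. */
    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip lock/monitor details, keep the stack traces
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    /**
     * Runs the batch on the calling thread. While it is still running, a
     * dump is taken every thresholdMillis, starting after thresholdMillis.
     * A batch that finishes before the threshold produces no dumps.
     */
    static List<String> runWithWatchdog(Runnable batch, long thresholdMillis) {
        List<String> dumps = new CopyOnWriteArrayList<>();
        ScheduledExecutorService watchdog =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "dump-watchdog");
                t.setDaemon(true); // never keeps the JVM alive
                return t;
            });
        ScheduledFuture<?> pending = watchdog.scheduleAtFixedRate(
            () -> dumps.add(dumpAllThreads()),
            thresholdMillis, thresholdMillis, TimeUnit.MILLISECONDS);
        try {
            batch.run();
        } finally {
            pending.cancel(false);
            watchdog.shutdownNow();
        }
        return dumps;
    }

    public static void main(String[] args) {
        // Stand-in for "run all queries against one document".
        Runnable batch = () -> {
            try {
                Thread.sleep(300);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        List<String> dumps = runWithWatchdog(batch, 100);
        System.out.println("dumps taken: " + dumps.size());
        if (!dumps.isEmpty()) {
            System.out.println(dumps.get(0));
        }
    }
}
```

Comparing a few dumps taken during slow batches should show which frames
below the search call keep turning up. An external tool such as jstack gives
the same information without any code changes.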

In general, reusing queries is perfectly fine, as the queries themselves are
only a shell for the query parameters, plus factories for new rewritten
queries (if needed) and for Weights/Scorers. Of course, you should not
reuse rewritten queries, as they largely depend on the underlying index
(which changes on each request).
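
To illustrate the difference between a reusable query and a rewritten one,
here is a toy model in plain Java. It deliberately does not use the Lucene
API: ToyIndex, ToyPrefixQuery and matches are hypothetical stand-ins, and
the "rewrite" is just the expansion of a prefix into the terms that one
particular document contains.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

public class QueryReuseSketch {

    /** Toy stand-in for a one-document index: the document's sorted terms. */
    static final class ToyIndex {
        final NavigableSet<String> terms;

        ToyIndex(String text) {
            terms = new TreeSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
        }
    }

    /** Toy parsed query: holds only immutable parameters, so reuse is safe. */
    static final class ToyPrefixQuery {
        final String prefix;

        ToyPrefixQuery(String prefix) {
            this.prefix = prefix;
        }

        /** Expands the prefix against ONE index; the result is index-specific. */
        Set<String> rewrite(ToyIndex index) {
            Set<String> expanded = new TreeSet<>();
            // Terms sharing the prefix are contiguous in sorted order.
            for (String term : index.terms.tailSet(prefix, true)) {
                if (!term.startsWith(prefix)) {
                    break;
                }
                expanded.add(term);
            }
            return expanded;
        }
    }

    /** A rewritten query matches if any of its expanded terms is in the index. */
    static boolean matches(Set<String> rewritten, ToyIndex index) {
        for (String term : rewritten) {
            if (index.terms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyPrefixQuery query = new ToyPrefixQuery("luc"); // built once
        ToyIndex doc1 = new ToyIndex("apache lucene search");
        ToyIndex doc2 = new ToyIndex("lucid dreams");

        // Fine: the same query object, rewritten freshly for each document.
        System.out.println("fresh rewrite on doc2: " + matches(query.rewrite(doc2), doc2));

        // Wrong: a rewrite produced for doc1 carried over to doc2.
        Set<String> staleRewrite = query.rewrite(doc1);
        System.out.println("stale rewrite on doc2: " + matches(staleRewrite, doc2));
    }
}
```

In this toy, the fresh rewrite finds "lucid" in the second document, while
the rewrite carried over from the first document only knows "lucene" and
misses it. The query object itself is never modified, which is why it can be
shared across documents.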

> I should note that queries I'm dealing with are ugly and big, using lots of
> wildcards, but trailing and prefix ones (and this is Lucene 3.1, so no faster
> Wildcard impl).
> I should also emphasize that at this point I only *suspect* that maaaybe the
> gradual slowdown I'm seeing has something to do with the fact that I'm
> reusing Query instances.

Did this somehow change with 3.1, or was it the same in 3.0? In fact, for
each query execution a BitSet is allocated per segment, but as you use
MemoryIndex, the BitSet has just one slot *g* (so it's not an issue). For
MemoryIndex, it's more important that the term dictionary / positions are
optimized so PhraseQueries and wildcard queries can execute quickly on the
term index. As said before, the queries from the query parser are only used
to rewrite against, producing index-specific queries. The reuse pattern is
OK and intended.

Another question: can you temporarily replace MemoryIndex with another simple
one-doc impl (RAMDirectory), just to test whether it also slows down then? I
don't like MemoryIndex at all (I know, it was not the culprit in your stack
overflow).

Uwe


