List:       lucene-dev
Subject:    Re: Question about max clause counts
From:       Adrien Grand <jpountz () gmail ! com>
Date:       2022-01-14 13:56:34
Message-ID: CAPsWd+MZhTHzUS7u-k--t4L0Vb=MyjMC4VqdTx553dNBv8XSLw () mail ! gmail ! com

TermInSetQuery has a completely different execution model: it
consumes the postings of its terms into a bitset instead of merging
them on the fly using a heap. It may be faster or slower depending on
the case. The main benefit is that it has bounded memory usage (though
the bound is high because of the bitset) and that it performs
sequential I/O, so running the query will not hammer your disk.
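
To make the difference concrete, here is a toy sketch of the two
execution models using only java.util. It is not Lucene's code: the
class and method names (ExecutionModels, heapUnion, bitsetUnion) are
made up for illustration, and each "postings list" is just a sorted
array of doc IDs. The heap version keeps one cursor per term and
interleaves reads across all lists, so its state grows with the number
of clauses. The bitset version reads each list fully, one after the
other, into a structure whose size depends only on the number of
documents.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.PriorityQueue;

class ExecutionModels {

    // Heap-based union: one cursor per term, merged on the fly.
    // Each input array is a sorted list of non-negative doc IDs.
    static List<Integer> heapUnion(int[][] postings) {
        // Heap entries are {docId, listIndex, positionInList}, ordered by docId.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < postings.length; i++) {
            if (postings[i].length > 0) {
                heap.add(new int[] {postings[i][0], i, 0});
            }
        }
        List<Integer> result = new ArrayList<>();
        int last = -1;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            if (top[0] != last) { // skip docs already emitted by another list
                result.add(top[0]);
                last = top[0];
            }
            int next = top[2] + 1;
            if (next < postings[top[1]].length) {
                // Advance this list's cursor: reads jump between lists.
                heap.add(new int[] {postings[top[1]][next], top[1], next});
            }
        }
        return result;
    }

    // Bitset-based union: consume each list sequentially into a bitset.
    // Memory is about maxDoc bits, however many lists there are.
    static List<Integer> bitsetUnion(int[][] postings, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] list : postings) {
            for (int doc : list) {
                bits.set(doc);
            }
        }
        List<Integer> result = new ArrayList<>();
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            result.add(doc);
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] postings = { {1, 5, 9}, {2, 5, 8}, {3, 9} };
        System.out.println(heapUnion(postings));        // [1, 2, 3, 5, 8, 9]
        System.out.println(bitsetUnion(postings, 10));  // [1, 2, 3, 5, 8, 9]
    }
}
```

Both methods return the same matches; what differs is the access
pattern and where the memory goes, which is why one can be faster or
slower than the other depending on the case.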

On Fri, Jan 14, 2022 at 12:23 AM Petko Minkov <pminkov@gmail.com> wrote:
> 
> Thanks for explaining - that makes sense. I see that one of the recommended
> approaches for large queries is to use TermInSetQuery. I don't find this in its
> docs, but what are its benefits - is it faster, or does it take less memory?
> --Petko
> 
> On Thu, Jan 13, 2022 at 1:10 AM Adrien Grand <jpountz@gmail.com> wrote:
> > 
> > Hi Petko,
> > 
> > We have been designing queries and the whole framework for query
> > execution with the assumption in mind that queries would be
> > reasonable, so it's hard to tell exactly what would break, but I think
> > it's expected that queries wouldn't execute in the most efficient way,
> > CPU-wise, memory-wise and disk-wise. So you would expose your
> > application to slow queries that might hammer your disk and/or cause
> > memory pressure if not out-of-memory errors.
> > 
> > On Wed, Jan 12, 2022 at 8:49 PM Petko Minkov <pminkov@gmail.com> wrote:
> > > 
> > > Hello,
> > > 
> > > I have a question about Lucene's max clause counts limit, exposed in
> > > BooleanQuery::setMaxClauseCount (and now IndexSearcher).
> > > The recommendation seems to be that these limits shouldn't be modified, but
> > > instead more efficient queries should be constructed. Let's say the limits are
> > > bumped to int max or some very high number -- I'm wondering what the effects of
> > > this would be. Would the execution of smaller queries be affected? Would larger
> > > queries execute as efficiently as possible? Or would some things start to break
> > > somewhere?
> > > --Petko
> > 
> > 
> > 
> > --
> > Adrien
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: dev-help@lucene.apache.org
> > 


-- 
Adrien
