
List:       lucene-user
Subject:    RE: Efficient way to define large Boolean Occur.FILTER clause in Lucene 6
From:       "Hasenberger, Josef" <Josef.Hasenberger () zetcom ! com>
Date:       2018-07-09 5:46:01
Message-ID: 5fc8871ccf6b4fbcb27821cb0cf85412 () SYSRV214 ! exdom01 ! lan

Hi,

Great, thanks a lot. 

Pointing me to RandomAccessWeight and the approach used in DocValuesNumbersQuery was
exactly what I needed for my use case. I created my own query type that takes advantage
of the already loaded LongBitSet values. It lets me efficiently implement the Bits that
match a document inside my own RandomAccessWeight implementation.
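A minimal, Lucene-free sketch of that idea, assuming each document has a single non-negative long value and that the already-loaded set is a bitset where bit v means "value v is wanted". `DocBits` is a hypothetical local stand-in with the same get/length shape as Lucene's `Bits`, `long[] docValues` stands in for one segment's numeric doc values, and `java.util.BitSet` stands in for `LongBitSet`:

```java
import java.util.BitSet;

// Hypothetical local stand-in with the same two methods as Lucene's org.apache.lucene.util.Bits.
interface DocBits {
    boolean get(int doc);

    int length();
}

final class ValueSetFilter {
    private ValueSetFilter() {}

    /**
     * Builds a per-document matcher for one segment. docValues[doc] is a hypothetical
     * stand-in for reading the document's numeric doc value; allowedValues plays the
     * role of the already-loaded LongBitSet (bit v set = value v is wanted).
     */
    static DocBits matchingDocs(long[] docValues, BitSet allowedValues) {
        return new DocBits() {
            @Override
            public boolean get(int doc) {
                long value = docValues[doc];
                // java.util.BitSet is int-indexed, so values outside [0, Integer.MAX_VALUE]
                // can never be members (Lucene's LongBitSet takes long indexes instead).
                if (value < 0 || value > Integer.MAX_VALUE) {
                    return false;
                }
                // Constant-time bit test: no Term or BytesRef is created per allowed value.
                return allowedValues.get((int) value);
            }

            @Override
            public int length() {
                return docValues.length;
            }
        };
    }
}
```

In a Lucene 6 query the equivalent check would sit inside the `Bits` that a `RandomAccessWeight` subclass returns for each segment. The per-document cost is one doc-value read plus one bit test, independent of how many values the set contains.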

This approach is efficient when the number of values exceeds a certain threshold. Below
that threshold, using TermsQuery is more efficient. I decide in my code which approach
is actually more efficient by applying my own specific heuristic.

Overall, for larger value maps (above 20,000 entries), search time dropped to about
10-30% of what it was before. For smaller value maps, search time stays low because
TermsQuery is still used.
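The heuristic itself is not described in the message. As a purely illustrative sketch, the simplest version is a count threshold; the names `FilterStrategy` and `FilterStrategyChooser` are hypothetical, and the 20,000 default is only borrowed from the figure above, not a recommended value:

```java
// Hypothetical names: neither type comes from Lucene or from the message.
enum FilterStrategy {
    TERMS_QUERY,   // few values: look each one up as a term
    BITSET_WEIGHT  // many values: test each candidate document against a bitset
}

final class FilterStrategyChooser {
    private FilterStrategyChooser() {}

    // Placeholder cut-off borrowed from the ~20,000-entry figure; measure on your own index.
    static final int DEFAULT_THRESHOLD = 20_000;

    /** Picks the bitset-based query only when the value count exceeds the threshold. */
    static FilterStrategy choose(int numValues, int threshold) {
        return numValues > threshold ? FilterStrategy.BITSET_WEIGHT : FilterStrategy.TERMS_QUERY;
    }

    static FilterStrategy choose(int numValues) {
        return choose(numValues, DEFAULT_THRESHOLD);
    }
}
```

A real heuristic would plausibly also weigh index size, since the bitset approach checks every candidate document while the terms approach pays per looked-up value; that trade-off is an assumption here, not something measured in the thread.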

Thanks again!

Josef

-----Original Message-----
From: Trejkaz [mailto:trejkaz@trypticon.org] 
Sent: Wednesday, June 27, 2018 4:51 AM
To: Lucene Users Mailing List
Subject: Re: Efficient way to define large Boolean Occur.FILTER clause in Lucene 6

On Tue, Jun 26, 2018 at 7:02 PM, Hasenberger, Josef
<Josef.Hasenberger@zetcom.com> wrote:
> However, I have a feeling that the conversion from Long values to Terms is
> rather inefficient for large collections and also uses a lot of memory.
> To ease conversion overhead somewhat, I created a class that converts a
> Long value directly to a BytesRef instance (in order to avoid conversion to
> UTF16 and then UTF8 again) and pass that instance to the Term constructor.
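Assuming the field was indexed with terms holding the decimal text of each long (the mention of UTF-16 and UTF-8 suggests this, but the message does not say so), a converter of that kind could write the ASCII digits straight into a byte array. `AsciiLongBytes` is a hypothetical name; in Lucene the result would then be wrapped in `new BytesRef(bytes)` for the `Term` constructor:

```java
final class AsciiLongBytes {
    private AsciiLongBytes() {}

    /**
     * Encodes v as the ASCII/UTF-8 bytes of its decimal form, i.e. the same bytes as
     * Long.toString(v).getBytes(UTF_8), but without creating an intermediate String.
     */
    static byte[] toAsciiBytes(long v) {
        if (v == 0) {
            return new byte[] {'0'};
        }
        boolean negative = v < 0;
        // Work on the non-positive value so Long.MIN_VALUE does not overflow on negation.
        long n = negative ? v : -v;
        int digits = 0;
        for (long t = n; t != 0; t /= 10) {
            digits++;
        }
        byte[] out = new byte[digits + (negative ? 1 : 0)];
        int pos = out.length - 1;
        while (n != 0) {
            // For n <= 0, n % 10 is in [-9, 0], so '0' - (n % 10) is a digit character.
            out[pos--] = (byte) ('0' - (n % 10));
            n /= 10;
        }
        if (negative) {
            out[0] = '-';
        }
        return out;
    }
}
```

This removes the String and the UTF-16 round trip, but still allocates one array, one `BytesRef`, and one `Term` per value, which is the overhead the replies below address.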

First thought is, why are you using TermsQuery if they're in DocValues?
Is DocValuesTermsQuery any better? It does depend on how many terms
you're searching for.

Second thought is that there is also DocValuesNumbersQuery, which
avoids having to convert all the values.
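To illustrate why a doc-values query avoids the conversion, here is a hypothetical model of the per-document check: each value stays a number and is compared against the document's own values, so no `Term` or `BytesRef` is built. `NumbersMatcher` and the `long[][]` layout (all values of document `doc` at index `doc`) are assumptions for the sketch, not Lucene's implementation:

```java
import java.util.Set;
import java.util.function.IntPredicate;

final class NumbersMatcher {
    private NumbersMatcher() {}

    /**
     * Hypothetical model of a doc-values "numbers" filter for one segment.
     * docValues[doc] holds every numeric value of that document (empty = no value);
     * a document matches if any of its values is in the query's set.
     */
    static IntPredicate matches(long[][] docValues, Set<Long> numbers) {
        return doc -> {
            for (long value : docValues[doc]) {
                // Autoboxes value for the lookup; a primitive long hash set would avoid that.
                if (numbers.contains(value)) {
                    return true;
                }
            }
            return false;
        };
    }
}
```

The set of boxed `Long`s still costs memory proportional to the number of query values; when those values already exist in memory as a bitset, testing bits directly avoids building the set at all.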

> I just wonder if there is a better method for passing a large number of filter
> criteria to a BooleanQuery Occur.FILTER clause, one that avoids excessive object
> creation.

If you can get your long values into something which implements Bits,
you could make a query using RandomAccessWeight to directly point at
the existing set you already have in memory.

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

