'Re: Getting all values for a specific dimension for SortedSetDocValues per document'


List:       lucene-user
Subject:    Re: Getting all values for a specific dimension for SortedSetDocValues per document
From:       Greg Miller <gsmiller () gmail ! com>
Date:       2022-06-30 22:46:44
Message-ID: CANJ0CDp3uOD6e-r5neFNFomE1aVNk2r0+5RH_wdsXy_whCvHBw () mail ! gmail ! com

Hi Harry-

Have you considered taxonomy faceting for your use-case? Because the
taxonomy structure is maintained in a separate index, it's
(relatively) trivial to iterate all direct child ordinals of a given
dimension. The cost of mapping to a global ordinal space is done when
the index is merged.
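To illustrate why walking a dimension's direct children is cheap in a taxonomy, here is a minimal standalone sketch. It assumes the first-child/next-sibling layout that Lucene's taxonomy exposes through `TaxonomyReader.getParallelTaxonomyArrays()`: `children()` holds each ordinal's youngest child and `siblings()` holds each ordinal's next-older sibling. The class name and the hand-built arrays are hypothetical stand-ins, so the example runs without Lucene on the classpath:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone model, not Lucene code: a taxonomy stored as
// "youngest child" and "next-older sibling" arrays indexed by ordinal.
final class TaxonomyChildrenSketch {
  // Marks "no child" / "no older sibling" (Lucene uses -1 as INVALID_ORDINAL).
  static final int INVALID_ORDINAL = -1;

  // Returns the direct children of dimOrd, youngest (highest ordinal) first.
  static List<Integer> directChildren(int dimOrd, int[] children, int[] siblings) {
    List<Integer> result = new ArrayList<>();
    for (int child = children[dimOrd]; child != INVALID_ORDINAL; child = siblings[child]) {
      result.add(child);
    }
    return result;
  }

  public static void main(String[] args) {
    // 0 = root, 1 = Author, 2 = Author/Bob, 3 = Color, 4 = Author/Lisa, 5 = Color/Red
    int[] children = {3, 4, -1, 5, -1, -1};
    int[] siblings = {-1, -1, -1, 1, 2, -1};
    System.out.println(directChildren(1, children, siblings)); // [4, 2]
  }
}
```

The walk touches only the children of the requested dimension, with no scan over unrelated ordinals and no per-segment mapping.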

Separately, I'd be curious about where you're running into performance
issues within the context of your system. Is the cost you're concerned
with building up the ordinal map? That's certainly expensive, but it's
a one-time cost (until you refresh your index). Or are you concerned
with the actual map lookup within your tight loop? If the latter, you
could consider doing more work at the slice-level by separately
determining the child ords for each dim ord within the context of each
segment (there's no off-the-shelf code for this that I'm aware of, so
you'd have to roll your own).
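One way to do that per-segment work is sketched below as a standalone model. Segment ordinals and global ordinals both follow sorted term order, so the segment-to-global mapping is strictly increasing. A dimension's contiguous global range therefore corresponds to a contiguous run of segment ordinals, which two binary searches per segment can find. After that, the per-document loop rejects non-matching ordinals with integer comparisons and consults the mapping only for matches. Plain arrays stand in for `OrdinalMap.getGlobalOrds(segmentIndex)` and for the segment's `SortedSetDocValues`; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical standalone model, not Lucene API: resolve a dimension's global
// ordinal range to a segment ordinal range once, then filter documents cheaply.
final class SegmentDimRangeSketch {

  // segToGlobal[segOrd] = global ordinal; strictly increasing because both
  // ordinal spaces follow sorted term order. Returns inclusive {segStart, segEnd};
  // segStart > segEnd means the segment has no terms in this dimension.
  static int[] segmentRange(long[] segToGlobal, long globalStart, long globalEnd) {
    int segStart = lowerBound(segToGlobal, globalStart);     // first ord with global >= start
    int segEnd = lowerBound(segToGlobal, globalEnd + 1) - 1; // last ord with global <= end
    return new int[] {segStart, segEnd};
  }

  // Index of the first element >= key, or a.length if there is none.
  private static int lowerBound(long[] a, long key) {
    int lo = 0, hi = a.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (a[mid] < key) lo = mid + 1; else hi = mid;
    }
    return lo;
  }

  // docOrds[doc] holds one document's segment ordinals in increasing order
  // (the order SortedSetDocValues returns them). Only matching ords are mapped
  // to global ordinals; the rest are rejected by integer comparisons.
  static Map<Integer, List<Long>> collect(int[][] docOrds, int[] range, long[] segToGlobal) {
    Map<Integer, List<Long>> out = new TreeMap<>();
    for (int doc = 0; doc < docOrds.length; doc++) {
      for (int ord : docOrds[doc]) {
        if (ord > range[1]) break; // sorted ords: nothing later can match
        if (ord >= range[0]) out.computeIfAbsent(doc, d -> new ArrayList<>()).add(segToGlobal[ord]);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Global terms: 0 Author/Bob, 1 Author/Lisa, 2 Author/Zed, 3 Color/Blue, 4 Color/Red, 5 Size/L
    // This segment only contains Author/Bob, Author/Zed, Color/Red, Size/L.
    long[] segToGlobal = {0, 2, 4, 5};
    int[] colorRange = segmentRange(segToGlobal, 3, 4); // global range of "Color"
    int[][] docOrds = {{0, 2}, {1, 3}, {2, 3}};
    System.out.println(collect(docOrds, colorRange, segToGlobal)); // {0=[4], 2=[4]}
  }
}
```

For a single-segment index with no ordinal map, the mapping is the identity (`segToGlobal[i] == i`) and the segment range equals the global range.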

Cheers,
-Greg

On Thu, Jun 30, 2022 at 11:52 AM Harald Braumann <braumann@m2n.at> wrote:
>
> Hi!
>
> I'm looking for a solution for the following problem:
>
> I would like to get all the values for a specific dimension for
> SortedSetDocValues per document. I've basically copied
> SortedSetDocValuesFacetCounts, but instead of just counting, I build a
> map from doc to values. The problem is that I have to iterate
> through all ords and map them to global ords to check whether they belong
> to the desired dimension, and this is very inefficient. Is there a better
> way to iterate through documents and get all the values for a specific
> dimension?
>
> Here is a simplified code of what I'm doing now:
>
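> import org.apache.lucene.facet.sortedset.DefaultSortedSetDocValuesReaderState;
> import org.apache.lucene.facet.sortedset.SortedSetDocValuesReaderState;
> import org.apache.lucene.facet.sortedset.SortedSetDocValuesReaderState.OrdRange;
> import org.apache.lucene.index.DocValues;
> import org.apache.lucene.index.LeafReaderContext;
> import org.apache.lucene.index.OrdinalMap;
> import org.apache.lucene.index.SortedSetDocValues;
> import org.apache.lucene.search.DocIdSetIterator;
> import org.apache.lucene.util.LongValues;
>
> // built once per IndexReader; ordinalMap is null for a single-segment index
> SortedSetDocValuesReaderState state = new DefaultSortedSetDocValuesReaderState(reader, field);
> OrdinalMap ordinalMap = state.getOrdinalMap();
> OrdRange dimOrdRange = state.getOrdRange(dim); // inclusive global ord range of dim
>
> for (LeafReaderContext ctx : reader.leaves()) {
>   SortedSetDocValues it = DocValues.getSortedSet(ctx.reader(), field);
>   LongValues segOrdMap = ordinalMap == null ? null : ordinalMap.getGlobalOrds(ctx.ord);
>   for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
>     for (long term = it.nextOrd(); term != SortedSetDocValues.NO_MORE_ORDS; term = it.nextOrd()) {
>       // map segment ord -> global ord (identity when there is no ordinal map)
>       long globalTerm = segOrdMap != null ? segOrdMap.get(term) : term;
>       if (globalTerm >= dimOrdRange.start && globalTerm <= dimOrdRange.end) {
>         // add (ctx.docBase + doc) / globalTerm
>       }
>     }
>   }
> }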
>
> Am I on the completely wrong path here?
>
> Thanks in advance and regards
> harry
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

