'TermInfosReader optimisation?'

List:       lucene-dev
Subject:    TermInfosReader optimisation?
From:       Tony Bowden <tony-lucene () kasei ! com>
Date:       2004-03-31 9:07:12
Message-ID: 20040331090712.GA4699 () soto ! kasei ! com


An interesting thing has come up with Plucene:

The code for TermInfosReader.get has an optimisation so that in
sequential access it doesn't need to keep seeking:

  final synchronized TermInfo get(Term term) throws IOException {
    if (size == 0) return null;

    // optimize sequential access: first try scanning cached enum w/o seeking
    if (enum.term() != null                          // term is at or past current
        && ((enum.prev != null && term.compareTo(enum.prev) > 0)
            || term.compareTo(enum.term()) >= 0)) {
      int enumOffset = (enum.position/TermInfosWriter.INDEX_INTERVAL)+1;
      if (indexTerms.length == enumOffset          // but before end of block
          || term.compareTo(indexTerms[enumOffset]) < 0)
        return scanEnum(term);                          // no need to seek
    }

    // random-access: must seek
    seekEnum(getIndexOffset(term));
    return scanEnum(term);
  }

In the Perl version, this whole middle section (the sequential-access
check that runs before the seek) slows everything down considerably,
by almost 50%. I'm not sure whether that is because the bottlenecks
sit in different places in Perl and in Java, but I'm curious as to
what impact this optimisation has in the Java version.

I can't easily test it from here at the moment. Are there any
benchmarks on the effect of having that optimisation versus not
having it?
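
For anyone who wants to compare the two paths without a real index, here
is a minimal sketch of the idea in plain Java. It is a toy model, not
Lucene's code: the class SeekSketch, the int "terms", and the seeks and
scanSteps counters are all assumptions made up for illustration, and the
shortcut test is simplified to "term is at or past the current position"
(the enum.prev comparison is left out). It only counts operations; it says
nothing about how expensive a seek or a comparison actually is in Java or
in Perl, which is the real question.

```java
// Toy model of a sorted term list with a sparse index (hypothetical, not Lucene).
class SeekSketch {
    static final int INDEX_INTERVAL = 128; // plays the role of TermInfosWriter.INDEX_INTERVAL

    final int[] terms;      // sorted "terms"; ints stand in for Term objects
    final int[] indexTerms; // every INDEX_INTERVAL-th term
    int position = -1;      // cached enumerator position; -1 means not yet positioned
    long seeks = 0;         // number of seekEnum calls
    long scanSteps = 0;     // number of forward steps taken while scanning

    SeekSketch(int size) {
        terms = new int[size];
        for (int i = 0; i < size; i++) terms[i] = i * 2;
        indexTerms = new int[(size + INDEX_INTERVAL - 1) / INDEX_INTERVAL];
        for (int i = 0; i < indexTerms.length; i++) indexTerms[i] = terms[i * INDEX_INTERVAL];
    }

    // Binary search for the last index entry that is <= term.
    int getIndexOffset(int term) {
        int lo = 0, hi = indexTerms.length - 1;
        while (hi >= lo) {
            int mid = (lo + hi) >>> 1;
            if (term < indexTerms[mid]) hi = mid - 1;
            else if (term > indexTerms[mid]) lo = mid + 1;
            else return mid;
        }
        return Math.max(hi, 0);
    }

    void seekEnum(int indexOffset) {
        position = indexOffset * INDEX_INTERVAL;
        seeks++;
    }

    // Step forward from the current position until we reach or pass the term.
    boolean scanEnum(int term) {
        while (position < terms.length && terms[position] < term) {
            position++;
            scanSteps++;
        }
        return position < terms.length && terms[position] == term;
    }

    boolean get(int term, boolean useShortcut) {
        if (terms.length == 0) return false;
        // Simplified sequential-access check: at or past current, but before end of block.
        if (useShortcut && position >= 0 && position < terms.length
                && term >= terms[position]) {
            int enumOffset = position / INDEX_INTERVAL + 1;
            if (enumOffset == indexTerms.length || term < indexTerms[enumOffset])
                return scanEnum(term); // no need to seek
        }
        seekEnum(getIndexOffset(term)); // random access: must seek
        return scanEnum(term);
    }

    // Look up every term in order and return the instance so counters can be read.
    static SeekSketch runSequential(int size, boolean useShortcut) {
        SeekSketch s = new SeekSketch(size);
        for (int i = 0; i < size; i++) {
            if (!s.get(i * 2, useShortcut)) throw new IllegalStateException("term not found");
        }
        return s;
    }

    public static void main(String[] args) {
        SeekSketch with = runSequential(1000, true);
        SeekSketch without = runSequential(1000, false);
        System.out.println("with shortcut:    seeks=" + with.seeks + " scanSteps=" + with.scanSteps);
        System.out.println("without shortcut: seeks=" + without.seeks + " scanSteps=" + without.scanSteps);
    }
}
```

In this model a purely sequential run seeks once per index block with the
shortcut and once per lookup without it, so the shortcut trades extra
comparisons on every call for fewer seeks. Whether that trade pays off
depends on the relative cost of the two, which may well differ between
the Java and Perl implementations.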

Thanks,

Tony


