'[jira] [Updated] (LUCENE-5722) Speed up MMapDirectory.seek()'


List:       lucene-dev
Subject:    [jira] [Updated] (LUCENE-5722) Speed up MMapDirectory.seek()
From:       "Robert Muir (JIRA)" <jira () apache ! org>
Date:       2014-05-31 6:27:01
Message-ID: JIRA.12717713.1401516916062.52805.1401517621649 () arcas


     [ https://issues.apache.org/jira/browse/LUCENE-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-5722:
--------------------------------

    Attachment: LUCENE-5722.patch

Patch attached. Warning: it's quite ugly, but it looks correct and does well
with tests (all pass).

I see a combined 45% improvement to docvalues performance with this patch and
LUCENE-5720.

The hairy part comes from the fact that even if we have a big file (e.g. the
dv .dat file today) with multiple buffers, slice() should be optimal in the
case where only one buffer is needed to access that region. And I abstracted
ByteBufferIndexInput, and I guess I'm paying the cost now :(
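
To illustrate the slice() case described above, here is a minimal sketch in
plain java.nio. It is not the code in the patch: the class SliceSketch, the
method trySingleBufferSlice, and the fixed power-of-two chunk layout are
assumptions made for illustration, and any ByteBuffer (heap or mmapped) can
stand in for the mapped chunks.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch; not Lucene's ByteBufferIndexInput. */
final class SliceSketch {

  /**
   * Returns a zero-based slice when [offset, offset + length) lies inside one
   * chunk, or null when the region spans several chunks and the caller must
   * fall back to the multi-buffer path. Assumes every chunk except the last
   * holds exactly 2^chunkSizePower bytes and that the region lies within the
   * file.
   */
  static ByteBuffer trySingleBufferSlice(
      ByteBuffer[] buffers, int chunkSizePower, long offset, long length) {
    if (offset < 0 || length < 0) {
      throw new IllegalArgumentException("negative offset or length");
    }
    final long chunkSizeMask = (1L << chunkSizePower) - 1L;
    final int first = (int) (offset >>> chunkSizePower);
    // Chunk holding the last byte of the region (an empty region stays in 'first').
    final int last =
        length == 0 ? first : (int) ((offset + length - 1) >>> chunkSizePower);
    if (first != last) {
      return null; // more than one chunk is needed
    }
    // duplicate() shares the bytes but has its own position and limit.
    final ByteBuffer dup = buffers[first].duplicate();
    final int start = (int) (offset & chunkSizeMask);
    dup.position(start);
    dup.limit(start + (int) length);
    // slice() rebases the view so that position 0 is the start of the region.
    return dup.slice();
  }
}
```

With such a slice, later positioning needs no offset, shift, or mask, because
the buffer itself starts at the region.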

On the other hand, this opens up additional things to explore: e.g. maybe we
should override readByte/readBytes, since it's much less code to inline in
this case, and maybe we should investigate simply changing the
DirectPackedReaders to require a slice over their data (e.g.
getFilePointer == 0) to remove the addition there.
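
The "addition" mentioned above can be pictured with a toy fixed-width reader.
This is only an illustration under stated assumptions: PackedSketch and its
two methods are hypothetical, the values are fixed 16-bit rather than
bit-packed, and this is not how DirectPackedReader is written.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of a fixed-width reader; not Lucene's DirectPackedReader. */
final class PackedSketch {

  /** Reader over the whole file: every lookup adds the start pointer of the data. */
  static int getWithBase(ByteBuffer file, long startPointer, int index) {
    final long pos = startPointer + ((long) index << 1); // the extra addition
    return file.getShort((int) pos) & 0xFFFF;
  }

  /** Reader over a slice that starts at the data: the index alone gives the position. */
  static int getFromSlice(ByteBuffer slice, int index) {
    return slice.getShort(index << 1) & 0xFFFF;
  }
}
```

Both methods return the same value when the slice begins at startPointer; the
second simply has one less operation per lookup.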

> Speed up MMapDirectory.seek()
> -----------------------------
> 
> Key: LUCENE-5722
> URL: https://issues.apache.org/jira/browse/LUCENE-5722
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Attachments: LUCENE-5722.patch
> 
> 
> For traditional Lucene access, which is mostly sequential with an occasional
> advance(), I think this method gets drowned out in the noise. But for access
> patterns like docvalues, it's important. Unfortunately, seek() is complex
> today because of mapping multiple buffers. However, the very common case is
> that only one map is used for a given clone or slice. When it is possible to
> use only a single mapped buffer, we should instead take advantage of
> ByteBuffer.slice(), which will adjust the internal mmap address and remove
> the offset calculation. Furthermore, we don't need the shift/mask or even the
> negative check, as they are then all handled by the ByteBuffer API: seek is a
> one-liner (with try/catch, of course, to convert exceptions). This makes
> docvalues access 20% faster; I haven't tested conjunctions or anything like
> that.
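
The two seek() shapes contrasted in the description can be sketched as
follows. This is a simplified illustration in plain java.nio, not the patch
itself: SeekSketch, MultiBufferInput, and SingleBufferInput are hypothetical
names, and the chunk layout (power-of-two chunks plus a slice offset) is an
assumption.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical sketch; not Lucene's ByteBufferIndexInput. */
final class SeekSketch {

  /** General case: the region spans several chunks of 2^chunkSizePower bytes. */
  static final class MultiBufferInput {
    private final ByteBuffer[] buffers;
    private final int chunkSizePower;
    private final long chunkSizeMask;
    private final int offset; // start of this region inside the first chunk
    private ByteBuffer curBuf;

    MultiBufferInput(ByteBuffer[] buffers, int chunkSizePower, int offset) {
      this.buffers = buffers;
      this.chunkSizePower = chunkSizePower;
      this.chunkSizeMask = (1L << chunkSizePower) - 1L;
      this.offset = offset;
    }

    void seek(long pos) throws IOException {
      if (pos < 0L) { // explicit negative check
        throw new IllegalArgumentException("Seeking to negative position: " + pos);
      }
      pos += offset; // offset calculation
      final int bi = (int) (pos >> chunkSizePower); // shift selects the chunk
      try {
        final ByteBuffer b = buffers[bi];
        b.position((int) (pos & chunkSizeMask)); // mask gives the position inside it
        curBuf = b;
      } catch (ArrayIndexOutOfBoundsException | IllegalArgumentException e) {
        throw new EOFException("seek past EOF: " + pos);
      }
    }

    byte readByte() {
      return curBuf.get();
    }
  }

  /** Common case: one buffer covers the region, so slice() absorbs the offset. */
  static final class SingleBufferInput {
    private final ByteBuffer buf;

    SingleBufferInput(ByteBuffer whole, int offset, int length) {
      final ByteBuffer dup = whole.duplicate();
      dup.position(offset);
      dup.limit(offset + length);
      this.buf = dup.slice(); // position 0 now maps to 'offset' in the file
    }

    void seek(long pos) throws IOException {
      try {
        // ByteBuffer does the bounds and sign checks. The narrowing cast is
        // part of the sketch: a pos beyond int range could wrap, so a real
        // implementation would have to account for that.
        buf.position((int) pos);
      } catch (IllegalArgumentException e) {
        if (pos < 0) {
          throw new IllegalArgumentException("Seeking to negative position: " + pos, e);
        }
        throw new EOFException("seek past EOF: " + pos);
      }
    }

    byte readByte() {
      return buf.get();
    }
  }
}
```

In the single-buffer variant the body of seek() is one position() call; the
try/catch only converts the exception ByteBuffer already throws.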




