
List:       lucene-user
Subject:    Re: Fast access to a random page of the search results.
From:       "Stanislav Jordanov" <stenly () sirma ! bg>
Date:       2005-02-28 15:39:59
Message-ID: 019801c51dab$c2c746e0$d380a8c0 () sirma ! int

> What did you do in your private investigation?
1. empirical tests with an index of nearly 75,000 docs (I am attaching the test source)
2. reviewing and tracing the source code of Lucene
(I do not claim I have gained a deep understanding of it ;-)

> Sorted by descending relevance (the default), or in some other way?
In some other way - sorted by some column (asc or desc - doesn't matter)

> If a search is fast enough, as you report, then you can simply start 
> your access to Hits at the appropriate spot.  For the current systems 
> I'm working on, this is the approach I've used - start iterating hits 
> at (pageNumber - 1) * numberOfItemsPerPage.
> 
> Is that approach insufficient?

I'm afraid this is not sufficient.
Either I am doing something wrong,
or it is not that simple.
Below is a log from my test session.
It appears that IndexSearcher.search(...) finishes rather fast
compared to the time it takes to fetch the last document from the Hits object.
The log starts here:
pa
Found 74222 document(s) that matched query 'pa'
Sorting by "sfile_name"
query executed in 16ms
Last doc accessed in 375ms

us
Found 74222 document(s) that matched query 'us'
Sorting by "sfile_name"
query executed in 31ms
Last doc accessed in 219ms

1
Found 74222 document(s) that matched query '1'
Sorting by "sfile_name"
query executed in 15ms
Last doc accessed in 235ms

5
Found 74222 document(s) that matched query '5'
Sorting by "sfile_name"
query executed in 422ms
Last doc accessed in 219ms

6
Found 72759 document(s) that matched query '6'
Sorting by "sfile_name"
query executed in 344ms
Last doc accessed in 250ms
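
For reference, the paging arithmetic from the quoted suggestion (start at (pageNumber - 1) * numberOfItemsPerPage) can be sketched in plain Java roughly as follows. This is only a minimal sketch and not the attached test source: the class name PageWindow and its methods are hypothetical, page numbers are assumed to be 1-based, and a plain List stands in for the Hits object so that the snippet runs without Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: computes which result indices
// belong to a given 1-based page.
final class PageWindow {

    /** Index of the first result on the given 1-based page. */
    public static int firstIndex(int pageNumber, int itemsPerPage) {
        if (pageNumber < 1 || itemsPerPage < 1) {
            throw new IllegalArgumentException("pageNumber and itemsPerPage must be >= 1");
        }
        return (pageNumber - 1) * itemsPerPage;
    }

    /** Exclusive end index of the page, clamped to the total number of results. */
    public static int endIndex(int pageNumber, int itemsPerPage, int totalHits) {
        long end = (long) firstIndex(pageNumber, itemsPerPage) + itemsPerPage;
        return (int) Math.min(end, (long) totalHits);
    }

    /** Copies one page out of an in-memory result list (a stand-in for Hits). */
    public static <T> List<T> page(List<T> results, int pageNumber, int itemsPerPage) {
        // Clamp the start so a page past the end yields an empty list.
        int start = Math.min(firstIndex(pageNumber, itemsPerPage), results.size());
        int end = Math.max(start, endIndex(pageNumber, itemsPerPage, results.size()));
        return new ArrayList<>(results.subList(start, end));
    }

    public static void main(String[] args) {
        int totalHits = 74222;   // match count taken from the log
        int itemsPerPage = 10;   // assumed page size, not stated in the thread
        int lastPage = (totalHits + itemsPerPage - 1) / itemsPerPage;
        System.out.println("last page: " + lastPage);
        System.out.println("first index: " + firstIndex(lastPage, itemsPerPage));
        System.out.println("end index (exclusive): " + endIndex(lastPage, itemsPerPage, totalHits));
    }
}
```

With 74222 matches and an assumed page size of 10, the last page is page 7423 and covers indices 74220 and 74221. The arithmetic itself is trivial; the timings in the log concern what happens when a document at such a high index is actually fetched from a sorted Hits object.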


