List: lucene-user
Subject: Re: Fast access to a random page of the search results.
From: "Stanislav Jordanov" <stenly () sirma ! bg>
Date: 2005-02-28 15:39:59
Message-ID: 019801c51dab$c2c746e0$d380a8c0 () sirma ! int
> What did you do in your private investigation?
1. empirical tests with an index of nearly 75,000 docs (I am attaching the test source)
2. reviewing and tracing the source code of Lucene
(I do not claim I have gained a deep understanding of it ;-)
> Sorted by descending relevance (the default), or in some other way?
In some other way: sorted by some column (ascending or descending, it doesn't matter).
> If a search is fast enough, as you report, then you can simply start
> your access to Hits at the appropriate spot. For the current systems
> I'm working on, this is the approach I've used - start iterating hits
> at (pageNumber - 1) * numberOfItemsPerPage.
>
> Is that approach insufficient?
I'm afraid this is not sufficient.
Either I am doing something wrong,
or it is not that simple.
Below is a log from my test session.
It appears that IndexSearcher.search(...) finishes rather fast
compared to the time it takes to fetch the last document from the Hits object.
The log starts here:
pa
Found 74222 document(s) that matched query 'pa'
Sorting by "sfile_name"
query executed in 16ms
Last doc accessed in 375ms
us
Found 74222 document(s) that matched query 'us'
Sorting by "sfile_name"
query executed in 31ms
Last doc accessed in 219ms
1
Found 74222 document(s) that matched query '1'
Sorting by "sfile_name"
query executed in 15ms
Last doc accessed in 235ms
5
Found 74222 document(s) that matched query '5'
Sorting by "sfile_name"
query executed in 422ms
Last doc accessed in 219ms
6
Found 72759 document(s) that matched query '6'
Sorting by "sfile_name"
query executed in 344ms
Last doc accessed in 250ms
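
To make the paging arithmetic from the quoted suggestion concrete, and to show one way of timing the two phases separately, here is a minimal Java sketch. It does not use Lucene and it is not the attached test source: the class name PagingSketch, all method names, and the page size of 10 are assumptions made up for illustration, and the timed work in main is only a placeholder for a real search or hit access.

```java
class PagingSketch {

    /** Zero-based index of the first hit on a 1-based page: (pageNumber - 1) * itemsPerPage. */
    static int pageStart(int pageNumber, int itemsPerPage) {
        if (pageNumber < 1 || itemsPerPage < 1) {
            throw new IllegalArgumentException("pageNumber and itemsPerPage must be >= 1");
        }
        // multiplyExact throws instead of silently overflowing for huge page numbers
        return Math.multiplyExact(pageNumber - 1, itemsPerPage);
    }

    /** Exclusive end index of the page, clamped to the total number of hits. */
    static int pageEnd(int pageNumber, int itemsPerPage, int totalHits) {
        long end = (long) pageStart(pageNumber, itemsPerPage) + itemsPerPage;
        return (int) Math.min(end, (long) totalHits);
    }

    /** Number of the last page (1-based) for a given hit count; 0 if there are no hits. */
    static int lastPage(int totalHits, int itemsPerPage) {
        return (totalHits + itemsPerPage - 1) / itemsPerPage;
    }

    /** Wall-clock time of one piece of work, in milliseconds. */
    static long elapsedMillis(Runnable work) {
        long t0 = System.nanoTime();
        work.run();
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) {
        int totalHits = 74222;   // hit count taken from the log above
        int itemsPerPage = 10;   // assumed page size, not stated in the thread
        int page = lastPage(totalHits, itemsPerPage);

        int start = pageStart(page, itemsPerPage);
        int end = pageEnd(page, itemsPerPage, totalHits);
        System.out.println("page " + page + " covers hits [" + start + ", " + end + ")");

        // Placeholder work: in a real test, time the search call and the
        // access to the hit at index `start` as two separate measurements.
        long ms = elapsedMillis(() -> { /* e.g. fetch the hit at index start */ });
        System.out.println("placeholder work took " + ms + "ms");
    }
}
```

Timing the search call and the access to the hit at the page's start index as two separate measurements is what separates the "query executed" figure from the "Last doc accessed" figure in the log.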