
List:       lucene-user
Subject:    Re: Search Results Clustering
From:       Ray Tsang <saturnism () gmail ! com>
Date:       2005-08-31 5:10:36
Message-ID: fb9748c905083022103e411ffe () mail ! gmail ! com

I had similar requirements for "count" and "group by" on over 130 million
records, and it's really a pain. It's currently usable but not satisfactory.
Currently it groups at run time by iterating through the ungrouped items. It
collects the matching documents into a BitSet, so subsequent queries can use
the BitSet to retrieve the results of the original query. Moreover, it can
mark off documents that have already been grouped from the BitSet. On a page
that shows 10 records per page, it will only group 10 records at a time.
Consequently, there is no way to know the total number of grouped records at
the beginning. In addition, it feels like reading the field values from the
document in order to look for group-by results is the most time-consuming
part. How does an RDBMS do it?
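
A minimal sketch of the page-at-a-time grouping described above, not the
actual code. It assumes the group-by field values are available through a
plain array indexed by document number (the hypothetical fieldValues below
stands in for reading the stored field from each document), and the class and
method names are made up for illustration:

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

public class PagedGrouper {
    // Hypothetical stand-in for reading the group-by field of each document.
    private final String[] fieldValues;

    public PagedGrouper(String[] fieldValues) {
        this.fieldValues = fieldValues;
    }

    /**
     * Builds up to pageSize groups from the documents still set in
     * `remaining`, clearing every document that gets grouped so that the
     * next call continues with the ungrouped ones only.
     */
    public Map<String, Integer> nextPage(BitSet remaining, int pageSize) {
        Map<String, Integer> groups = new LinkedHashMap<>();
        for (int doc = remaining.nextSetBit(0);
             doc >= 0 && groups.size() < pageSize;
             doc = remaining.nextSetBit(doc + 1)) {
            String key = fieldValues[doc];
            int count = 0;
            // Scan the rest of the matches and mark off those in this group.
            for (int other = doc; other >= 0;
                 other = remaining.nextSetBit(other + 1)) {
                if (key.equals(fieldValues[other])) {
                    remaining.clear(other);
                    count++;
                }
            }
            groups.put(key, count);
        }
        return groups;
    }
}
```

Each call scans the remaining matches once per group, which is why the total
number of groups is unknown until the BitSet has been emptied.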
ray,
On 8/31/05, kapilChhabra (sent by Nabble.com) <lists@nabble.com> wrote:
>
> thanks a lot for your suggestion.
> I'll try it and get back if need be.
>
> Meanwhile, I gave it a thought and concluded that the best time to do the
> categorization/clustering should be when Lucene calculates Hits/in the
> Scorer.
> I am not sure if I am right.
> In addition to the current functionality, can we modify the Scorer class to
> add the following feature:
> The class generates a 2-dimensional array for the clustered field; the
> first dimension contains the distinct values of the field and the second
> dimension contains the count of results under this field. This value is
> incremented for an acceptable hit.
> Does it make sense?
> If it is possible, I'll dig deeper into the code of the Hits/Scorer classes.
>
> Thanks in advance,
> kapilChhabra
>
>
> --
> Sent from the Lucene - Java Users forum at Nabble.com:
> http://www.nabble.com/Search-Results-Clustering-t249355.html#a748901
>
>
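
The counting idea in the quoted message can be sketched roughly as follows.
This is only an illustration and does not touch Lucene's Scorer or Hits
classes: GroupCounter, collect() and the fieldValues array are hypothetical
names, and a map from value to count replaces the proposed 2-dimensional
array.

```java
import java.util.Map;
import java.util.TreeMap;

public class GroupCounter {
    // Hypothetical per-document lookup of the clustered field's value.
    private final String[] fieldValues;
    // Distinct field value -> number of accepted hits carrying that value.
    private final Map<String, Integer> counts = new TreeMap<>();

    public GroupCounter(String[] fieldValues) {
        this.fieldValues = fieldValues;
    }

    /** Meant to be called once for every accepted hit. */
    public void collect(int doc) {
        counts.merge(fieldValues[doc], 1, Integer::sum);
    }

    public Map<String, Integer> counts() {
        return counts;
    }
}
```

Counting this way gives the total per value after one pass over the hits, at
the cost of looking up the field value for every matching document.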

