List: lucene-user
Subject: unlimited wildcard term expansion
From: John Z <zjavier_1 () yahoo ! com>
Date: 2004-06-30 17:50:43
Message-ID: 20040630175043.76285.qmail () web51804 ! mail ! yahoo ! com
Hi,
I am trying to find a way to handle wildcard queries in Lucene without running out of memory, and I have been having some problems with it.
I have modified parts of Lucene's search code to keep only about 1000 terms in memory and write the rest of the terms to a file (this is done in the getQuery() method of MultiTermQuery.java, PrefixQuery.java, etc.).
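The spill step described above could look roughly like the following. This is a minimal sketch, not the actual patch: the class name TermSpill, the method names, and the one-term-per-line file format are all hypothetical, terms are plain strings rather than Lucene Term objects, and it uses a newer Java API (java.nio.file) than Lucene had available in 2004.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper: keeps the first N expanded terms in memory, spills the rest to a temp file. */
final class TermSpill {
    public final List<String> inMemory;   // terms kept in memory (at most maxInMemory)
    public final Path spillFile;          // one spilled term per line
    public final int spilledCount;        // number of terms written to spillFile

    private TermSpill(List<String> inMemory, Path spillFile, int spilledCount) {
        this.inMemory = inMemory;
        this.spillFile = spillFile;
        this.spilledCount = spilledCount;
    }

    /** Consumes the term enumeration once; assumes no term contains a line break. */
    public static TermSpill collect(Iterator<String> terms, int maxInMemory) {
        List<String> kept = new ArrayList<>();
        int spilled = 0;
        try {
            Path file = Files.createTempFile("wildcard-terms", ".txt");
            try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                while (terms.hasNext()) {
                    String term = terms.next();
                    if (kept.size() < maxInMemory) {
                        kept.add(term);          // still under the in-memory cap
                    } else {
                        out.write(term);         // over the cap: write to disk instead
                        out.newLine();
                        spilled++;
                    }
                }
            }
            return new TermSpill(kept, file, spilled);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the spilled terms back in the order they were written. */
    public List<String> readSpilled() {
        try {
            return Files.readAllLines(spillFile, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a cap of 1000, a call such as TermSpill.collect(expandedTerms, 1000) would leave at most 1000 terms in inMemory and everything else in the temp file.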
Then, when the scorer objects are created and scores are collected for each clause in the score() method of BooleanScorer.java, I first process all the clauses that are in memory and then continue reading from the file that I created earlier. For each term read from the file, I create a TermQuery, get the scorer object from that TermQuery, and collect the score for it.
Then the bucketTable will do collectHits of everything.
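The two-pass flow described above (in-memory terms first, then terms streamed from the file) can be sketched without Lucene as follows. Everything here is a stand-in and an assumption: the postings map replaces the index, adding 1.0 per matching term replaces real term scoring, and a plain HashMap replaces the bucket table.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of scoring in-memory terms first and spilled terms second. */
final class TwoPassScoring {
    /** Stand-in for the index: term -> ids of the documents that contain it. */
    private final Map<String, int[]> postings;

    public TwoPassScoring(Map<String, int[]> postings) {
        this.postings = postings;
    }

    /** Placeholder scoring: each matching term adds 1.0 to the document's score. */
    private void scoreTerm(String term, Map<Integer, Float> scores) {
        int[] docs = postings.get(term);
        if (docs == null) {
            return;                                   // term not in the index
        }
        for (int doc : docs) {
            scores.merge(doc, 1.0f, Float::sum);
        }
    }

    /** Pass 1 covers the in-memory terms; pass 2 reads one term per line from the reader. */
    public Map<Integer, Float> score(List<String> inMemoryTerms, BufferedReader spilledTerms) {
        Map<Integer, Float> scores = new HashMap<>();
        for (String term : inMemoryTerms) {
            scoreTerm(term, scores);
        }
        try {
            String term;
            while ((term = spilledTerms.readLine()) != null) {
                scoreTerm(term, scores);              // only one spilled term is held at a time
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return scores;
    }
}
```

Because the accumulator is keyed by document id, a document matched by both an in-memory term and a spilled term ends up with a single combined entry.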
I tested my changes on small indexes, with about 2 terms in memory and 2 or 3 terms in the file, and it worked fine.
However, when I tried this with bigger indexes (> 1 million docs), with 1000 terms in memory and 972 in the file, I got into an infinite loop in bucketTable.collectHits(). I printed out the doc in each bucket and noticed that about halfway through the bucket list, the same 4 - 5 docs started repeating for the rest of the list, and there was no null at the end of the list to terminate it.
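A list with repeating entries and no terminating null is a cycle. The sketch below is a simplified, hypothetical model of a hashed bucket table (it is not Lucene's BooleanScorer code). It shows one way such a cycle could arise, namely re-linking a bucket that is already on the list when a different doc hashes to the same slot, and it includes a small check (Floyd's tortoise-and-hare) that could be run before walking the list. Whether this is what happens in the modified scorer is an assumption, not a confirmed diagnosis.

```java
/** Hypothetical, simplified bucket table used only to illustrate a list cycle. */
final class BucketListDemo {
    static final class Bucket {
        int doc = -1;      // document currently held by this slot
        float score;
        Bucket next;       // next valid bucket, or null at the end of the list
    }

    static final int SIZE = 4;                 // tiny table so collisions are easy to see
    final Bucket[] buckets = new Bucket[SIZE];
    Bucket first;                              // head of the list of valid buckets

    /** Adds a score for doc; slot index is doc modulo SIZE (SIZE is a power of two). */
    public void collect(int doc, float score) {
        int i = doc & (SIZE - 1);
        Bucket b = buckets[i];
        if (b == null) {
            buckets[i] = b = new Bucket();
        }
        if (b.doc != doc) {
            // Slot holds another doc: reuse it and push it on the list.
            // If b is still linked from an earlier doc, this creates a cycle.
            b.doc = doc;
            b.score = score;
            b.next = first;
            first = b;
        } else {
            b.score += score;                  // same doc again: just accumulate
        }
    }

    /** Floyd's cycle check: true if following next from head never reaches null. */
    public static boolean hasCycle(Bucket head) {
        Bucket slow = head;
        Bucket fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, collecting docs 0, 1 and then 4 without clearing the list in between makes doc 4 reuse the slot of doc 0 while that bucket is still linked, so the list loops back on itself. Note that in this model a larger table only makes such collisions rarer; it does not prevent them.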
I have looked everywhere and even tried increasing the bucket table size to the sum of the number of terms in memory and the number of terms in the file, but that still did not work.
I would really appreciate any suggestions/ideas/help on this.
Thanks.
Javier