[prev in list] [next in list] [prev in thread] [next in thread]
List: lucene-user
Subject: Indexing problem
From: Falko Guderian <mail.to.falko () gmx ! de>
Date: 2005-05-30 18:23:40
Message-ID: 429B5A2C.5020903 () gmx ! de
[Download RAW message or body]
Hi,
I indexed 20 documents. I want to evaluate my lucene index. That's why I
extract all term with their frequencies in each document.
This code has helped a lot.
-------------------------------------------------------------
try
{
TermEnum terms = indexReader.terms(new Term("content", ""));
while ("content".equals(terms.term().field()))
{
TermDocs termDocs = indexReader.termDocs();
termDocs.seek(terms);
// ... collect term.term().text() ...
int frequency = 0;
for(int i = 0; i< indexWriter.numDocs(); i++) {
...
freqency = termDocs.freq();
...
termDocs.next();
}
if (!terms.next())
break;
}
}
finally
{
terms.close();
}
-------------------------------------------------------------
But there is an anomaly. In the first document(termDocs.doc() = 0) all
term frequencies are greater than 0.
But it isn't correct. The first doc doesn't contain all terms.
Do you now this problem? How can I get the correct term frequencies in
all docs?
Best regards
Falko Guderian
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic