
List:       nutch-developers
Subject:    Re: [Nutch-dev] Re: contribution to nutch development - alternative
From:       Andrzej Bialecki <ab () getopt ! org>
Date:       2004-11-19 17:15:39
Message-ID: 419E2A3B.4010801 () getopt ! org

Michael Nebel wrote:
> Hi Andrzej,
> 
> I just tried your PruneIndexTool and I'm a little bit confused:

Hi there,

I will be committing a newer version of the tool soon (tomorrow or on 
Monday).

> 
> - using the web frontend, I have a query "wordA wordB wordC" that returns
>   2 results with different URLs.

A very important thing is that PruneIndexTool uses a DIFFERENT syntax 
for queries than the Nutch web frontend. The syntax for the tool is 
Lucene QueryParser syntax - please see the javadoc comments for an example.

> 
> - Then I tried to remove the pages using PruneIndex:
>     content: "wordA wordB wordC"

First of all, there must be no space between the field name, colon, and 
the query term. I assume it's just a transcription error, and not the 
real query...

Anyway, this query means that you want to match all documents that
contain "wordA wordB wordC" as an exact phrase in the content field.
That is probably not what you wanted... I suspect you wanted something like:

content:(wordA OR wordB OR wordC)

Am I right?
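To make the difference between the two queries concrete, here is a toy model of the two matching rules. It is NOT Lucene and NOT PruneIndexTool code: the class PhraseVsOr and the methods tokenize, matchesPhrase and matchesAny are hypothetical names, and the tokenizer is a simplifying assumption (lowercase, split on anything that is not a letter or digit), whereas real Lucene analyzers may also drop stop words, stem, and so on. It only illustrates the semantics: a phrase query like content:"wordA wordB wordC" needs the terms adjacent and in order, while content:(wordA OR wordB OR wordC) is satisfied by any one of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical toy model of phrase vs. OR matching; not the Lucene API.
final class PhraseVsOr {

    // Assumed toy tokenizer: lowercase, split on non-letter/non-digit runs,
    // and drop empty tokens (e.g. from leading punctuation).
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Mimics content:"a b c": the terms must occur consecutively, in order.
    static boolean matchesPhrase(String content, List<String> phrase) {
        return Collections.indexOfSubList(tokenize(content), phrase) >= 0;
    }

    // Mimics content:(a OR b OR c): any single term is enough.
    static boolean matchesAny(String content, List<String> terms) {
        List<String> tokens = tokenize(content);
        for (String term : terms) {
            if (tokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Query terms given already lowercased to match the toy tokenizer.
        List<String> terms = Arrays.asList("worda", "wordb", "wordc");
        String adjacent = "wordA wordB wordC appear here together";
        String scattered = "wordC then wordA, with wordB elsewhere";

        System.out.println(matchesPhrase(adjacent, terms));  // true
        System.out.println(matchesPhrase(scattered, terms)); // false
        System.out.println(matchesAny(adjacent, terms));     // true
        System.out.println(matchesAny(scattered, terms));    // true
    }
}
```

A document that contains all three words, but not side by side, is missed by the phrase form and caught by the OR form, which is presumably why the phrase query did not remove the pages that the web frontend returned.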

-- 
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)



_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers