
List:       nutch-developers
Subject:    [Nutch-dev] [ nutch-Bugs-992437 ] add a couple of options to WebDBInjector
From:       "SourceForge.net" <noreply () sourceforge ! net>
Date:       2004-07-20 4:06:41
Message-ID: E1Bmltp-00032P-00 () sc8-sf-web3 ! sourceforge ! net

Bugs item #992437, was opened at 2004-07-16 09:45
Message generated for change (Comment added) made by cutting
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=992437&group_id=59548

Category: tools
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Jungshik Shin (jshin)
Assigned to: Nobody/Anonymous (nobody)
Summary: add a couple of options to WebDBInjector

Initial Comment:
Here's my patch to add '-topic <dmoz topic>' and
'-topicsFile <filename with dmoz topics>' to
WebDBInjector.  They can be used together and the
former can be used multiple times.  This is handy if
you want to draw seeds from a certain set of topics. 

It uses java.util.regex (only available in JDK 1.4 and
later).
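
For readers without the attachment, here is a minimal sketch of how such topic filtering could work with java.util.regex. It is not the patch itself: the class name TopicFilterSketch and the methods parseTopics, buildTopicPattern and accept are hypothetical, the sketch assumes a selected topic should also cover its subtopics, and reading '-topicsFile' is left out. It uses Pattern.quote and generics, so it needs a newer JDK than 1.4.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TopicFilterSketch {

    // Collect every value that follows a "-topic" flag; the flag may repeat.
    static List<String> parseTopics(String[] args) {
        List<String> topics = new ArrayList<>();
        for (int i = 0; i < args.length - 1; i++) {
            if ("-topic".equals(args[i])) {
                topics.add(args[++i]);
            }
        }
        return topics;
    }

    // Build one pattern matching any listed topic or anything below it.
    // Returns null when no topic was given, meaning "do not filter".
    static Pattern buildTopicPattern(List<String> topics) {
        if (topics.isEmpty()) {
            return null;
        }
        StringBuilder regex = new StringBuilder("^(?:");
        for (int i = 0; i < topics.size(); i++) {
            if (i > 0) {
                regex.append('|');
            }
            // Quote the topic so characters in it are taken literally.
            regex.append(Pattern.quote(topics.get(i)));
        }
        regex.append(")(?:/.*)?$");
        return Pattern.compile(regex.toString());
    }

    // A null pattern accepts everything; otherwise the topic must match.
    static boolean accept(Pattern topicPattern, String topic) {
        return topicPattern == null || topicPattern.matcher(topic).matches();
    }

    public static void main(String[] args) {
        String[] demoArgs = {"-topic", "Top/Arts", "-topic", "Top/Science/Physics"};
        Pattern p = buildTopicPattern(parseTopics(demoArgs));
        System.out.println(accept(p, "Top/Arts/Music"));
        System.out.println(accept(p, "Top/Sports"));
    }
}
```

The trailing "(?:/.*)?" is what lets "Top/Arts" also select "Top/Arts/Music" while still rejecting an unrelated topic such as "Top/Artsy".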





----------------------------------------------------------------------

>Comment By: Doug Cutting (cutting)
Date: 2004-07-19 21:06

Message:
Logged In: YES 
user_id=21778

Okay, I committed this.

I changed a System.out.println to a LOG.info, and also fixed
bin/nutch to permit spaces in command line options.

Thanks!

----------------------------------------------------------------------

Comment By: Jungshik Shin (jshin)
Date: 2004-07-19 14:03

Message:
Logged In: YES 
user_id=307557

Thank you for trying my patch. There was a very stupid
mistake in line 165 ( '!topicPattern.equals(null)' ). I
wonder what I was thinking when I wrote that :-)
This patch is the same as before, but now it properly checks
for null. 
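
For anyone following along, a small sketch of why the old check fails. Calling equals() on a null reference throws before the comparison can happen, so the test has to be a reference comparison. The class NullCheckSketch and its methods are hypothetical and only mirror the shape of the check; they are not the code in WebDBInjector.

```java
import java.util.regex.Pattern;

public class NullCheckSketch {

    // Mirrors the broken check: dereferences topicPattern, so a null
    // pattern throws NullPointerException instead of yielding false.
    static boolean hasPatternBroken(Pattern topicPattern) {
        return !topicPattern.equals(null);
    }

    // Reference comparison: safe when topicPattern is null.
    static boolean hasPatternFixed(Pattern topicPattern) {
        return topicPattern != null;
    }

    public static void main(String[] args) {
        // No topic option given, so no pattern was ever compiled.
        Pattern topicPattern = null;

        System.out.println("fixed check: " + hasPatternFixed(topicPattern));
        try {
            hasPatternBroken(topicPattern);
        } catch (NullPointerException e) {
            System.out.println("broken check threw NullPointerException");
        }
    }
}
```

That matches the report where the failing run used only '-dmozfile' with no topic option.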

----------------------------------------------------------------------

Comment By: Doug Cutting (cutting)
Date: 2004-07-19 09:52

Message:
Logged In: YES 
user_id=21778

I like this patch, but after applying it the injector no
longer works for me, and I cannot figure out why.  When I
revert, everything works again.

Here's the problem I get:

% bin/nutch inject db -dmozfile content.rdf.u8
040719 094333 loading file:/home/cutting/src/nutch/mainline/conf/nutch-default.xml
040719 094334 loading file:/home/cutting/src/nutch/mainline/conf/nutch-site.xml
040719 094334 skew = 1998600953
040719 094334 Begin parse
040719 094334 SEVERE java.lang.NullPointerException
java.lang.NullPointerException
        at net.nutch.db.WebDBInjector$RDFProcessor.startElement(WebDBInjector.java:165)
        at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at net.nutch.db.WebDBInjector.injectDmozFile(WebDBInjector.java:402)
        at net.nutch.db.WebDBInjector.main(WebDBInjector.java:514)

Does this patch work for others?

----------------------------------------------------------------------

Comment By: Stefan Groschupf (joa23)
Date: 2004-07-16 10:03

Message:
Logged In: YES 
user_id=396197

A similar patch was already contributed. Please visit:
http://sourceforge.net/mailarchive/message.php?msg_id=8556193 
It never made it into CVS, so it looks like people weren't interested. ;-/
Cool! Your patch sounds much more convenient. 

So I vote for topic filtering!!! 

----------------------------------------------------------------------

Comment By: Jungshik Shin (jshin)
Date: 2004-07-16 09:46

Message:
Logged In: YES 
user_id=307557

Sorry I forgot to attach the patch.

----------------------------------------------------------------------
