[prev in list] [next in list] [prev in thread] [next in thread]
List: nutch-cvs
Subject: [Nutch-cvs] svn commit: r561306 - in /lucene/nutch/trunk:
From: dogacan () apache ! org
Date: 2007-07-31 12:07:30
Message-ID: 20070731120730.F0C861A981A () eris ! apache ! org
[Download RAW message or body]
Author: dogacan
Date: Tue Jul 31 05:07:30 2007
New Revision: 561306
URL: http://svn.apache.org/viewvc?view=rev&rev=561306
Log:
NUTCH-533 - LinkDbMerger: url normalized is not updated in the key and inlinks list. \
Contributed by Emmanuel Joke.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/java/org/apache/nutch/crawl/LinkDbFilter.java
Modified: lucene/nutch/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/CHANGES.txt?view=diff&rev=561306&r1=561305&r2=561306
==============================================================================
--- lucene/nutch/trunk/CHANGES.txt (original)
+++ lucene/nutch/trunk/CHANGES.txt Tue Jul 31 05:07:30 2007
@@ -107,6 +107,9 @@
with redirected pages, and this issue can be considered as a band-aid
for the time being. See NUTCH-273 and NUTCH-353 for more details.
+36. NUTCH-533 - LinkDbMerger: url normalized is not updated in the key and
+ inlinks list. (Emmanuel Joke via dogacan)
+
Release 0.9 - 2007-04-02
1. Changed log4j confiquration to log to stdout on commandline
Modified: lucene/nutch/trunk/src/java/org/apache/nutch/crawl/LinkDbFilter.java
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/crawl/LinkDbFilter.java?view=diff&rev=561306&r1=561305&r2=561306
==============================================================================
--- lucene/nutch/trunk/src/java/org/apache/nutch/crawl/LinkDbFilter.java (original)
+++ lucene/nutch/trunk/src/java/org/apache/nutch/crawl/LinkDbFilter.java Tue Jul 31 \
05:07:30 2007 @@ -22,6 +22,7 @@
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
@@ -57,6 +58,8 @@
private String scope;
public static final Log LOG = LogFactory.getLog(LinkDbFilter.class);
+
+ private Text newKey = new Text();
public void configure(JobConf job) {
this.jobConf = job;
@@ -75,6 +78,7 @@
public void map(WritableComparable key, Writable value, OutputCollector output, \
Reporter reporter) throws IOException { String url = key.toString();
+ Inlinks result = new Inlinks();
if (normalize) {
try {
url = normalizers.normalize(url, scope); // normalize the url
@@ -114,11 +118,13 @@
fromUrl = null;
}
}
- if (fromUrl == null) { // should be discarded
- it.remove();
+ if (fromUrl != null) {
+ result.add(new Inlink(fromUrl, inlink.getAnchor()));
}
}
- if (inlinks.size() == 0) return; // don't collect empy inlinks
- output.collect(key, inlinks);
+ if (result.size() > 0) { // don't collect empty inlinks
+ newKey.set(url);
+ output.collect(newKey, result);
+ }
}
}
-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic