
List:       nutch-developers
Subject:    [Nutch-dev] [jira] Commented: (NUTCH-530) Add a combiner to improve
From:       "Emmanuel Joke (JIRA)" <jira () apache ! org>
Date:       2007-07-31 11:13:52
Message-ID: 8345559.1185880432976.JavaMail.jira () brutus


    [ https://issues.apache.org/jira/browse/NUTCH-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516675 ]

Emmanuel Joke commented on NUTCH-530:
-------------------------------------

Actually, I don't reuse CrawlDbReducer; I've defined a new class as the Combiner.
This class only aggregates the scores of all CrawlDatum entries with the status
"Linked" into one CrawlDatum, which is just part of what CrawlDbReducer does. I've
run a few tests in different cases and it has no impact on the current score.
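
To make the idea concrete, here is a minimal, framework-free sketch of that aggregation step. It is not the code from NUTCH-530.patch: the class LinkedScoreCombiner, the Status enum and the Datum type are hypothetical stand-ins for the real Hadoop combiner and CrawlDatum, and it assumes that the scores of "Linked" entries are simply summed per URL. Under that assumption, pre-summing on the map side leaves the final total unchanged (up to floating-point rounding), which would be consistent with the tests showing no impact on the score.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the Nutch code base.
public class LinkedScoreCombiner {

    // Simplified stand-in for the CrawlDatum status values.
    enum Status { LINKED, FETCHED }

    // Simplified stand-in for CrawlDatum: only a status and a score.
    static final class Datum {
        final Status status;
        final float score;

        Datum(Status status, float score) {
            this.status = status;
            this.score = score;
        }
    }

    /**
     * Combines the values emitted by one map task for a single URL key.
     * All LINKED entries are collapsed into one entry carrying the summed
     * score; every other entry is passed through untouched, in order.
     */
    static List<Datum> combine(List<Datum> values) {
        List<Datum> out = new ArrayList<>();
        float linkedScore = 0f;
        boolean sawLinked = false;

        for (Datum d : values) {
            if (d.status == Status.LINKED) {
                linkedScore += d.score;   // assumption: scores are additive
                sawLinked = true;
            } else {
                out.add(d);               // non-linked entries are left alone
            }
        }
        if (sawLinked) {
            // The single aggregated LINKED entry is appended last.
            out.add(new Datum(Status.LINKED, linkedScore));
        }
        return out;
    }
}
```

For example, combining two "Linked" entries with scores 0.5 and 0.25 plus one fetched entry would yield two values: the fetched entry unchanged, and a single "Linked" entry with score 0.75. The reducer then sees fewer values per URL but the same total.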

> Add a combiner to improve performance on updatedb
> -------------------------------------------------
> 
> Key: NUTCH-530
> URL: https://issues.apache.org/jira/browse/NUTCH-530
> Project: Nutch
> Issue Type: Improvement
> Environment: java 1.6
> Reporter: Emmanuel Joke
> Assignee: Emmanuel Joke
> Fix For: 1.0.0
> 
> Attachments: NUTCH-530.patch
> 
> 
> We have a lot of similar links with status "linked" generated at the output of
> the map task when we try to update the crawldb based on the fetched segment. We
> can use a combiner to improve the performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers

