
List:       nutch-developers
Subject:    [Nutch-dev] [jira] Issue Comment Edited: (NUTCH-522) Use
From:       Doğacan_Güney_(JIRA) <jira () apache ! org>
Date:       2007-07-27 13:10:18
Message-ID: 15314984.1185541818316.JavaMail.jira () brutus


    [ https://issues.apache.org/jira/browse/NUTCH-522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515991 ]

Doğacan Güney edited comment on NUTCH-522 at 7/27/07 6:10 AM:
--------------------------------------------------------------

Btw, I thought about the validation issue a bit, and IMHO it is better to run
normalizers before UrlValidator (so the new order is normalize, validate, filter).
Someone might write a normalizer that replaces spaces with %20 (so that the URL
becomes valid). If we have such a normalizer, we should run it before validation so
that the URL passes validation (and IMO it should pass, since Nutch can fetch a URL
containing %20).

I think your patch looks good, but I will wait a while in the hope of getting some
comments on putting normalizers before the validator.



> Use URLValidator in the Injector
> --------------------------------
> 
> Key: NUTCH-522
> URL: https://issues.apache.org/jira/browse/NUTCH-522
> Project: Nutch
> Issue Type: Improvement
> Components: injector
> Reporter: Emmanuel Joke
> Assignee: Emmanuel Joke
> Priority: Minor
> Fix For: 1.0.0
> 
> Attachments: NUTCH-522.patch, NUTCH-522_v2.patch, NUTCH-522_v3.patch
> 
> 
> As in NUTCH-505, we should use the UrlValidator to check URLs in the Injector.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers


