List: nutch-developers
Subject: [Nutch-dev] [jira] Issue Comment Edited: (NUTCH-522) Use
From: Doğacan_Güney_(JIRA) <jira () apache ! org>
Date: 2007-07-27 13:10:18
Message-ID: 15314984.1185541818316.JavaMail.jira () brutus
[ https://issues.apache.org/jira/browse/NUTCH-522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515991 ]
Doğacan Güney edited comment on NUTCH-522 at 7/27/07 6:10 AM:
--------------------------------------------------------------
Btw, I thought about the validation stuff a bit, and IMHO it is better to run the normalizers before UrlValidator (so the new order is normalize, validate, filter). Someone could write a normalizer that replaces spaces with %20 (so that the URL becomes valid). If we have such a normalizer, we should run it before validation so that the URL passes validation (and IMO it should pass, since Nutch can fetch a URL containing %20).
I think your patch looks good, but I will wait a while in the hope of getting some comments on putting the normalizers before the validator.
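A minimal sketch of why the order matters, assuming a hypothetical space-to-%20 normalizer, java.net.URI as a stand-in for UrlValidator, and a hypothetical http/https-only URL filter. None of these are Nutch's actual normalizer, validator, or filter APIs; the class and method names are illustrative only.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the proposed "normalize, validate, filter" order for injected seed URLs.
 * Assumptions (hypothetical, not Nutch APIs): normalizeSpaces stands in for a URL
 * normalizer, isValid uses java.net.URI in place of UrlValidator, and passesFilter
 * stands in for the configured URL filters.
 */
public class InjectOrderSketch {

    // Hypothetical normalizer: percent-encode spaces so "a b" becomes "a%20b".
    static String normalizeSpaces(String url) {
        return url.trim().replace(" ", "%20");
    }

    // Stand-in validator: java.net.URI throws on illegal characters such as raw spaces.
    static boolean isValid(String url) {
        try {
            URI uri = new URI(url);
            return uri.getScheme() != null && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Hypothetical URL filter: keep only http and https URLs.
    static boolean passesFilter(String url) {
        return url.startsWith("http://") || url.startsWith("https://");
    }

    // Proposed order: normalize first, then validate, then filter.
    static List<String> inject(List<String> seeds) {
        List<String> accepted = new ArrayList<>();
        for (String seed : seeds) {
            String url = normalizeSpaces(seed);
            if (isValid(url) && passesFilter(url)) {
                accepted.add(url);
            }
        }
        return accepted;
    }

    // Validate-first order for comparison: the raw seed is checked before normalizing.
    static List<String> injectValidateFirst(List<String> seeds) {
        List<String> accepted = new ArrayList<>();
        for (String seed : seeds) {
            if (!isValid(seed)) {
                continue; // a URL with a raw space is rejected here, before any fix-up
            }
            String url = normalizeSpaces(seed);
            if (passesFilter(url)) {
                accepted.add(url);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> seeds = List.of(
                "http://example.com/a b.html",
                "http://example.com/ok.html",
                "ftp://example.com/file.txt");
        System.out.println(inject(seeds));              // [http://example.com/a%20b.html, http://example.com/ok.html]
        System.out.println(injectValidateFirst(seeds)); // [http://example.com/ok.html]
    }
}
```

With validation first, the seed containing a space is dropped even though the normalizer could have repaired it; with normalization first, it is injected as a fetchable %20 URL. The ftp seed is valid in both cases but is removed by the filter step.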
> Use URLValidator in the Injector
> --------------------------------
>
> Key: NUTCH-522
> URL: https://issues.apache.org/jira/browse/NUTCH-522
> Project: Nutch
> Issue Type: Improvement
> Components: injector
> Reporter: Emmanuel Joke
> Assignee: Emmanuel Joke
> Priority: Minor
> Fix For: 1.0.0
>
> Attachments: NUTCH-522.patch, NUTCH-522_v2.patch, NUTCH-522_v3.patch
>
>
> Same as NUTCH-505, we should use the UrlValidator to check URLs in the Injector.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers