List: nutch-cvs
Subject: [Nutch-cvs] nutch/conf regex-normalize.xml.template,NONE,1.1
From: Doug Cutting <cutting () users ! sourceforge ! net>
Date: 2004-09-27 21:34:13
Message-ID: E1CC38P-0003oZ-KV () sc8-pr-cvs1 ! sourceforge ! net
Update of /cvsroot/nutch/nutch/conf
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14653
Added Files:
regex-normalize.xml.template
Log Message:
I think this should have been committed a long time ago...
--- NEW FILE: regex-normalize.xml.template ---
<?xml version="1.0"?>
<!-- This is the configuration file for the RegexUrlNormalize class.
It lets users specify substitutions to be applied to URLs. The
regex engine used is Perl5-compatible. The rules are applied to
URLs in the order in which they occur in this file. -->
<!-- WATCH OUT: an XML parser reads this file, so ampersands must be
escaped as &amp; -->
<!-- The following rules show how to strip out session IDs that
are 32 characters long and passed in the PHPSESSID parameter.
Order does matter! -->
<regex-normalize>
  <regex>
    <pattern>(\?|\&amp;|\&amp;amp;)PHPSESSID=[a-zA-Z0-9]{32}$</pattern>
    <substitution></substitution>
  </regex>
  <regex>
    <pattern>(\?|\&amp;|\&amp;amp;)PHPSESSID=[a-zA-Z0-9]{32}(\&amp;|\&amp;amp;)(.*)</pattern>
    <substitution>$1$3</substitution>
  </regex>
</regex-normalize>
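A minimal sketch of how rules in this format could be loaded and applied in file order. It assumes Java with the standard DOM parser and java.util.regex standing in for the Perl5-compatible engine the file comment mentions; for these particular patterns and the $1$3 substitution the syntax should behave the same in both. The class RegexNormalizeSketch and its methods are hypothetical and are not Nutch's RegexUrlNormalize API, and applying each rule with replaceAll is likewise an assumption. The sketch also shows the WATCH OUT note in action: the parser turns \&amp;amp; in the file into \&amp; in the compiled pattern, so the rules match both a raw "&" and a literal "&amp;" in URLs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not Nutch's RegexUrlNormalize class: it swaps in
// java.util.regex for the Perl5-compatible engine Nutch uses.
public class RegexNormalizeSketch {

    // The two rules from regex-normalize.xml.template, escaped as in the file.
    static final String TEMPLATE =
        "<?xml version=\"1.0\"?>\n"
        + "<regex-normalize>\n"
        + "<regex>\n"
        + "<pattern>(\\?|\\&amp;|\\&amp;amp;)PHPSESSID=[a-zA-Z0-9]{32}$</pattern>\n"
        + "<substitution></substitution>\n"
        + "</regex>\n"
        + "<regex>\n"
        + "<pattern>(\\?|\\&amp;|\\&amp;amp;)PHPSESSID=[a-zA-Z0-9]{32}(\\&amp;|\\&amp;amp;)(.*)</pattern>\n"
        + "<substitution>$1$3</substitution>\n"
        + "</regex>\n"
        + "</regex-normalize>\n";

    // One <regex> entry: a compiled pattern and its substitution string.
    static final class Rule {
        final Pattern pattern;
        final String substitution;

        Rule(Pattern pattern, String substitution) {
            this.pattern = pattern;
            this.substitution = substitution;
        }
    }

    // Reads the <regex> entries in file order. The XML parser decodes
    // &amp; back to &, so the compiled patterns contain plain ampersands.
    static List<Rule> parseRules(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList entries = doc.getElementsByTagName("regex");
            List<Rule> rules = new ArrayList<>();
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                String pattern = entry.getElementsByTagName("pattern").item(0).getTextContent();
                String substitution =
                    entry.getElementsByTagName("substitution").item(0).getTextContent();
                rules.add(new Rule(Pattern.compile(pattern), substitution));
            }
            return rules;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse normalization rules", e);
        }
    }

    // Applies every rule in file order; each rule rewrites all of its matches
    // (replaceAll is an assumption about how the real class substitutes).
    static String normalize(String url, List<Rule> rules) {
        for (Rule rule : rules) {
            url = rule.pattern.matcher(url).replaceAll(rule.substitution);
        }
        return url;
    }

    public static void main(String[] args) {
        List<Rule> rules = parseRules(TEMPLATE);
        String sid = "0123456789abcdef0123456789abcdef"; // 32 alphanumerics
        System.out.println(normalize("http://example.com/a.php?PHPSESSID=" + sid, rules));
        System.out.println(normalize("http://example.com/a.php?PHPSESSID=" + sid + "&page=2", rules));
        System.out.println(normalize("http://example.com/a.php?id=7&amp;PHPSESSID=" + sid, rules));
    }
}
```

Run as-is, it prints http://example.com/a.php, then http://example.com/a.php?page=2, then http://example.com/a.php?id=7. Keeping the rules in an XML file means they can be changed without recompiling, at the cost of the double escaping the WATCH OUT comment warns about.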