List:       nutch-cvs
Subject:    [Nutch-cvs] nutch/conf regex-normalize.xml.template,NONE,1.1
From:       Doug Cutting <cutting () users ! sourceforge ! net>
Date:       2004-09-27 21:34:13
Message-ID: E1CC38P-0003oZ-KV () sc8-pr-cvs1 ! sourceforge ! net

Update of /cvsroot/nutch/nutch/conf
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14653

Added Files:
	regex-normalize.xml.template 
Log Message:
I think this should have been committed a long time ago...


--- NEW FILE: regex-normalize.xml.template ---
<?xml version="1.0"?>
<!-- This is the configuration file for the RegexUrlNormalize class.
     It lets users specify substitutions to be applied to URLs.
     The regex engine used is Perl5 compatible, and the rules are
     applied to each URL in the order they occur in this file.  -->

<!-- WATCH OUT: an XML parser reads this file, so every ampersand in a
     pattern or substitution must be written as &amp; -->

<!-- The following rules show how to strip out session IDs 
     that are 32 characters long and have the parameter 
     name of PHPSESSID. Order does matter!  -->
<regex-normalize>
<regex>
  <pattern>(\?|\&amp;|\&amp;amp;)PHPSESSID=[a-zA-Z0-9]{32}$</pattern>
  <substitution></substitution>
</regex>
<regex>
  <pattern>(\?|\&amp;|\&amp;amp;)PHPSESSID=[a-zA-Z0-9]{32}(\&amp;|\&amp;amp;)(.*)</pattern>
  <substitution>$1$3</substitution>
</regex>
</regex-normalize>
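
To illustrate how the two rules above behave once the XML parser has expanded the entities, here is a minimal sketch in Python. It is not the Nutch implementation: Python's `re` module stands in for the Perl5-compatible engine, the names `load_rules` and `normalize` are hypothetical, and the translation of `$1`-style group references into Python's `\g<1>` form is an assumption about how the substitution syntax is meant to be read.

```python
import re
import xml.etree.ElementTree as ET

# The two rules from the template, copied verbatim (raw string keeps the backslashes).
CONFIG = r"""<?xml version="1.0"?>
<regex-normalize>
<regex>
  <pattern>(\?|\&amp;|\&amp;amp;)PHPSESSID=[a-zA-Z0-9]{32}$</pattern>
  <substitution></substitution>
</regex>
<regex>
  <pattern>(\?|\&amp;|\&amp;amp;)PHPSESSID=[a-zA-Z0-9]{32}(\&amp;|\&amp;amp;)(.*)</pattern>
  <substitution>$1$3</substitution>
</regex>
</regex-normalize>
"""


def load_rules(xml_text):
    """Hypothetical loader: return (compiled pattern, replacement) pairs in file order."""
    root = ET.fromstring(xml_text)
    rules = []
    for regex in root.findall("regex"):
        # The XML parser has already turned &amp; into & at this point,
        # so \&amp;amp; in the file becomes the regex \&amp; (a literal "&amp;").
        pattern = regex.findtext("pattern", default="")
        substitution = regex.findtext("substitution", default="")
        # Assumption: $N is a Perl-style group reference; Python spells it \g<N>.
        replacement = re.sub(
            r"\$(\d)", lambda m: "\\g<" + m.group(1) + ">", substitution
        )
        rules.append((re.compile(pattern), replacement))
    return rules


def normalize(url, rules):
    """Apply every rule to the URL, in the order the rules were declared."""
    for pattern, replacement in rules:
        url = pattern.sub(replacement, url)
    return url


if __name__ == "__main__":
    rules = load_rules(CONFIG)
    sid = "a" * 32  # made-up 32-character session ID
    for url in (
        "http://example.com/page.php?PHPSESSID=" + sid,
        "http://example.com/page.php?PHPSESSID=" + sid + "&x=1",
        "http://example.com/page.php?x=1&PHPSESSID=" + sid + "&y=2",
    ):
        print(normalize(url, rules))
```

The first rule only fires when the session ID is the last parameter, so it removes the parameter together with its leading separator. The second rule handles a session ID followed by further parameters, keeping the leading separator (`$1`) and the remainder of the query string (`$3`).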


