List:       privoxy-users
Subject:    Re: [privoxy-users] speeding up a filter
From:       Fabian Keil <fk () fabiankeil ! de>
Date:       2012-03-31 20:11:09
Message-ID: 20120331221109.17f27fc0.fk () fabiankeil ! de

privoxyusers@i.lucanops.net wrote:

> On Fri, 2012-03-30 at 00:17 +0200, Fabian Keil wrote:
> > I assume [form|input] doesn't do what you intended, though,
> > you probably want something like (?:form|input). Additionally
> > I suspect you may want ['"]? instead of ['?|"?].
> 
> Thanks for pointing those out. For form/input I chose square brackets
> because IIRC it worked when I tested it, and I thought curved brackets
> were for doing the $1 $2 thing..... but no, it doesn't seem to interfere
> after replacing the brackets.

Curved brackets can be used to fill variables and
to group patterns at the same time. Curved brackets
whose content starts with ?: group patterns without
filling variables.

Square brackets are for character classes.

While (?:form|input) will match both form and input,
[form|input] matches any single one of the characters
in the brackets.

<[form|input].* would thus match any tag starting with
one of those characters and not just form and input tags.
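
To illustrate the difference, here is a minimal sketch using Python's re
module rather than Privoxy's PCRE (the two agree on character classes and
non-capturing groups). The tag list is made up for the example.

```python
import re

# Character class: "<" followed by ONE of the characters f o r m | i n p u t
char_class = re.compile(r"<[form|input]")
# Non-capturing group: "<" followed by the word "form" or the word "input"
group = re.compile(r"<(?:form|input)")

# Hypothetical tag openings, chosen only for illustration
tags = ["<form", "<input", "<img", "<p", "<table", "<div"]

class_hits = [t for t in tags if char_class.match(t)]
group_hits = [t for t in tags if group.match(t)]

if __name__ == "__main__":
    print("character class matches:", class_hits)
    print("group matches:          ", group_hits)
```

The character class also accepts <img, <p and <table because their first
letter is in the set, while the group only accepts <form and <input.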
 
> I have tried ['"]? and it doesn't quite work right, though I think I
> have figured out why...
> 
> Original HTML from http://www.tvguide.co.uk/:
> 
> <!--<input name="title" autocomplete="off" type="text"
> class="topnav_form" />-->
> <input name="title" autocomplete="off"
> onKeyUp="ajax_showOptions(this,'getCountriesByLetters')"  type="text"
> class="topnav_form" />
> 
> 
> With this filter using ['"]? there is a quotation mark left behind in
> the HTML, between the name and type/onKeyUp attributes of the input
> tags:
> 
> s@(<(?:form|input)[^>]*)autocomplete=['"]?off['"]?([^>]*>)@$1 $2 <!--
> Autocomplete deleted by Privoxy. -->@Uig
> 
> <!--<input name="title"  " type="text" class="topnav_form" /> <!--
> Autocomplete deleted by Privoxy. -->-->
> <input name="title"  "
> onKeyUp="ajax_showOptions(this,'getCountriesByLetters')"  type="text"
> class="topnav_form" /> <!-- Autocomplete deleted by Privoxy. -->
> 
> 
> With either of these filters (the difference being ['?"?] and ['?|"?])
> the quotation marks from the original HTML are gone:

Note that ['?"?] matches one of the three characters ", ', and ?. 
['?|"?] matches the same characters and additionally matches |.
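
A quick way to check which characters a class accepts, sketched with
Python's re module instead of Privoxy's PCRE (character classes behave the
same in both). The helper name members and the candidate string are
assumptions made up for this example.

```python
import re

def members(char_class, candidates="'\"?|x"):
    """Return the candidate characters that the class matches on its own."""
    return {c for c in candidates if re.fullmatch(char_class, c)}

# Inside brackets "?" and "|" are ordinary characters, not operators.
without_pipe = members(r"""['?"?]""")
with_pipe = members(r"""['?|"?]""")

# Outside the brackets "?" is a quantifier: the quote becomes optional.
matches_nothing = re.fullmatch(r"""['"]?""", "") is not None
matches_question_mark = re.fullmatch(r"""['"]?""", "?") is not None

if __name__ == "__main__":
    print(sorted(without_pipe), sorted(with_pipe))
    print(matches_nothing, matches_question_mark)
```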

> s@(<(?:form|input)[^>]*)autocomplete=['?"?]off['?"?]([^>]*>)@$1 $2
> <!-- Autocomplete deleted by Privoxy. -->@Uig
> s@(<(?:form|input)[^>]*)autocomplete=['?|"?]off['?|"?]([^>]*>)@$1 $2
> <!-- Autocomplete deleted by Privoxy. -->@Uig
> 
> <!--<input name="title"   type="text" class="topnav_form" /> <!--
> Autocomplete deleted by Privoxy. -->-->
> <input name="title"
> onKeyUp="ajax_showOptions(this,'getCountriesByLetters')"  type="text"
> class="topnav_form" /> <!-- Autocomplete deleted by Privoxy. -->
> 
> Is it because whilst ['"]? will match ...off", ...off' or ...off, as
> ['"]? is followed by [^>]* the last quotation mark is included in
> [^>]* , with the question mark in ['"]? allowing it?

Yes.

['"]? will match a single or double quote, but also nothing.
The U flag makes ['"]? "ungreedy", so it will only match a character
if it has to and in your filter it doesn't.

The question marks in ['?|"?] have no special meaning, so
the pattern will always match exactly one of the four characters
between the brackets (', ?, | and ") or not match at all.

You can solve this problem by using ['"]?? where the
second question mark will make the pattern greedy despite
the U flag:

s@(<(?:form|input)[^>]*)autocomplete=['"]?off['"]??([^>]*>)@$1 $2 <!-- Autocomplete \
deleted by Privoxy. -->@Uig

Or change ([^>]*>) to ( [^>]*>) to force ["']? to match the quote:

s@(<(?:form|input)[^>]*)autocomplete=['"]?off['"]?( [^>]*>)@$1 $2 <!-- Autocomplete \
deleted by Privoxy. -->@Uig

Or remove the U flag which should no longer be required now
that you are using [^>]* instead of .*:

s@(<(?:form|input)[^>]*)autocomplete=['"]?off['"]?([^>]*>)@$1 $2 <!-- Autocomplete \
deleted by Privoxy. -->@ig
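
The three variants can be compared outside Privoxy with a rough sketch in
Python. Python's re module has no U flag, so this is an emulation under the
assumption that U simply inverts greediness: each quantifier is written out
the way the flag would make it behave (X? under U becomes X??, X?? under U
becomes X?, and * becomes *?). The input is the tag from the quoted HTML
with the onKeyUp attribute left out for brevity.

```python
import re

HTML = '<input name="title" autocomplete="off" type="text" class="topnav_form" />'
REPLACEMENT = r"\1 \2 <!-- Autocomplete deleted by Privoxy. -->"

# Hand-translated patterns; the U flag is emulated by swapping greedy
# and lazy quantifiers (an assumption about how the flag behaves).
PATTERNS = {
    "original with U": r"""(<(?:form|input)[^>]*?)autocomplete=['"]??off['"]??([^>]*?>)""",
    "fix 1, ?? under U": r"""(<(?:form|input)[^>]*?)autocomplete=['"]??off['"]?([^>]*?>)""",
    "fix 2, leading space": r"""(<(?:form|input)[^>]*?)autocomplete=['"]??off['"]??( [^>]*?>)""",
    "fix 3, no U flag": r"""(<(?:form|input)[^>]*)autocomplete=['"]?off['"]?([^>]*>)""",
}

def apply_filter(pattern, html=HTML):
    """Apply one substitution case-insensitively to all matches (the i and g flags)."""
    return re.sub(pattern, REPLACEMENT, html, flags=re.IGNORECASE)

RESULTS = {name: apply_filter(pattern) for name, pattern in PATTERNS.items()}

if __name__ == "__main__":
    for name, output in RESULTS.items():
        print(f"{name}:\n  {output}")
```

In this emulation the original pattern leaves the stray quotation mark
behind, and all three fixes remove it.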

> Damn, questions about regex are hard to write! I hope the above makes
> sense.
> 
> Thanks a lot for your time, I am a relative n00b with regex so apologies
> if I'm getting some of the relative basics wrong.

You're welcome.

Fabian




