List: privoxy-developers
Subject: [privoxy-devel] [ ijbswa-Bugs-972839 ] lookbehind works strange
From: "SourceForge.net" <noreply () sourceforge ! net>
Date: 2004-06-15 12:01:50
Message-ID: E1BaCdS-00039Q-00 () sc8-sf-web1 ! sourceforge ! net
Bugs item #972839, was opened at 2004-06-14 14:39
Message generated for change (Comment added) made by dessent
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=111118&aid=972839&group_id=11118
Category: funct: filtering
Group: version 3.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: lookbehind works strange
Initial Comment:
I want to replace the minus sign between digits with "—", but
I don't want this to happen inside tags.
I wrote a filter:
s/(?<=>)([^<]*?\d)-(?=\d)/$1—<!-- -->/sig
It looks behind for a ">", then matches any characters that
aren't "<", then a digit, then a minus, and (by lookahead)
another digit. I put a comment at the end of the replacement to
supply a ">" for the next match.
In fact, that filter turns the string ">2004-06-14" into
"2004—<!-- -->06-14". I guess it doesn't look behind to see
the ">" generated by the first replacement. The "g" option
does work: the next instance of ">2004-06-14" is converted the
same way.
I use the win32 binary of Privoxy v3.0.3 with Opera 7.5 (I guess
the browser doesn't matter here).
Log:
Jun 15 01:26:29 Privoxy(02336) Re-Filter: Adding re_filter job s/(?<=>)([^<]*?\d)-(?=\d)/$1—<!-- -->/sig to filter tire succeeded.
.........
Jun 15 01:26:31 Privoxy(03708) Re-Filter: re_filtering www.livejournal.com/users/dolboeb/440396.html?nc=1 (size 21438) with filter tire...
Jun 15 01:26:31 Privoxy(03708) Re-Filter: ...produced 6 hits (new size 21522).
Sorry, I couldn't log in: SourceForge sends me neither my
password nor a hash. My e-mail is arttreg@mail.ru
----------------------------------------------------------------------
>Comment By: Brian (dessent)
Date: 2004-06-15 05:01
Message:
Logged In: YES
user_id=585719
A useful trick for debugging these pcre substitutions is the
following:
echo "string to match" | perl -e 'use re "debugcolor";' -pe 's/foo/bar/sig'
or in your example:
echo ">2004-06-14" | perl -e 'use re "debugcolor";' \
-pe 's/(?<=>)([^<]*?\d)-(?=\d)/$1—<!-- -->/sig';
This will show the details of the RE matching. Use "debug"
instead of "debugcolor" to get a plaintext version.
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2004-06-15 03:26
Message:
Logged In: NO
My fault. Today I tested my regex in Perl, and it looks like I
had misunderstood how lookbehind works in pcre.
Thank you for the advice :)
----------------------------------------------------------------------
Comment By: Brian (dessent)
Date: 2004-06-15 01:01
Message:
Logged In: YES
user_id=585719
This is not a fault of Privoxy but rather just how pcre works.
When using "/g" all the matching is done against the
unmodified source string. You can't base the next
replacement off something that was changed in the previous
match, unless you iterate through each replacement
individually, i.e. without using "/g", which would not be
possible with Privoxy.
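
To make the point about "/g" concrete, here is a minimal sketch in
Python, whose "re" module treats lookbehind and global substitution
the same way for this pattern. It is only an illustration, not
Privoxy code: the function names are made up, and "&mdash;" is
written out as an entity in the replacement.

```python
import re

# The reporter's pattern with the same flags (/s and /i); /g corresponds to
# re.sub() replacing every match in a single left-to-right scan.
PATTERN = re.compile(r"(?<=>)([^<]*?\d)-(?=\d)", re.S | re.I)
REPLACEMENT = r"\1&mdash;<!-- -->"


def replace_all(text):
    """Global substitution: every match is found in the ORIGINAL text."""
    return PATTERN.sub(REPLACEMENT, text)


def replace_iteratively(text):
    """One replacement per pass, re-scanning the modified text each time."""
    while True:
        text, hits = PATTERN.subn(REPLACEMENT, text, count=1)
        if hits == 0:
            return text


# The ">" inside the inserted comment is never seen by the lookbehind,
# so only the first dash is replaced.
print(replace_all(">2004-06-14"))
# → >2004&mdash;<!-- -->06-14

# Re-scanning after each replacement lets the lookbehind see the new ">".
print(replace_iteratively(">2004-06-14"))
# → >2004&mdash;<!-- -->06&mdash;<!-- -->14
```

The second function is the "iterate through each replacement
individually" case, which a single filter job with "/g" cannot do.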
In my opinion you're going about this wrong. My first advice
would be that if you have some specific forms of data that
you're trying to match, then just code something to match
them, such as dates. You'll pull your hair out trying to do
something that's completely and 100% generic and doesn't
fail in some circumstances... This is precisely why future
Privoxies will require a real parser, as working with tags with
REs like this can be very hard to do right.
My second suggestion, if you can't make your filter specific to
some easily identifiable data, would be to make two filters.
The first changes all occurrences of a '-' between digit groups
to — and a distinctive HTML comment to flag the
replacement. Then a second filter looks for
"&mdash;<!-- foo -->" inside tags and changes them back to
regular '-' characters. This is neither as pretty nor as
efficient, but sometimes you have to resort to doing it in more
than one step.
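
A rough sketch of the two-step idea, again in Python rather than
as actual Privoxy filters. The marker comment "<!-- dash -->", the
function names, and the "inside a tag" test (only non-angle-bracket
characters or further markers before the next ">") are all
assumptions made for illustration; the heuristic will misfire on a
stray ">" in body text, and whether the same patterns behave
identically in Privoxy has not been checked.

```python
import re

COMMENT = "<!-- dash -->"        # hypothetical marker comment
MARKED = "&mdash;" + COMMENT     # what step 1 writes in place of '-'


def mark_dashes(html):
    """Step 1: replace every '-' that sits between two digits, tags included."""
    return re.sub(r"(?<=\d)-(?=\d)", MARKED, html)


def restore_in_tags(html):
    """Step 2: turn marked dashes back into '-' when they are inside a tag.

    "Inside a tag" is approximated as: from here up to the next '>', there
    are only non-angle-bracket characters or further marker comments.
    """
    inside_tag = r"(?=(?:[^<>]|" + re.escape(COMMENT) + r")*>)"
    return re.sub(re.escape(MARKED) + inside_tag, "-", html)


page = '<a href="/2004-06-14.html">2004-06-14</a>'
print(restore_in_tags(mark_dashes(page)))
# → <a href="/2004-06-14.html">2004&mdash;<!-- dash -->06&mdash;<!-- dash -->14</a>
```

Because neither step needs to see the output of its own earlier
replacements, both work as plain global substitutions.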
----------------------------------------------------------------------