List: perl5-porters
Subject: [perl #131822] A multiline regex that starts with /^/m is much slower than the corresponding one tha
From: shlomif () shlomifish ! org (via RT) <perlbug-followup () perl ! org>
Date: 2017-07-31 20:10:46
Message-ID: rt-4.0.24-16246-1501531846-1351.131822-75-0 () perl ! org
# New Ticket Created by shlomif@shlomifish.org
# Please include the string: [perl #131822]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org/Ticket/Display.html?id=131822 >
A multiline regex that starts with /^/m is much slower than the corresponding
one that starts with /\n/. Below is Dave Mitchell's analysis:
The code can be reduced to the following:
my $nomatch = <<EOF;
Start 1 =
Boo
End
EOF
my $match = <<EOF;
Start 1 =
End
EOF
$_ = ($nomatch x 10_000) . $match;
my $n = $ARGV[0] ? '^' : '\n';
m/${n}Start [0-9]+ =\nEnd\n/m or die;
$ time perl5260o ~/tmp/p 0; time perl5260o ~/tmp/p 1

real 0m0.004s
user 0m0.002s
sys 0m0.002s

real 0m0.691s
user 0m0.690s
sys 0m0.001s
It's probably down to this in regexec_flags():
/* note that with PREGf_IMPLICIT, intuit can only fail
* or return the start position, so it's of limited utility.
* Nevertheless, I made the decision that the potential for
* quick fail was still worth it - DAPM */
Basically, the '^' causes it to (fruitlessly) run intuit at the start of
every line; the \n instead lets it just fbm (fast Boyer-Moore search) to
the next "\nStart" string.
I may need to revisit that decision. The whole 'pick the next viable start
position' logic in regexec_flags() needs an overhaul, and it's on my list
of things to do (but not currently near the top).
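
For anyone wanting to reproduce the comparison inside a single process, here
is a minimal sketch. It assumes the same input as the reduced case above; the
helper names build_input and time_match are made up for illustration and are
not part of the original report. Timings depend on the perl version and
machine, so none are shown here.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Build the input from the report: many near-miss blocks, then one real match.
# (build_input is a hypothetical helper name.)
sub build_input {
    my ($repeats) = @_;
    my $nomatch = "Start 1 =\nBoo\nEnd\n";
    my $match   = "Start 1 =\nEnd\n";
    return ($nomatch x $repeats) . $match;
}

# Run one match with the given prefix ('^' or '\n') and return
# (elapsed seconds, end offset of the match).
# (time_match is a hypothetical helper name.)
sub time_match {
    my ($text, $prefix) = @_;
    my $re = qr/${prefix}Start [0-9]+ =\nEnd\n/m;
    my $t0 = time;
    $text =~ $re or die "pattern with prefix '$prefix' did not match";
    my $elapsed = time - $t0;
    return ($elapsed, $+[0]);
}

my $text = build_input(10_000);
for my $prefix ('\n', '^') {
    my ($elapsed, $end) = time_match($text, $prefix);
    printf "prefix %-2s matched, ending at offset %d, in %.4fs\n",
        $prefix, $end, $elapsed;
}
```

Both variants should end their match at the same offset (the end of the
string); only the time taken to get there is expected to differ. Note that
the two patterns are not strictly equivalent: the \n form consumes the
preceding newline and so cannot match at the very start of the string.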
===========
Please look into fixing it in a future version.