List:       perl5-porters
Subject:    [perl #131822] A multiline regex that starts with /^/m is much slower than the corresponding one tha
From:       shlomif () shlomifish ! org (via RT) <perlbug-followup () perl ! org>
Date:       2017-07-31 20:10:46
Message-ID: rt-4.0.24-16246-1501531846-1351.131822-75-0 () perl ! org

# New Ticket Created by  shlomif@shlomifish.org 
# Please include the string:  [perl #131822]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/Ticket/Display.html?id=131822 >


A multiline regex that starts with /^/m is much slower than the corresponding
one that starts with /\n/. Below is Dave Mitchell's analysis:

The code can be reduced to the following:

    my $nomatch = <<EOF;
    Start 1 =
    Boo
    End
    EOF

    my $match = <<EOF;
    Start 1 =
    End
    EOF

    $_ = ($nomatch x 10_000) . $match;
    my $n = $ARGV[0] ? '^' : '\n';
    m/${n}Start [0-9]+ =\nEnd\n/m or die;

Timing the '\n' variant (argument 0) first and then the '^' variant
(argument 1):

$ time perl5260o ~/tmp/p 0; time perl5260o ~/tmp/p 1

real	0m0.004s
user	0m0.002s
sys	0m0.002s

real	0m0.691s
user	0m0.690s
sys	0m0.001s
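
For anyone who wants to compare the two variants in a single process, here is a
minimal sketch. It assumes the core Time::HiRes module is available; the helper
name time_match is hypothetical and not part of the original report. The
absolute numbers will depend on the perl version and the machine.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Same test data as the reduced case: many near-misses, then one real match.
my $nomatch = "Start 1 =\nBoo\nEnd\n";
my $match   = "Start 1 =\nEnd\n";
my $subject = ($nomatch x 10_000) . $match;

# Hypothetical helper: run one match with the given pattern prefix and
# return (matched?, elapsed seconds).
sub time_match {
    my ($prefix) = @_;
    my $re = qr/${prefix}Start [0-9]+ =\nEnd\n/m;
    my $t0 = time;
    my $ok = $subject =~ $re;
    return ($ok ? 1 : 0, time - $t0);
}

for my $prefix ('\n', '^') {
    my ($ok, $elapsed) = time_match($prefix);
    printf "%-3s matched=%d elapsed=%.4fs\n", $prefix, $ok, $elapsed;
}
```

Both variants should report a successful match; only the elapsed time is
expected to differ on an affected perl.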


It's probably down to this in regexec_flags():

            /* note that with PREGf_IMPLICIT, intuit can only fail
             * or return the start position, so it's of limited utility.
             * Nevertheless, I made the decision that the potential for
             * quick fail was still worth it - DAPM */

Basically, the '^' causes it to (fruitlessly) run intuit at the start of
every line, whereas the \n causes it to just fbm (fast Boyer-Moore search)
to the next "\nStart" string.
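
The difference between the two scanning strategies can be modelled in plain
Perl. This is only an illustration of the idea, not the C code in
regexec_flags(): the function names scan_line_starts and scan_fixed_string are
hypothetical, index() stands in for the fbm search, and a \G-anchored match
stands in for the per-candidate check.

```perl
use strict;
use warnings;

my $nomatch = "Start 1 =\nBoo\nEnd\n";
my $match   = "Start 1 =\nEnd\n";
my $subject = ($nomatch x 1_000) . $match;

# Strategy A (models '^'): try the pattern at the start of every line.
sub scan_line_starts {
    my ($s) = @_;
    my ($pos, $attempts) = (0, 0);
    while ($pos < length $s) {
        $attempts++;
        pos($s) = $pos;
        return ($pos, $attempts) if $s =~ /\GStart [0-9]+ =\nEnd\n/gc;
        my $nl = index($s, "\n", $pos);   # move to the next line start
        last if $nl < 0;
        $pos = $nl + 1;
    }
    return (-1, $attempts);
}

# Strategy B (models '\n'): jump straight to each literal "\nStart".
sub scan_fixed_string {
    my ($s) = @_;
    my ($from, $attempts) = (0, 0);
    while ((my $hit = index($s, "\nStart", $from)) >= 0) {
        $attempts++;
        pos($s) = $hit;
        return ($hit, $attempts) if $s =~ /\G\nStart [0-9]+ =\nEnd\n/gc;
        $from = $hit + 1;
    }
    return (-1, $attempts);
}

my ($pos_a, $tries_a) = scan_line_starts($subject);
my ($pos_b, $tries_b) = scan_fixed_string($subject);
print "line starts:  match at $pos_a after $tries_a attempts\n";
print "fixed string: match at $pos_b after $tries_b attempts\n";
```

With this input the line-start strategy examines roughly three times as many
candidate positions as the fixed-string one. The sketch models only the number
of candidates, not the cost of each attempt, so it should not be read as a
full account of the timing gap.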

I may need to revisit that decision. The whole 'pick the next viable start
position' logic in regexec_flags() needs an overhaul, and it's on my list
of things to do (but not currently near the top).

===========

Please look into fixing it in a future version.