List: perl5-changes
Subject: [Perl/perl5] 494b08: regcomp.c: ACCEPT inside of a (...)+ should disabl...
From: Yves Orton via perl5-changes <perl5-changes () perl ! org>
Date: 2022-03-30 8:43:53
Message-ID: Perl/perl5/push/refs/heads/yves/fix_accept/263be8-3111f8 () github ! com
Branch: refs/heads/yves/fix_accept
Home: https://github.com/Perl/perl5
Commit: 494b080d02038073c4a374cc0981d487a01f85f0
https://github.com/Perl/perl5/commit/494b080d02038073c4a374cc0981d487a01f85f0
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M regcomp.c
Log Message:
-----------
regcomp.c: ACCEPT inside of a (...)+ should disable mandatory substrings
GH Issue #19484 reported that
print "ABDE" =~ /(A (A|B(*ACCEPT)|C)+ D)(E)/x ? "yes: <$1-$2>" : "no";
does not output the expected 'AB-B', and instead does not match.
Removing the + quantifier behaves as expected.
This patch is 1/4 and fixes part of the problem: the regex optimizer
S_study_chunk() was not handling the ACCEPT properly and was assuming
that the pattern MUST contain 'A' and 'DE'. This is wrong: because of
the ACCEPT, only 'A' is mandatory and the 'DE' is actually optional.
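
As a rough illustration of the mandatory-substring point above, here is a
toy model in C. It is only a sketch and not perl's S_study_chunk(): the
pattern is flattened into a list, and the names LitKind, LitNode and
toy_required_literals are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds for a flattened toy pattern; not perl's regnodes. */
typedef enum { LIT_TEXT, LIT_MAY_ACCEPT } LitKind;

typedef struct {
    LitKind kind;
    const char *text;   /* literal text; NULL for LIT_MAY_ACCEPT */
} LitNode;

/*
 * Collect the literals that every successful match must contain.
 * Once a point is reached where an ACCEPT may fire, all later literals
 * are optional, so collection stops there.
 * Returns the number of literals written to out (at most cap).
 */
size_t toy_required_literals(const LitNode *nodes, size_t n,
                             const char **out, size_t cap)
{
    size_t found = 0;

    assert(nodes != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].kind == LIT_MAY_ACCEPT)
            break;                      /* everything after this is optional */
        if (found < cap)
            out[found++] = nodes[i].text;
    }
    return found;
}
```

Fed the flattened sequence 'A', possible ACCEPT, 'DE', the function reports
only 'A' as required, which is the conclusion the commit message reaches
for the pattern in the bug report.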
Commit: e88d9850ae3713fe302e7c9d54a312c16d733a77
https://github.com/Perl/perl5/commit/e88d9850ae3713fe302e7c9d54a312c16d733a77
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M regexec.c
Log Message:
-----------
regexec.c: Fix up for ACCEPT inside of a (...)+ set lastopen in CURLYM
GH Issue #19484 reported that
print "ABDE" =~ /(A (A|B(*ACCEPT)|C)+ D)(E)/x ? "yes: <$1-$2>" : "no";
does not output the expected 'AB-B', and instead does not match.
Removing the + quantifier behaves as expected.
This patch is 2/4 and fixes part of the problem: lastopen was not
being set properly inside of the CURLYM optimization. lastopen is used
by the ACCEPT logic to know which parens need to be closed.
Commit: 317d9fad1f02b2a59195ff331551154b28055b0b
https://github.com/Perl/perl5/commit/317d9fad1f02b2a59195ff331551154b28055b0b
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M regexec.c
Log Message:
-----------
regexec.c: ACCEPT inside of a (...)+ should stop looping (CURLYM optimization)
GH Issue #19484 reported that
print "ABDE" =~ /(A (A|B(*ACCEPT)|C)+ D)(E)/x ? "yes: <$1-$2>" : "no";
does not output the expected 'AB-B', and instead does not match.
Removing the + quantifier behaves as expected.
This patch is 3/4 and fixes part of the problem: the CURLYM optimization
was not terminating its loop properly when it contained an ACCEPT. This
patch adds a new variable 'is_accepted' which is used to ensure that the
CURLYM optimization stops after an ACCEPT regop is executed.
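
The loop-termination idea can be sketched with a toy greedy loop in C. This
is an assumption-laden illustration, not the CURLYM code in regexec.c: the
types LoopStatus and LoopBody and both functions are hypothetical, and only
the flag name is_accepted comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { LOOP_FAIL, LOOP_MATCH, LOOP_ACCEPTED } LoopStatus;

/* Hypothetical body matcher: tries one iteration at input[*pos] and
 * advances *pos when it consumes a character. */
typedef LoopStatus (*LoopBody)(const char *input, size_t *pos);

/*
 * Toy greedy quantifier loop. Returns the number of completed iterations
 * and sets *is_accepted when the body executed an ACCEPT, in which case
 * no further iterations are attempted.
 */
size_t toy_greedy_loop(LoopBody body, const char *input, size_t *pos,
                       size_t max, bool *is_accepted)
{
    size_t count = 0;

    *is_accepted = false;
    while (count < max) {
        LoopStatus st = body(input, pos);
        if (st == LOOP_FAIL)
            break;
        count++;
        if (st == LOOP_ACCEPTED) {      /* ACCEPT ends the whole loop */
            *is_accepted = true;
            break;
        }
    }
    return count;
}

/* Toy body standing in for (A|B(*ACCEPT)|C). */
LoopStatus toy_abc_body(const char *input, size_t *pos)
{
    char c = input[*pos];

    if (c == 'A' || c == 'C') { (*pos)++; return LOOP_MATCH; }
    if (c == 'B')             { (*pos)++; return LOOP_ACCEPTED; }
    return LOOP_FAIL;                   /* includes the terminating NUL */
}
```

With the input "ACBA" the loop completes three iterations and stops at the
'B' with the flag set, leaving the trailing 'A' unconsumed.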
Commit: f7d2c64c61311c797b00569af64c45abe24e9ba0
https://github.com/Perl/perl5/commit/f7d2c64c61311c797b00569af64c45abe24e9ba0
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M regexec.c
M t/re/pat.t
Log Message:
-----------
regexec.c: make ACCEPT close logic handle SUCCEED/LOOKBEHIND_END opcodes
GH Issue #19484 reported that
print "ABDE" =~ /(A (A|B(*ACCEPT)|C)+ D)(E)/x ? "yes: <$1-$2>" : "no";
does not output the expected 'AB-B', and instead does not match.
Removing the + quantifier behaves as expected.
This patch is 4/4 of the patches to fix this problem: SUCCEED and
LOOKBEHIND_END regops are of type 'END', which have a next_off of 0. The
ACCEPT logic that closes any open capture buffers walks the program with
regnext(), which returned null on reaching such a regop. This terminated
the loop prematurely and prevented some of the capture buffers from
being properly closed. SUCCEED is used to end a variety of structures,
including lookahead IFMATCH and UNLESSM, SUSPEND, and CURLYM, and
LOOKBEHIND_END serves the same purpose for lookbehind IFMATCH and
UNLESSM. Thus this patch fixes the original bug and also fixes a variety
of other cases involving ACCEPT.
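
The premature termination can be reproduced in a small C model. Everything
here is hypothetical (WalkOp, WalkNode, toy_next and both counters), and
the "fixed" walk simply steps to the physically following node, which is
one plausible approach assumed for illustration and not taken from the
patch itself.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { W_OTHER, W_CLOSE, W_SUCCEED, W_END } WalkOp;

typedef struct {
    WalkOp op;
    size_t next_off;    /* offset to the logical next node; 0 means none */
} WalkNode;

/* Toy analogue of regnext(): NULL when next_off is 0. */
const WalkNode *toy_next(const WalkNode *p)
{
    return p->next_off ? p + p->next_off : NULL;
}

/* Follows toy_next() only, so the walk dies at the first W_SUCCEED. */
size_t toy_count_closes_naive(const WalkNode *start)
{
    size_t closes = 0;

    for (const WalkNode *p = start; p && p->op != W_END; p = toy_next(p))
        if (p->op == W_CLOSE)
            closes++;
    return closes;
}

/*
 * Same walk, but on W_SUCCEED it steps to the adjacent node instead of
 * following the zero offset. Precondition: the program ends with W_END,
 * so p + 1 is always inside the array.
 */
size_t toy_count_closes_fixed(const WalkNode *start)
{
    size_t closes = 0;

    for (const WalkNode *p = start; p && p->op != W_END;
         p = (p->op == W_SUCCEED) ? p + 1 : toy_next(p))
        if (p->op == W_CLOSE)
            closes++;
    return closes;
}
```

For a program with a CLOSE on each side of a SUCCEED, the naive walk sees
only the first CLOSE while the adjusted walk sees both.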
Commit: 8c1aab1b343245e70624e43fe82062747bce5827
https://github.com/Perl/perl5/commit/8c1aab1b343245e70624e43fe82062747bce5827
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M t/re/re_tests
Log Message:
-----------
re_tests: ACCEPT with CURLYM optimization
Commit: 8798ca55b828c711f036a17df012e4720a80aefa
https://github.com/Perl/perl5/commit/8798ca55b828c711f036a17df012e4720a80aefa
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M t/re/re_tests
Log Message:
-----------
re_tests: ACCEPT followed by SUSPEND
Commit: c31ecbb0f687663e38b0eb01a67116379e3ca95d
https://github.com/Perl/perl5/commit/c31ecbb0f687663e38b0eb01a67116379e3ca95d
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M t/re/re_tests
Log Message:
-----------
re_tests: ACCEPT followed by IFMATCH fixed width pos lookbehind
Commit: 6a7c91b122596d806f3d1eaef98a0efdba3ad388
https://github.com/Perl/perl5/commit/6a7c91b122596d806f3d1eaef98a0efdba3ad388
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M regcomp.c
Log Message:
-----------
regcomp.c: enhance S_debug_studydata to show min/stopmin/delta
Also call it many more times during the study_chunk() process.
This is helpful for debugging minlen related issues.
Note this function is not in embed.fnc and is used strictly inside
of the regex engine, so no changes there.
Commit: cf5ddd8bdeeeed4e6e8ead9cb03c86ed90ad84d1
https://github.com/Perl/perl5/commit/cf5ddd8bdeeeed4e6e8ead9cb03c86ed90ad84d1
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M regcomp.c
Log Message:
-----------
regcomp.c: reorder and comment S_study_chunk() internal vars
The list was kinda random and did not include many comments, which made
it difficult to understand the purpose of the different vars.
This documents them, and includes a follow-up item that came up during
the documentation process: first_non_open seems a bit off.
Will investigate and improve further in a subsequent patch.
Commit: 00eee9e2294d7d562d6e50eb74dff73ecf5a9c30
https://github.com/Perl/perl5/commit/00eee9e2294d7d562d6e50eb74dff73ecf5a9c30
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M regcomp.c
Log Message:
-----------
regcomp.c: minor blank line removal/insertion for clarity
Commit: 3e00bff54491e5fbeb8ec8f5015c2ac99e78cf87
https://github.com/Perl/perl5/commit/3e00bff54491e5fbeb8ec8f5015c2ac99e78cf87
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M regcomp.c
Log Message:
-----------
regcomp.c: deal with stopmin and min properly
stopmin is set when we encounter an ACCEPT; it basically says "even
though the minlen might look like X it is actually a smaller Y". It also
implies that delta (which refers to the max length a pattern might
match) should be at least a certain size. This was not being handled
properly nor propagated to callers in all situations. This in particular
affected use of ACCEPT inside of lookbehind. This also made final_minlen
redundant and it has been removed.
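
The relationship between min and stopmin can be sketched as follows. This
is a toy model over a flat node list, not the real S_study_chunk()
bookkeeping, and it ignores delta entirely; MinNode and toy_minlen are
hypothetical names, and INT_MAX as the "no ACCEPT seen" marker is an
assumption of the sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef struct {
    int is_accept;  /* nonzero for an ACCEPT point */
    int len;        /* minimum chars the node consumes (unused for ACCEPT) */
} MinNode;

/*
 * Toy minimum-length computation. min accumulates as if no ACCEPT
 * existed; stopmin remembers the smallest running total seen at any
 * ACCEPT. The reported minimum is the smaller of the two.
 */
int toy_minlen(const MinNode *nodes, size_t n)
{
    int min = 0;
    int stopmin = INT_MAX;      /* sketch's marker for "no ACCEPT seen" */

    assert(nodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].is_accept) {
            if (min < stopmin)
                stopmin = min;  /* a match may end here */
        } else {
            min += nodes[i].len;
        }
    }
    return min < stopmin ? min : stopmin;
}
```

Modelling a one-character literal, an ACCEPT, and a second one-character
literal gives a naive min of 2 but a reported minimum of 1.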
Commit: 004939ca821b69bc501a30843fb3a8fcc537d6be
https://github.com/Perl/perl5/commit/004939ca821b69bc501a30843fb3a8fcc537d6be
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M t/re/re_tests
Log Message:
-----------
re_tests: ACCEPT followed by UNLESSM fixed width negative lookbehind
Make sure that things with END type opcodes don't mess up ACCEPT paren
close logic.
Commit: de046300b7f72845a534f21d51c12c7f2e0a730c
https://github.com/Perl/perl5/commit/de046300b7f72845a534f21d51c12c7f2e0a730c
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M t/re/re_tests
Log Message:
-----------
re_tests: ACCEPT inside of named capture accessed via GOSUB
Commit: 2248c6ae69ff97e5b47024133c1ac8211f62e192
https://github.com/Perl/perl5/commit/2248c6ae69ff97e5b47024133c1ac8211f62e192
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M t/re/re_tests
Log Message:
-----------
re_tests: ACCEPT in IFMATCH: variable positive lookbehind
Commit: b0f4d756cb51c7a03521e4f040edda19cae76b8a
https://github.com/Perl/perl5/commit/b0f4d756cb51c7a03521e4f040edda19cae76b8a
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M t/re/re_tests
Log Message:
-----------
re_tests: ACCEPT in UNLESSM variable negative lookbehind
Commit: 77b897561f3964834f1f1a023a052f52ffa308de
https://github.com/Perl/perl5/commit/77b897561f3964834f1f1a023a052f52ffa308de
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M t/re/reg_mesg.t
Log Message:
-----------
reg_mesg.t: check that ACCEPT in capturing variable length lookbehind warns
Commit: a81d6ea45e9df27e5a96ce6ae21cff2cccb29b7c
https://github.com/Perl/perl5/commit/a81d6ea45e9df27e5a96ce6ae21cff2cccb29b7c
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M regcomp.c
M t/re/re_tests
Log Message:
-----------
regcomp.c: With ACCEPT set stopmin even if no data struct present
Otherwise top level branches can end up with mistaken minlen with
ACCEPT. Eg, /A(*ACCEPT)B/ does not require B to be present.
Commit: 3111f81d782f42db4307f6663f11e5b59ec551f5
https://github.com/Perl/perl5/commit/3111f81d782f42db4307f6663f11e5b59ec551f5
Author: Yves Orton <demerphq@gmail.com>
Date: 2022-03-30 (Wed, 30 Mar 2022)
Changed paths:
M regcomp.c
M t/re/opt.t
M t/re/re_tests
Log Message:
-----------
regcomp.c: fix substring optimizer for ACCEPT inside of CURLY
ACCEPT essentially overrides quantifiers larger than 1. Eg,
/(A){2}/ has a mincount of 2 and a maxcount of 2, and a minlen
of 2 for "AA". But /(A(*ACCEPT)){2}/ should actually have a
mincount of 1, and a minlen of 1 as it can match 'A'. In the regex
engine proper this doesn't matter, we just do the right thing. But
in the optimizer it matters. This patch sets the mincount to 1 in
such cases whenever the contents contain an ACCEPT.
Thanks to Hugo for asking the questions that led to this patch.
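
The mincount adjustment described in this last commit reduces to a small
piece of arithmetic, sketched here in C. The function toy_curly_minlen and
its parameters are hypothetical and only model the optimizer's length
estimate, not the matching engine.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy minimum-length estimate for a quantified group {mincount,...}.
 * If the body contains an ACCEPT, one iteration may end the match, so a
 * mincount larger than 1 is treated as 1. A mincount of 0 is left alone.
 */
int toy_curly_minlen(int body_minlen, int mincount, bool body_has_accept)
{
    assert(body_minlen >= 0 && mincount >= 0);
    if (body_has_accept && mincount > 1)
        mincount = 1;
    return body_minlen * mincount;
}
```

For a one-character body with a mincount of 2 this yields 2 without an
ACCEPT and 1 with one, matching the /(A){2}/ and /(A(*ACCEPT)){2}/ figures
given in the commit message.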
Compare: https://github.com/Perl/perl5/compare/263be8fd93c7...3111f81d782f