List:       perl5-changes
Subject:    [Perl/perl5] 494b08: regcomp.c: ACCEPT inside of a (...)+ should disabl...
From:       Yves Orton via perl5-changes <perl5-changes () perl ! org>
Date:       2022-03-30 8:43:53
Message-ID: Perl/perl5/push/refs/heads/yves/fix_accept/263be8-3111f8 () github ! com

  Branch: refs/heads/yves/fix_accept
  Home:   https://github.com/Perl/perl5
  Commit: 494b080d02038073c4a374cc0981d487a01f85f0
      https://github.com/Perl/perl5/commit/494b080d02038073c4a374cc0981d487a01f85f0
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c: ACCEPT inside of a (...)+ should disable mandatory substrings

GH Issue #19484 reported that

    print "ABDE" =~ /(A (A|B(*ACCEPT)|C)+ D)(E)/x ? "yes: <$1-$2>" : "no";

does not output the expected 'AB-B', and instead does not match.
Removing the + quantifier behaves as expected.

This patch is 1/4 and fixes part of the problem: the regex optimizer
S_study_chunk() was not handling the ACCEPT properly and assumed that
the pattern MUST contain 'A' and 'DE'. That is wrong: because of the
ACCEPT, only 'A' is mandatory and the 'DE' is actually optional.
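
As an illustration only (this is not code from the patch), the idea can
be sketched with a toy model in C. The item kinds, the struct and the
function name below are all hypothetical and much simpler than what
S_study_chunk() really works with: the pattern is flattened into a list
of literals, and any literal that follows a point where the match may be
accepted early is treated as optional.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item kinds for a flattened pattern; not Perl's regnode types. */
enum item_kind { ITEM_LITERAL, ITEM_MAY_ACCEPT };

struct item {
    enum item_kind kind;
    const char *text;   /* literal text, or NULL for ITEM_MAY_ACCEPT */
};

/* Copies pointers to the literals that every match must contain into out[]
 * and returns how many there are. Literals after the first point where the
 * match may be accepted early are optional, so the scan stops there. */
size_t mandatory_literals(const struct item *items, size_t n,
                          const char **out)
{
    size_t count = 0;
    assert(items != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (items[i].kind == ITEM_MAY_ACCEPT)
            break;              /* everything after this point is optional */
        out[count++] = items[i].text;
    }
    return count;
}
```

Feeding it the three items 'A', "may accept", 'DE' reports only 'A' as
mandatory, which is the conclusion described above.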


  Commit: e88d9850ae3713fe302e7c9d54a312c16d733a77
      https://github.com/Perl/perl5/commit/e88d9850ae3713fe302e7c9d54a312c16d733a77
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: Fix up for ACCEPT inside of a (...)+ set lastopen in CURLYM

GH Issue #19484 reported that

    print "ABDE" =~ /(A (A|B(*ACCEPT)|C)+ D)(E)/x ? "yes: <$1-$2>" : "no";

does not output the expected 'AB-B', and instead does not match.
Removing the + quantifier behaves as expected.

This patch is 2/4 and fixes part of the problem: lastopen was not
being set properly inside of the CURLYM optimization. lastopen is used
by the ACCEPT logic to know which parens need to be closed.
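
A rough sketch of why lastopen matters, assuming a much simplified
capture table (the struct, the helper names and the "highest group
opened so far" meaning of lastopen are assumptions of this sketch, not
the regexec.c data structures): when ACCEPT fires, every group that has
been opened but not yet closed is closed at the current position, and
lastopen bounds how far that scan goes.

```c
#include <assert.h>

#define MAX_GROUPS 8
#define UNSET (-1)

/* Hypothetical capture bookkeeping; Perl's real offset tables differ. */
struct captures {
    int start[MAX_GROUPS + 1];  /* index 0 unused; groups are 1-based */
    int end[MAX_GROUPS + 1];
    int lastopen;               /* simplified: highest group opened so far */
};

void captures_init(struct captures *c)
{
    for (int i = 0; i <= MAX_GROUPS; i++)
        c->start[i] = c->end[i] = UNSET;
    c->lastopen = 0;
}

void open_group(struct captures *c, int n, int pos)
{
    assert(n >= 1 && n <= MAX_GROUPS);
    c->start[n] = pos;
    c->end[n] = UNSET;
    if (n > c->lastopen)
        c->lastopen = n;    /* if this update is skipped, accept_at() misses group n */
}

/* On ACCEPT, close every group that is open but not yet closed. */
void accept_at(struct captures *c, int pos)
{
    for (int n = 1; n <= c->lastopen; n++)
        if (c->start[n] != UNSET && c->end[n] == UNSET)
            c->end[n] = pos;
}
```

For "ABDE", opening group 1 at offset 0 and group 2 at offset 1, then
accepting at offset 2, leaves group 1 covering "AB" and group 2 covering
"B", which corresponds to the expected 'AB-B'.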


  Commit: 317d9fad1f02b2a59195ff331551154b28055b0b
      https://github.com/Perl/perl5/commit/317d9fad1f02b2a59195ff331551154b28055b0b
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: ACCEPT inside of a (...)+ should stop looping (CURLYM optimization)

GH Issue #19484 reported that

    print "ABDE" =~ /(A (A|B(*ACCEPT)|C)+ D)(E)/x ? "yes: <$1-$2>" : "no";

does not output the expected 'AB-B', and instead does not match.
Removing the + quantifier behaves as expected.

This patch is 3/4 and fixes part of the problem: the CURLYM optimization
was not terminating its loop properly when it contained an ACCEPT. This
patch adds a new variable 'is_accepted' which is used to ensure that the
CURLYM optimization stops after an ACCEPT regop is executed.
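
The loop-termination idea can be sketched as follows. This is a toy
matcher, not the CURLYM code: match_body() and match_plus() are
hypothetical names, and the body is hard-wired to the A|B(*ACCEPT)|C
alternation from the report. Only the is_accepted flag is taken from the
description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical body matcher for a group containing A|B(*ACCEPT)|C: consumes
 * one character and records whether the ACCEPT branch was taken. Returns
 * false if no alternative matches at *pos. */
static bool match_body(const char *s, size_t *pos, bool *is_accepted)
{
    char ch = s[*pos];
    if (ch != 'A' && ch != 'B' && ch != 'C')
        return false;
    (*pos)++;
    if (ch == 'B')
        *is_accepted = true;
    return true;
}

/* Greedy (...)+ loop: returns the number of iterations matched and stores
 * in *is_accepted whether the loop ended because of an ACCEPT. */
size_t match_plus(const char *s, size_t *pos, bool *is_accepted)
{
    size_t iterations = 0;
    assert(s != NULL && pos != NULL && is_accepted != NULL);
    *is_accepted = false;
    while (match_body(s, pos, is_accepted)) {
        iterations++;
        if (*is_accepted)
            break;  /* ACCEPT ends the whole match: no further iterations */
    }
    return iterations;
}
```

Starting at offset 1 of "ABDE", the loop runs once, consumes the 'B' and
stops with is_accepted set instead of trying another iteration.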


  Commit: f7d2c64c61311c797b00569af64c45abe24e9ba0
      https://github.com/Perl/perl5/commit/f7d2c64c61311c797b00569af64c45abe24e9ba0
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M regexec.c
    M t/re/pat.t

  Log Message:
  -----------
  regexec.c: make ACCEPT close logic handle SUCCEED/LOOKBEHIND_END opcodes

GH Issue #19484 reported that

    print "ABDE" =~ /(A (A|B(*ACCEPT)|C)+ D)(E)/x ? "yes: <$1-$2>" : "no";

does not output the expected 'AB-B', and instead does not match.
Removing the + quantifier behaves as expected.

This patch is 4/4 of the patches to fix this problem: SUCCEED and
LOOKBEHIND_END regops are of type 'END', which has a next_off of 0. The
ACCEPT logic that closes any open capture buffers iterates with
regnext(), which returned null for these regops, so the loop terminated
prematurely and some of the capture buffers were not properly closed.
SUCCEED is used to end a variety of structures, including lookahead
IFMATCH and UNLESSM, SUSPEND, and CURLYM, and LOOKBEHIND_END serves the
same purpose for lookbehind IFMATCH and UNLESSM. Thus this patch fixes
the original bug but also fixes a variety of other cases involving
ACCEPT.
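
A minimal sketch of the failure mode, using an invented node layout
(the opcodes, struct node, toy_regnext() and count_closes() are all
assumptions made for this example and do not match the real regnode
definitions): a walk that relies only on next_off stops at the first
SUCCEED-like node, while a walk that steps to the physically following
node in that case reaches the remaining CLOSE nodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes and node layout; Perl's regnodes are more involved. */
enum op { OP_OPEN, OP_CLOSE, OP_EXACT, OP_SUCCEED, OP_END };

struct node {
    enum op op;
    int next_off;   /* offset to the next node; 0 means "no next" */
};

/* Mirrors the idea of regnext(): follow next_off, or return NULL when it is 0. */
static const struct node *toy_regnext(const struct node *p)
{
    return p->next_off ? p + p->next_off : NULL;
}

/* Counts CLOSE nodes from 'start' up to the final OP_END. If
 * step_over_succeed is zero the walk relies on toy_regnext() alone and stops
 * at the first SUCCEED; otherwise it moves to the node stored right after it. */
int count_closes(const struct node *start, int step_over_succeed)
{
    int closes = 0;
    const struct node *p = start;
    assert(start != NULL);
    while (p != NULL && p->op != OP_END) {
        if (p->op == OP_CLOSE)
            closes++;
        if (p->op == OP_SUCCEED && step_over_succeed)
            p = p + 1;
        else
            p = toy_regnext(p);
    }
    return closes;
}
```

With the node sequence CLOSE, SUCCEED, CLOSE, END the first mode finds
one CLOSE and the second finds both.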


  Commit: 8c1aab1b343245e70624e43fe82062747bce5827
      https://github.com/Perl/perl5/commit/8c1aab1b343245e70624e43fe82062747bce5827
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M t/re/re_tests

  Log Message:
  -----------
  re_tests: ACCEPT with CURLYM optimization


  Commit: 8798ca55b828c711f036a17df012e4720a80aefa
      https://github.com/Perl/perl5/commit/8798ca55b828c711f036a17df012e4720a80aefa
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M t/re/re_tests

  Log Message:
  -----------
  re_tests: ACCEPT followed by SUSPEND


  Commit: c31ecbb0f687663e38b0eb01a67116379e3ca95d
      https://github.com/Perl/perl5/commit/c31ecbb0f687663e38b0eb01a67116379e3ca95d
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M t/re/re_tests

  Log Message:
  -----------
  re_tests: ACCEPT followed by IFMATCH fixed width pos lookbehind


  Commit: 6a7c91b122596d806f3d1eaef98a0efdba3ad388
      https://github.com/Perl/perl5/commit/6a7c91b122596d806f3d1eaef98a0efdba3ad388
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c: enhance S_debug_studydata to show min/stopmin/delta

Also call it many more times during the study_chunk() process.

This is helpful for debugging minlen-related issues.

Note this function is not in embed.fnc and is used strictly inside
of the regex engine, so no changes there.


  Commit: cf5ddd8bdeeeed4e6e8ead9cb03c86ed90ad84d1
      https://github.com/Perl/perl5/commit/cf5ddd8bdeeeed4e6e8ead9cb03c86ed90ad84d1
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c: reorder and comment S_study_chunk() internal vars

The list was kinda random and did not include many comments, which made
it difficult to understand the purpose of the different vars.

This documents them, and notes a follow-up item that came up during the
documentation process: first_non_open seems a bit off. Will investigate
and improve further in a subsequent patch.


  Commit: 00eee9e2294d7d562d6e50eb74dff73ecf5a9c30
      https://github.com/Perl/perl5/commit/00eee9e2294d7d562d6e50eb74dff73ecf5a9c30
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c: minor blank line removal/insertion for clarity


  Commit: 3e00bff54491e5fbeb8ec8f5015c2ac99e78cf87
      https://github.com/Perl/perl5/commit/3e00bff54491e5fbeb8ec8f5015c2ac99e78cf87
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c: deal with stopmin and min properly

stopmin is set when we encounter an ACCEPT; it basically says "even
though the minlen might look like X it is actually a smaller Y". It also
implies that delta (which refers to the max length a pattern might
match) should be at least a certain size. This was not being handled
properly nor propagated to callers in all situations. This in particular
affected use of ACCEPT inside of lookbehind. This also made final_minlen
redundant and it has been removed.
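
To make the stopmin idea concrete, here is a small sketch under stated
assumptions: the struct and the helper functions are invented for this
example, only the names min and stopmin echo the description above, and
delta is not modelled at all. The scan accumulates a running minimum
length, records it whenever an ACCEPT is seen, and the effective minlen
is the smaller of the two.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical running state for a length scan. */
struct len_state {
    int min;       /* minimum length accumulated so far */
    int stopmin;   /* smallest min recorded at any ACCEPT, INT_MAX if none */
};

void len_init(struct len_state *st)
{
    st->min = 0;
    st->stopmin = INT_MAX;
}

void len_add_literal(struct len_state *st, int len)
{
    assert(len >= 0);
    st->min += len;
}

/* An ACCEPT here means a match can end with only st->min characters. */
void len_accept(struct len_state *st)
{
    if (st->min < st->stopmin)
        st->stopmin = st->min;
}

/* The effective minimum length: "it looks like min, but is really stopmin". */
int len_final_min(const struct len_state *st)
{
    return st->stopmin < st->min ? st->stopmin : st->min;
}
```

For a pattern such as /A(*ACCEPT)B/ the scan adds 1, records the ACCEPT,
then adds 1 again: min ends up as 2 but the effective minlen is 1.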


  Commit: 004939ca821b69bc501a30843fb3a8fcc537d6be
      https://github.com/Perl/perl5/commit/004939ca821b69bc501a30843fb3a8fcc537d6be
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M t/re/re_tests

  Log Message:
  -----------
  re_tests: ACCEPT followed by UNLESSM fixed width negative lookbehind

Make sure that things with END type opcodes don't mess up ACCEPT paren
close logic.


  Commit: de046300b7f72845a534f21d51c12c7f2e0a730c
      https://github.com/Perl/perl5/commit/de046300b7f72845a534f21d51c12c7f2e0a730c
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M t/re/re_tests

  Log Message:
  -----------
  re_tests: ACCEPT inside of named capture accessed via GOSUB


  Commit: 2248c6ae69ff97e5b47024133c1ac8211f62e192
      https://github.com/Perl/perl5/commit/2248c6ae69ff97e5b47024133c1ac8211f62e192
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M t/re/re_tests

  Log Message:
  -----------
  re_tests: ACCEPT in IFMATCH: variable positive lookbehind


  Commit: b0f4d756cb51c7a03521e4f040edda19cae76b8a
      https://github.com/Perl/perl5/commit/b0f4d756cb51c7a03521e4f040edda19cae76b8a
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M t/re/re_tests

  Log Message:
  -----------
  re_tests: ACCEPT in UNLESSM variable negative lookbehind


  Commit: 77b897561f3964834f1f1a023a052f52ffa308de
      https://github.com/Perl/perl5/commit/77b897561f3964834f1f1a023a052f52ffa308de
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M t/re/reg_mesg.t

  Log Message:
  -----------
  reg_mesg.t: check that ACCEPT in capturing variable length lookbehind warns


  Commit: a81d6ea45e9df27e5a96ce6ae21cff2cccb29b7c
      https://github.com/Perl/perl5/commit/a81d6ea45e9df27e5a96ce6ae21cff2cccb29b7c
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M regcomp.c
    M t/re/re_tests

  Log Message:
  -----------
  regcomp.c: With ACCEPT set stopmin even if no data struct present

Otherwise top-level branches can end up with a mistaken minlen with
ACCEPT. E.g., /A(*ACCEPT)B/ does not require B to be present.


  Commit: 3111f81d782f42db4307f6663f11e5b59ec551f5
      https://github.com/Perl/perl5/commit/3111f81d782f42db4307f6663f11e5b59ec551f5
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-03-30 (Wed, 30 Mar 2022)

  Changed paths:
    M regcomp.c
    M t/re/opt.t
    M t/re/re_tests

  Log Message:
  -----------
  regcomp.c: fix substring optimizer for ACCEPT inside of CURLY

ACCEPT essentially overrides quantifiers larger than 1. E.g.,
/(A){2}/ has a mincount of 2 and a maxcount of 2, and a minlen
of 2 for "AA". But /(A(*ACCEPT)){2}/ should actually have a
mincount of 1, and a minlen of 1 as it can match 'A'. In the regex
engine proper this doesn't matter, we just do the right thing. But
in the optimizer it matters. This patch sets the mincount to 1 in
such cases whenever the contents contains an ACCEPT.
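
The arithmetic above can be written down as a tiny sketch. The function
name and signature are hypothetical and only model the minlen
calculation described here, not the optimizer code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimum match length of a quantified group {mincount,...} whose body has
 * minimum length body_min. If the body contains an ACCEPT, a single
 * iteration can end the match, so a mincount above 1 is treated as 1. */
int curly_minlen(int body_min, int mincount, bool body_has_accept)
{
    assert(body_min >= 0 && mincount >= 0);
    if (body_has_accept && mincount > 1)
        mincount = 1;
    return body_min * mincount;
}
```

With a body of minimum length 1 and a mincount of 2, this gives 2 for
/(A){2}/ and 1 for /(A(*ACCEPT)){2}/.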

Thanks to Hugo for asking the questions that led to this patch.


Compare: https://github.com/Perl/perl5/compare/263be8fd93c7...3111f81d782f