List:       php-doc-bugs
Subject:    [DOC-BUGS] #46904 [Bgs->Opn]: preg_match() example #4 is wrong
From:       felipe () php ! net
Date:       2008-12-23 19:21:44
Message-ID: 200812231921.mBNJLiHc062394 () y1 ! php ! net

 ID:               46904
 Updated by:       felipe@php.net
 Reported By:      joe at digg dot com
-Status:           Bogus
+Status:           Open
 Bug Type:         Documentation problem
 Operating System: Debian GNU/Linux
 PHP Version:      Irrelevant
 New Comment:

Yes, it's expected. But OK, we should specify a minimum PCRE version:
7.0 (bundled as of PHP 5.2.2).
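
A minimal sketch of how a script could guard against an older library,
assuming the PCRE_VERSION constant starts with the numeric version (it
may be followed by a release date, and may be undefined on old PHP
builds). The 7.0 threshold is the one named above; the variable names
and the sample string are illustrative only.

```php
<?php
// Numeric part of PCRE_VERSION; fall back to '0' if the constant is missing.
$pcreVersion = defined('PCRE_VERSION') ? strtok(PCRE_VERSION, ' ') : '0';

// Assumption from the comment above: (?<name>...) needs PCRE 7.0 or later.
// The Python-style (?P<name>...) form is the fallback for older libraries.
$pattern = version_compare($pcreVersion, '7.0', '>=')
    ? '/(?<name>\w+): (?<digit>\d+)/'
    : '/(?P<name>\w+): (?P<digit>\d+)/';

$str = 'foobar: 2008';
preg_match($pattern, $str, $matches);

echo $matches['name'], ' ', $matches['digit'], "\n"; // prints "foobar 2008"
```

Either branch fills the same named keys in $matches, so the rest of the
script does not need to know which spelling was used.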


Previous Comments:
------------------------------------------------------------------------

[2008-12-23 19:13:06] joe at digg dot com

This isn't bogus. At some point this was NOT valid, but now appears to
be valid. In 5.2.6 it works fine, but in 5.2.0 it does NOT:

jstump@devwww25:~$ php -v && php -q foo.php 
PHP 5.2.0-8+etch1 (cli) (built: Mar  8 2007 09:15:48) 
Copyright (c) 1997-2006 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2006 Zend Technologies
PHP Warning:  preg_match(): Compilation failed: unrecognized character
after (?< at offset 3 in /home/jstump/foo.php on line 5

Warning: preg_match(): Compilation failed: unrecognized character after
(?< at offset 3 in /home/jstump/foo.php on line 5

------------------------------------------------------------------------

[2008-12-23 19:07:31] felipe@php.net

The PCRE documentation says:
"In PCRE, a subpattern can be named in one of three ways: (?<name>...)
or (?'name'...) as in Perl, or (?P<name>...) as in Python."
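
As a rough illustration of the quoted sentence, the sketch below runs
the three spellings against the same string. It assumes a PCRE new
enough to accept all three (7.0 or later, per this thread); the array
keys 'angle', 'quote' and 'python' are made-up labels, not PCRE terms.

```php
<?php
$str = 'foobar: 2008';

// The three spellings quoted from the PCRE documentation.
$patterns = array(
    'angle'  => '/(?<name>\w+): (?<digit>\d+)/',
    'quote'  => '/(?\'name\'\w+): (?\'digit\'\d+)/',
    'python' => '/(?P<name>\w+): (?P<digit>\d+)/',
);

$results = array();
foreach ($patterns as $label => $pattern) {
    // preg_match() returns 1 on a match and fills $m with numbered and named keys.
    if (preg_match($pattern, $str, $m) === 1) {
        $results[$label] = array($m['name'], $m['digit']);
    }
}

print_r($results); // each label maps to array('foobar', '2008')
```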

------------------------------------------------------------------------

[2008-12-19 10:16:59] rquadling@php.net

According to the help for RegexBuddy ...

(?P<name>group) came from Python.

PCRE followed Python's lead.

PHP offers the same functionality.

So, initially you look correct.

But, again from the RegexBuddy help ...

"The regular expression classes of the .NET framework also support 
named capture. Unfortunately, the Microsoft developers decided to 
invent their own syntax, rather than follow the one pioneered by 
Python. Currently, no other regex flavor supports Microsoft's version 
of named capture.

Here is an example with two capturing groups in .NET style: 
(?<first>group)(?'second'group). As you can see, .NET offers two 
syntaxes to create a capturing group: one using sharp brackets, and 
the other using single quotes. The first syntax is preferable in 
strings, where single quotes may need to be escaped. The second 
syntax is preferable in ASP code, where the sharp brackets are used 
for HTML tags. You can use the pointy bracket flavor and the quoted 
flavors interchangeably.

To reference a capturing group inside the regex, use \k<name> or 
\k'name'. Again, you can use the two syntactic variations 
interchangeably."

This info is also available at 
http://www.regular-expressions.info/named.html

So, it seems PHP actually supports PCRE/Python's and Microsoft's 
mechanisms.

Ideally we should be reflecting the PCRE route but have a note that 
other mechanisms are supported.


Finally on this (from http://perldoc.perl.org/perlre.html - scroll 
down to "Capture Buffers").

"Additionally, as of Perl 5.10.0 you may use named capture buffers 
and named backreferences. The notation is (?<name>...) to declare and 
\k<name> to reference. You may also use apostrophes instead of angle 
brackets to delimit the name; and you may use the bracketed \g{name} 
backreference syntax. It's possible to refer to a named capture 
buffer by absolute and relative number as well. Outside the pattern, 
a named capture buffer is available via the %+ hash. When different 
buffers within the same pattern have the same name, $+{name} and 
\k<name> refer to the leftmost defined group. (Thus it's possible to 
do things with named capture buffers that would otherwise require 
(??{}) code to accomplish.)"


So, there is a differentiation between named captures and named 
backreferences.

(?<name>regex) is a named capture. You cannot use the name of the 
capture within the regex or the replace (if search/replacing).

So, technically and being ever so slightly picky, the documentation 
is correct.

But really it is incomplete. I'll try to add some more examples 
differentiating between named captures and named backreferences.
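
A small sketch of that distinction, assuming a PCRE that accepts the 
\k<name> and (?P=name) backreference forms (7.0 or later should). The 
sample text and variable names are invented for illustration.

```php
<?php
$text = 'it was was fine';

// Named capture (?<word>...) followed by a named backreference \k<word>.
$perlStyle = '/\b(?<word>\w+) \k<word>\b/';
// Python-style equivalent: declared with (?P<word>...), referenced with (?P=word).
$pythonStyle = '/\b(?P<word>\w+) (?P=word)\b/';

preg_match($perlStyle, $text, $a);
preg_match($pythonStyle, $text, $b);
echo $a['word'], ' ', $b['word'], "\n"; // prints "was was"

// In the replacement string the group is referenced by number;
// a named group still gets a number (here, 1).
$collapsed = preg_replace($perlStyle, '$1', $text);
echo $collapsed, "\n"; // prints "it was fine"
```

The capture name is visible in the $matches array, while the doubled 
word is detected by the backreference inside the pattern itself.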




------------------------------------------------------------------------

[2008-12-19 10:05:34] rquadling@php.net

I'm not so sure.

Using RegexBuddy to explain the different regexes ...

There seems to be no difference between the two forms.




(?<name>\w+): (?<digit>\d+)

Options: case insensitive; ^ and $ match at line breaks

Match the regular expression below and capture its match into 
backreference with name "name"  «(?<name>\w+) »
   Match a single character that is a "word character" (letters, 
digits, etc.)  «\w+ »
      Between one and unlimited times, as many times as possible, 
giving back as needed (greedy)  «+ »
Match the characters ": " literally  «:  »
Match the regular expression below and capture its match into 
backreference with name "digit"  «(?<digit>\d+) »
   Match a single digit 0..9  «\d+ »
      Between one and unlimited times, as many times as possible, 
giving back as needed (greedy)  «+ »







(?P<name>\w+): (?P<digit>\d+)

Options: case insensitive; ^ and $ match at line breaks

Match the regular expression below and capture its match into 
backreference with name "name"  «(?P<name>\w+) »
   Match a single character that is a "word character" (letters, 
digits, etc.)  «\w+ »
      Between one and unlimited times, as many times as possible, 
giving back as needed (greedy)  «+ »
Match the characters ": " literally  «:  »
Match the regular expression below and capture its match into 
backreference with name "digit"  «(?P<digit>\d+) »
   Match a single digit 0..9  «\d+ »
      Between one and unlimited times, as many times as possible, 
giving back as needed (greedy)  «+ »






------------------------------------------------------------------------

[2008-12-18 20:21:24] tobias382 at gmail dot com

Patch for /phpdoc/en/reference/pcre/functions/preg-match.xml:

278c278
< preg_match('/(?<name>\w+): (?<digit>\d+)/', $str, $matches);
---
> preg_match('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/46904

