
List:       php-doc-bugs
Subject:    [DOC-BUGS] #46904 [Opn->Bgs]: preg_match() example #4 is wrong
From:       felipe () php ! net
Date:       2008-12-23 19:07:31
Message-ID: 200812231907.mBNJ7VAl061212 () y1 ! php ! net

 ID:               46904
 Updated by:       felipe@php.net
 Reported By:      joe at digg dot com
-Status:           Open
+Status:           Bogus
 Bug Type:         Documentation problem
 Operating System: Debian GNU/Linux
 PHP Version:      Irrelevant
 New Comment:

The PCRE documentation says:
"In PCRE, a subpattern can be named in one of three ways: (?<name>...)
or (?'name'...) as in Perl, or (?P<name>...) as in Python."

The (?<name>...) syntax used in example #4 is therefore valid.
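
As a rough illustration of the three forms quoted above, here is a minimal sketch that runs each of them against the string from the report. The array names $patterns and $results are only illustrative, and it assumes a PHP build whose bundled PCRE accepts all three syntaxes.

```php
<?php
// Sketch: the same pattern written with each of the three naming
// syntaxes. $patterns and $results are illustrative names.
$str = 'foobar: 2008';

$patterns = [
    'perl_angle' => '/(?<name>\w+): (?<digit>\d+)/',
    // Double-quoted so the single quotes inside need no escaping;
    // the backslashes are doubled instead.
    'perl_quote' => "/(?'name'\\w+): (?'digit'\\d+)/",
    'python'     => '/(?P<name>\w+): (?P<digit>\d+)/',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    preg_match($pattern, $str, $matches);
    $results[$label] = $matches;
}

// Each entry should hold the same array, with both named and
// numbered keys for every group.
print_r($results['perl_angle']);
```

If the three entries of $results compare equal, the forms are interchangeable for this pattern.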


Previous Comments:
------------------------------------------------------------------------

[2008-12-19 10:16:59] rquadling@php.net

According to the help for RegexBuddy ...

(?P<name>group) came from Python.

PCRE followed Python's lead.

PHP offers the same functionality.

So, at first glance, you look correct.

But, again from the RegexBuddy help ...

"The regular expression classes of the .NET framework also support 
named capture. Unfortunately, the Microsoft developers decided to 
invent their own syntax, rather than follow the one pioneered by 
Python. Currently, no other regex flavor supports Microsoft's version 
of named capture.

Here is an example with two capturing groups in .NET style: 
(?<first>group)(?'second'group). As you can see, .NET offers two 
syntaxes to create a capturing group: one using sharp brackets, and 
the other using single quotes. The first syntax is preferable in 
strings, where single quotes may need to be escaped. The second 
syntax is preferable in ASP code, where the sharp brackets are used 
for HTML tags. You can use the pointy bracket flavor and the quoted 
flavors interchangeably.

To reference a capturing group inside the regex, use \k<name> or 
\k'name'. Again, you can use the two syntactic variations 
interchangeably."

This info is also available at 
http://www.regular-expressions.info/named.html

So, it seems PHP actually supports PCRE/Python's and Microsoft's 
mechanisms.

Ideally we should be reflecting the PCRE route but have a note that 
other mechanisms are supported.


Finally on this (from http://perldoc.perl.org/perlre.html - scroll 
down to "Capture Buffers").

"Additionally, as of Perl 5.10.0 you may use named capture buffers 
and named backreferences. The notation is (?<name>...) to declare and 
\k<name> to reference. You may also use apostrophes instead of angle 
brackets to delimit the name; and you may use the bracketed \g{name} 
backreference syntax. It's possible to refer to a named capture 
buffer by absolute and relative number as well. Outside the pattern, 
a named capture buffer is available via the %+ hash. When different 
buffers within the same pattern have the same name, $+{name} and 
\k<name> refer to the leftmost defined group. (Thus it's possible to 
do things with named capture buffers that would otherwise require 
(??{}) code to accomplish.)"


So, there is a differentiation between named captures and named 
backreferences.

(?<name>regex) is a named capture. You cannot use the name of the 
capture within the regex or the replace (if search/replacing).

So, technically and being ever so slightly picky, the documentation 
is correct.

But really it is incomplete. I'll try to add some more examples that 
differentiate between named captures and named backreferences.
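
A minimal sketch of that distinction, assuming PCRE's \k<name> and (?P=name) backreference forms are available in the PHP build in use. The group name "word" and the variable names are only illustrative.

```php
<?php
// Named capture only: (?<word>...) labels the group in $capture.
preg_match('/(?<word>\w+) \w+/', 'hello hello', $capture);

// Named capture plus named backreference: \k<word> must repeat
// whatever the group called "word" matched.
$repeated  = preg_match('/(?<word>\w+) \k<word>/', 'hello hello');
$different = preg_match('/(?<word>\w+) \k<word>/', 'hello world');

// Python-style declaration with the Python-style backreference.
$python = preg_match('/(?P<word>\w+) (?P=word)/', 'hello hello');

var_dump($capture['word'], $repeated, $different, $python);
```

The capture syntax only declares the name; referring back to the group inside the pattern needs one of the separate backreference forms.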




------------------------------------------------------------------------

[2008-12-19 10:05:34] rquadling@php.net

I'm not so sure.

Using RegexBuddy to explain the different regexes ...

There seems to be no difference between the two forms.




(?<name>\w+): (?<digit>\d+)

Options: case insensitive; ^ and $ match at line breaks

Match the regular expression below and capture its match into 
backreference with name "name"  «(?<name>\w+) »
   Match a single character that is a "word character" (letters, 
digits, etc.)  «\w+ »
      Between one and unlimited times, as many times as possible, 
giving back as needed (greedy)  «+ »
Match the characters ": " literally  «:  »
Match the regular expression below and capture its match into 
backreference with name "digit"  «(?<digit>\d+) »
   Match a single digit 0..9  «\d+ »
      Between one and unlimited times, as many times as possible, 
giving back as needed (greedy)  «+ »







(?P<name>\w+): (?P<digit>\d+)

Options: case insensitive; ^ and $ match at line breaks

Match the regular expression below and capture its match into 
backreference with name "name"  «(?P<name>\w+) »
   Match a single character that is a "word character" (letters, 
digits, etc.)  «\w+ »
      Between one and unlimited times, as many times as possible, 
giving back as needed (greedy)  «+ »
Match the characters ": " literally  «:  »
Match the regular expression below and capture its match into 
backreference with name "digit"  «(?P<digit>\d+) »
   Match a single digit 0..9  «\d+ »
      Between one and unlimited times, as many times as possible, 
giving back as needed (greedy)  «+ »






------------------------------------------------------------------------

[2008-12-18 20:21:24] tobias382 at gmail dot com

Patch for /phpdoc/en/reference/pcre/functions/preg-match.xml:

278c278
< preg_match('/(?<name>\w+): (?<digit>\d+)/', $str, $matches);
---
> preg_match('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);

------------------------------------------------------------------------

[2008-12-18 20:08:21] joe at digg dot com

Description:
------------
On http://us.php.net/preg_match example #4 (Using named subpattern) is
wrong. It shows:

<?php

$str = 'foobar: 2008';

preg_match('/(?<name>\w+): (?<digit>\d+)/', $str, $matches);

print_r($matches);

?>

The proper syntax for named expressions is (?P<foo>). 

Expected result:
----------------
<?php

$str = 'foobar: 2008';

preg_match('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);

print_r($matches);

?>







------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=46904&edit=1


-- 
PHP Documentation Bugs Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
