
List:       kfm-devel
Subject:    Re: Regular Expressions - PCRE wins, I think.
From:       Michael Bedy <mjbedy () mediaone ! net>
Date:       2001-10-18 4:30:16


On Thu, 18 Oct 2001, Dirk Mueller wrote:

> On Wed, 17 Oct 2001, Michael Bedy wrote:
>
> >   It looks to me like QRegExp gets very close, but misses a few crucial
> > features. The biggest of these is lack of /m support and all the various
> > things that go along with it. Also, the "." atom always matches newlines
> > in QRegExp. This is equivalent to the /s option in Perl, but Javascript
> > unfortunately requires that "." not match newlines.
> >
> >   In addition, the /g option is not directly supported in QRegExp;
> > however, it could be emulated with a little bit of work.
>
> As far as I know we don't support /g currently at all :-( (which breaks
> google directory btw, so it needs urgent fixing!).
>
> Do you think it would be possible without many hacks to add a translation
> layer between javascript regexp and QRegExp? Although pcre is small,
> dropping an additional dependency is nice.
>

  Here's the problem. Given the following multi-line input:

-------------------------------------------------------------
This is line one - 1.
This is line two - 2.
And this is line three - 3.
-------------------------------------------------------------

Given regex: /(one.*)/m

  In Javascript (Perl) this should match:

one - 1.

  But in QRegExp it would match (I think):

one - 1.
This is line two - 2.
And this is line three - 3.


  Now we could hack it by adding a \n in the right place when we see a
period (producing /(one.*\n)/), but that gets us:

one - 1.
This is line two - 2.

  And even worse, if we do that then the following regex would not
work correctly:

   /(three.*)/m

  because we would stick a \n in there but there is no newline at the end
of the last line. Using ^ and $ would produce incorrect results as well.
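
  The behaviour described above can be reproduced as a rough sketch in
Python. This is only an illustration, not QRegExp itself: Python's re
module with the re.DOTALL flag is used here as a stand-in for an engine
whose "." also matches newlines, and the variable names are made up for
the example.

```python
import re

# The three-line input from above; note there is no trailing newline.
text = (
    "This is line one - 1.\n"
    "This is line two - 2.\n"
    "And this is line three - 3."
)

# Javascript/Perl behaviour: "." stops at a newline.
js_like = re.search(r"(one.*)", text).group(1)

# Stand-in for the described QRegExp behaviour: "." also matches "\n".
dot_all = re.search(r"(one.*)", text, re.DOTALL).group(1)

# The "\n" hack under dot-matches-newline semantics: the greedy ".*"
# runs to the end and backs off only as far as the LAST newline.
hack_one = re.search(r"(one.*\n)", text, re.DOTALL).group(1)

# Same hack on the last line: there is no newline left to match.
hack_three = re.search(r"(three.*\n)", text, re.DOTALL)

print(repr(js_like))
print(repr(dot_all))
print(repr(hack_one))
print(hack_three)
```

  The first search stops at the end of line one, the second swallows
the rest of the input, the hacked pattern stops after line two, and the
hacked pattern for "three" finds no match at all.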


  There is also the problem of greedy/non-greedy quantifiers. In QRegExp
you can only switch between greedy and non-greedy via the setMinimal()
call, which affects the whole regex. In Javascript (as in Perl) you can
mark individual quantifiers as non-greedy with ? (e.g., /ab+cd+?/ on the
string abbbcddddd would match abbbcd, but /ab+cd+/ matches abbbcddddd).
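
  A small sketch of the per-quantifier behaviour, again using Python's re
module as a stand-in for the Perl/Javascript semantics. The second input
string ("a-b-c") and its patterns are not from the discussion above; they
are an assumed example chosen to show that a pattern mixing greedy and
non-greedy quantifiers gives a result that neither an all-greedy nor an
all-non-greedy version reproduces.

```python
import re

s = "abbbcddddd"

# Only the final quantifier is non-greedy.
lazy_tail = re.search(r"ab+cd+?", s).group(0)
# Every quantifier is greedy.
all_greedy = re.search(r"ab+cd+", s).group(0)

print(lazy_tail)   # abbbcd
print(all_greedy)  # abbbcddddd

# Hypothetical example: one non-greedy and one greedy quantifier in the
# same pattern, compared against the two "global switch" variants.
t = "a-b-c"
mixed = re.search(r"(.+?)-(.+)", t).groups()
only_greedy = re.search(r"(.+)-(.+)", t).groups()
only_lazy = re.search(r"(.+?)-(.+?)", t).groups()

print(mixed)        # ('a', 'b-c')
print(only_greedy)  # ('a-b', 'c')
print(only_lazy)    # ('a', 'b')
```

  All three variants split "a-b-c" differently, which is the kind of case
a single global greedy/non-greedy switch cannot express.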


  So the short answer is "not really." Not if we want to follow the spec.
I don't know how many web sites use the multiline flag, but I can imagine
the greedy thing being more of a problem. The Javascript regular
expression syntax is basically identical to the Perl syntax, minus a few
options, so the "Information for Perl users" section in the QRegExp docs
basically lists the problems. As it mentions, the person writing the
regular expression could emulate the Perl functionality with QRegExp.
When handed a regex which has already been written, I don't think it's
really feasible to "convert it" on the fly.

  Of course, the REALLY scary part is that I am actually starting to
understand the ECMA regular expression spec :-)


     - Mike
