From kfm-devel Thu Oct 18 04:30:16 2001
From: Michael Bedy
Date: Thu, 18 Oct 2001 04:30:16 +0000
To: kfm-devel
Subject: Re: Regular Expressions - PCRE wins, I think.
X-MARC-Message: https://marc.info/?l=kfm-devel&m=100337954425954

On Thu, 18 Oct 2001, Dirk Mueller wrote:

> On Wed, 17 Oct 2001, Michael Bedy wrote:
>
> > It looks to me like QRegExp gets very close, but misses a few crucial
> > features. The biggest of these is lack of /m support and all the various
> > things that go along with it. Also, the "." atom always matches newlines
> > in QRegExp. This is equivalent to the /s option in Perl, but Javascript
> > unfortunately requires that "." not match newlines.
> >
> > In addition, the /g option is not directly supported in QRegExp, however
> > it could be emulated with a little bit of work.
>
> As far as I know we don't support /g currently at all :-( (which breaks
> google directory btw, so it needs urgent fixing!).
>
> do you think it would be possible without many hacks to add a translation
> layer between javascript regexp and QRegExp? Although pcre is small,
> dropping an additional dependency is nice.
>

Here's the problem. Given the following multi-line input:

-------------------------------------------------------------
This is line one - 1.
This is line two - 2.
And this is line three - 3.
-------------------------------------------------------------

Given regex:

/(one.*)/m

In Javascript (Perl) this should match:

one - 1.

But in QRegExp it would match (I think):

one - 1.
This is line two - 2.
And this is line three - 3.

Now we could hack it by adding a \n in the right place when we see a period
(producing /(one.*\n)/), but that gets us:

one - 1.
This is line two - 2.

And even worse, if we do that then the following regex would not work
correctly:

/(three.*)/m

because we would stick a \n in there but there is no newline at the end of
the last line. Using ^ and $ would produce incorrect results as well.
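For what it's worth, the behaviour described above can be checked in plain JavaScript. This is only a rough sketch: it does not call QRegExp at all, it just assumes that a "." which matches newlines behaves like the character class [\s\S]. The variable names are made up for illustration.

```javascript
// The three-line input from the example; note there is no newline after the last line.
const input = [
  "This is line one - 1.",
  "This is line two - 2.",
  "And this is line three - 3.",
].join("\n");

// ECMAScript semantics: "." does not match a line terminator,
// so the capture stops at the end of the first line.
const jsDot = input.match(/(one.*)/m)[1];

// Assumed stand-in for a "." that also matches newlines: [\s\S].
// The greedy quantifier now runs to the end of the input.
const anyDot = input.match(/(one[\s\S]*)/)[1];

// The "stick a \n after the period" hack: the greedy match backtracks
// to the LAST newline in the input, not the first one.
const hacked = input.match(/(one[\s\S]*\n)/)[1];

// Same hack on the last line: no trailing newline, so no match at all.
const lastLine = input.match(/(three[\s\S]*\n)/);

console.log(JSON.stringify(jsDot));
console.log(JSON.stringify(anyDot));
console.log(JSON.stringify(hacked));
console.log(lastLine);
```

The third and fourth results are the two failures mentioned above: the hacked pattern swallows line two, and the last-line pattern finds nothing.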
There is also the problem of greedy/non-greedy quantifiers. In QRegExp you
can only do greedy/non-greedy via the setMinimal() call, which affects the
regex globally. In Javascript (as in Perl) you can individually mark a
quantifier as non-greedy with ?. (E.g., /ab+cd+?/ on the string abbbcddddd
would match abbbcd, but /ab+cd+/ matches abbbcddddd.)

So the short answer is "not really." Not if we want to follow the spec. I
don't know how many web sites use the multiline flag, but the greedy thing I
can imagine being more of a problem.

The Javascript regular expression syntax is basically identical to the Perl
syntax, minus a few options. So the "Information for Perl users" section in
the QRegExp docs basically lists the problems. As it mentions, the person
writing the regular expression could emulate the Perl functionality with
QRegExp. When handed a regex which has already been written, I don't think
it's really feasible to "convert it" on the fly.

Of course, the REALLY scary part is that I am actually starting to understand
the ECMA regular expression spec :-)

  - Mike