List:       openbsd-tech
Subject:    Re: patterns.c question or possible bug
From:       edgar () pettijohn-web ! com
Date:       2018-01-30 12:50:26
Message-ID: d2a6ee059e40762c () openbsd ! org


On Jan 30, 2018 12:05 AM, Ori Bernstein <ori@eigenstate.org> wrote:
> 
> On Mon, 29 Jan 2018 23:23:18 -0600, Edgar Pettijohn <edgar@pettijohn-web.com> wrote:
> > I'm trying to use patterns.c for some pattern matching. The manual 
> > mentions captures using "()" around what you want to capture.  I don't 
> > see how to get at the data though.  Here is a sample program.
> > 
> > #include <stdio.h>
> > #include "patterns.h"
> > 
> > int
> > main(int argc, char *argv[])
> > {
> >     const char        *errstr = NULL;
> >     const char        *string = "the quick the brown the fox";
> >     const char        *pattern = "the";
> >     int                ret;
> >     struct str_match   match;
> > 
> >     ret = str_match(string, pattern, &match, &errstr);
> > 
> >     if (errstr != NULL)
> >         printf("%s\n", errstr);
> >     else
> >         printf("number of matches %d\n", match.sm_nmatch);
> > 
> >     return 0;
> > }
> > 
> > It prints 2, but I was expecting 3. I've tried multiple other patterns 
> > and the answer always seems to be 2, which leads me to believe I'm doing 
> > something wrong.  Any assistance appreciated.
> > 
> > 
> > Thanks,
> > 
> > 
> > Edgar
> 
> The code is looking for a match of the pattern in the string, not all matches
> of the pattern in the string. It also makes the (IMO, surprising) decision
> that a pattern without any capture groups implicitly captures the whole
> match. The whole string goes into the first entry, sm_match[0].
> 
> So, in your case, you're matching:
> 
> "the quick the brown the fox";
>  ^^^
> 
> Accordingly:
> 
> matches.sm_match[0] = "the quick the brown the fox"
> matches.sm_match[1] = "the"
> 
> If you had 'quick', you'd get similar behavior:
> 
> "the quick the brown the fox";
>      ^^^^^
> 
> Equivalently, putting the whole pattern in '()' will match the same thing:
> 
> pattern = "(quick)"
> 
> But multiple parens will match their substrings:
> 
> pattern = "(qu)ick (the)"
> 
> "the quick the brown the fox";
>      ^^    ^^^
> matches.sm_match[0] = "the quick the brown the fox"
> matches.sm_match[1] = "qu"
> matches.sm_match[2] = "the"
> 
> The choice to capture implicitly, I think, is confusing, but the behavior
> seems to me to be correct.
> 
> -- 
> Ori Bernstein

Thanks. Makes sense now. I probably would have figured it out for myself if I'd
printed out matches.sm_match[0], etc. Live and learn.

Edgar

