List: openbsd-tech
Subject: Re: patterns.c question or possible bug
From: edgar () pettijohn-web ! com
Date: 2018-01-30 12:50:26
Message-ID: d2a6ee059e40762c () openbsd ! org
On Jan 30, 2018 12:05 AM, Ori Bernstein <ori@eigenstate.org> wrote:
>
> On Mon, 29 Jan 2018 23:23:18 -0600, Edgar Pettijohn <edgar@pettijohn-web.com> wrote:
> > I'm trying to use patterns.c for some pattern matching. The manual
> > mentions captures using "()" around what you want to capture. I don't
> > see how to get at the data though. Here is a sample program.
> >
> > #include <stdio.h>
> > #include "patterns.h"
> >
> > int
> > main(int argc, char *argv[])
> > {
> >         const char *errstr = NULL;
> >         const char *string = "the quick the brown the fox";
> >         const char *pattern = "the";
> >         int ret;
> >         struct str_match match;
> >
> >         ret = str_match(string, pattern, &match, &errstr);
> >
> >         if (errstr != NULL)
> >                 printf("%s\n", errstr);
> >         else
> >                 printf("number of matches %d\n", match.sm_nmatch);
> >
> >         return 0;
> > }
> >
> > It prints 2, but I was expecting 3. I've tried multiple other patterns
> > and the answer always seems to be 2, which leads me to believe I'm doing
> > something wrong. Any assistance appreciated.
> >
> >
> > Thanks,
> >
> >
> > Edgar
>
> The code is looking for a match of the pattern in the string, not all matches
> of the pattern in the string. It also makes the (IMO, surprising) decision
> that not having any capture groups in the pattern implies capturing the whole
> pattern. The whole string goes into the first match.
>
> So, in your case, you're matching:
>
> "the quick the brown the fox";
>  ^^^
>
> Accordingly:
>
> match.sm_match[0] = "the quick the brown the fox"
> match.sm_match[1] = "the"
>
> If you had 'quick', you'd get similar behavior:
>
> "the quick the brown the fox";
>      ^^^^^
>
> Equivalently, putting the whole pattern in '()' will match the same thing:
>
> pattern = "(quick)"
>
> But multiple parens will match their substrings:
>
> pattern = "(qu)ick (the)"
>
> "the quick the brown the fox";
>      ^^    ^^^
> match.sm_match[0] = "the quick the brown the fox"
> match.sm_match[1] = "qu"
> match.sm_match[2] = "the"
>
> The choice to capture implicitly, I think, is confusing, but the behavior
> seems to me to be correct.
>
> --
> Ori Bernstein
Thanks. Makes sense now. I probably would have figured it out myself if
I had printed match.sm_match[0], etc. Live and learn.
Edgar
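
For reference, here is a minimal sketch of what the original sample seemed to expect: counting every non-overlapping occurrence of a literal substring. It does not use patterns.c at all. It swaps in plain strstr(3), so it handles only literal text, not patterns, and count_occurrences is a hypothetical helper name, not anything from patterns.h.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of patterns.c: count non-overlapping
 * occurrences of the literal string needle in haystack using strstr(3).
 */
size_t
count_occurrences(const char *haystack, const char *needle)
{
        size_t n = 0;
        size_t len = strlen(needle);

        if (len == 0)
                return 0;       /* an empty needle would match everywhere */

        while ((haystack = strstr(haystack, needle)) != NULL) {
                n++;
                haystack += len;        /* resume after this occurrence */
        }
        return n;
}
```

With the string from the sample program, count_occurrences("the quick the brown the fox", "the") returns 3.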