
List:       rpm-devel
Subject:    Re: Anyone know of a tasteful LGPL HTML parser in C?
From:       Jeff Johnson <n3npq () mac ! com>
Date:       2008-07-11 15:41:47
Message-ID: 716F7D00-AEFC-4AFC-913C-AD05BBC697B9 () mac ! com


On Feb 9, 2008, at 1:26 PM, Ralf S. Engelschall wrote:

>
> Oh, sorry, I forgot to give you an example of the regex I'm thinking
> about (using PCRE functionality to make it easier, but it can be
> changed to work with plain POSIX functionality, too):
>
> (?i)<a(?:\s+[a-z][a-z0-9_]*(?:=(?:"[^"]*"|\S+))?)*?\s+href=(?:"([^"]*)"|(\S+))
>
> I've not tested it, so perhaps it is still buggy. But it should
> already give you an impression of what I'm thinking about. It should
> not become much more complex than this...
>

I've finally gotten up to speed on parsing with REs. That was the real
reason I needed to do pcregrep -> rpmgrep: I had never done anything
meaningful with RE programming in C before now.

I have a "working" wild hack for directory recursion that I'm
integrating into rpmio.

The above PCRE pattern is sufficient to get an implementation wired up,
but a plain old POSIX RE is likely needed for building without -lpcre.

If you (or anyone else) could write the POSIX version of the above,
I'd be grateful.
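
For reference, one possible POSIX ERE rendering of the quoted pattern
could look like the sketch below. It is only a sketch under stated
assumptions, not the pattern that went into rpmio: the helper name
extract_href() and the macro HREF_ERE are made up for illustration.
POSIX ERE has no (?i), no (?:...), no \s or \S, and no lazy *?, so the
sketch uses REG_ICASE, plain capturing groups, [[:space:]], and a
greedy *. Because POSIX matching prefers the longest overall match, the
unquoted-value class also excludes '"' and '>' (a deviation from the
PCRE original) so that a quoted value is not swallowed by the unquoted
alternative.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical POSIX ERE counterpart of the quoted PCRE pattern.
 * Subexpressions: 1 = one attribute, 2 = "=value" part, 3 = attribute
 * value, 4 = href value alternative, 5 = quoted href, 6 = unquoted href.
 */
#define HREF_ERE \
    "<a([[:space:]]+[a-z][a-z0-9_]*(=(\"[^\"]*\"|[^[:space:]\">]+))?)*" \
    "[[:space:]]+href=(\"([^\"]*)\"|([^[:space:]\">]+))"

/*
 * Illustrative helper (not part of rpmio): copy the href value of the
 * leftmost matching <a ...> tag in html into out, truncating if needed.
 * Returns 0 on a match, -1 if nothing matched or regcomp() failed.
 */
static int extract_href(const char *html, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[7];            /* whole match plus 6 subexpressions */
    int rc = -1;

    assert(html != NULL && out != NULL && outlen > 0);

    /* REG_ICASE stands in for the (?i) prefix of the PCRE pattern. */
    if (regcomp(&re, HREF_ERE, REG_EXTENDED | REG_ICASE) != 0)
        return -1;

    if (regexec(&re, html, 7, m, 0) == 0) {
        /* Exactly one of the two href alternatives took part. */
        const regmatch_t *v = (m[5].rm_so != -1) ? &m[5] : &m[6];
        size_t len = (size_t)(v->rm_eo - v->rm_so);

        if (len >= outlen)
            len = outlen - 1;   /* leave room for the terminator */
        memcpy(out, html + v->rm_so, len);
        out[len] = '\0';
        rc = 0;
    }

    regfree(&re);
    return rc;
}
```

Calling extract_href() with a buffer and a fragment such as
<a class="x" href="http://rpm5.org/"> should leave the URL in the
buffer. One behavioral difference to keep in mind: without the lazy *?,
a tag that carries two href attributes will likely yield the last one
rather than the first.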

Perl RE's are another gap in my education ...

73 de Jeff
______________________________________________________________________
RPM Package Manager                                    http://rpm5.org
Developer Communication List                        rpm-devel@rpm5.org