List: rpm-devel
Subject: Re: Anyone know of a tasteful LGPL HTML parser in C?
From: Jeff Johnson <n3npq () mac ! com>
Date: 2008-07-11 15:41:47
Message-ID: 716F7D00-AEFC-4AFC-913C-AD05BBC697B9 () mac ! com
On Feb 9, 2008, at 1:26 PM, Ralf S. Engelschall wrote:
>
> Oh, sorry, I forgot to give you an example of the regex I'm thinking
> about (using PCRE functionality to make it easier, but can be changed to
> work with plain POSIX functionalities, too):
>
> (?i)<a(?:\s+[a-z][a-z0-9_]*(?:=(?:"[^"]*"|\S+))?)*?\s+href=(?:"([^"]*)"|(\S+))
>
> I've not tested it, so perhaps it is still buggy. But it should already
> give you an impression what I'm thinking about. A lot more complex it
> should not become...
>
I've finally gotten up to speed on parsing with RE's, the real reason
I needed to do pcregrep -> rpmgrep, because I've never done anything
meaningful with RE programming in C before now.
I have a "working" wild hack for directory recursion that I'm
integrating into rpmio.
The above PCRE pattern is sufficient to get an implementation wired up,
but a plain old POSIX RE is likely needed for building without -lpcre.
If you (or anyone else) could write the POSIX version of the above,
I'd be grateful.
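For reference, here is a rough sketch of what a POSIX ERE rendering of the
pattern might look like, wired up with regcomp()/regexec(). It is not taken
from rpm or rpmio; the helper name extract_href() and its return codes are
made up for illustration. POSIX has no (?:...), \s, \S or lazy quantifiers,
so every group is a capturing group and the href value ends up in group 5
(quoted) or group 6 (unquoted). One deliberate deviation from the PCRE
version: unquoted values exclude '"' and '>', because POSIX picks the
longest match and a bare \S+ equivalent could otherwise swallow the quotes.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* POSIX ERE sketch of the quoted PCRE pattern (an assumption, not rpm code).
 * Group 5 = quoted href value, group 6 = unquoted href value. */
static const char href_ere[] =
    "<a([[:space:]]+[a-z][a-z0-9_]*(=(\"[^\"]*\"|[^[:space:]\">]+))?)*"
    "[[:space:]]+href=(\"([^\"]*)\"|([^[:space:]\">]+))";

/* Hypothetical helper: copy the first href value found in html into out,
 * truncating to fit. Returns 0 on success, 1 if nothing matched, and
 * -1 if the pattern failed to compile. */
static int extract_href(const char *html, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[7];    /* whole match plus six subexpressions */
    int rc, g;
    size_t len;

    assert(html != NULL && out != NULL && outlen > 0);

    /* REG_ICASE stands in for the (?i) prefix of the PCRE pattern. */
    if (regcomp(&re, href_ere, REG_EXTENDED | REG_ICASE) != 0)
        return -1;
    rc = regexec(&re, html, 7, m, 0);
    regfree(&re);
    if (rc != 0)
        return 1;

    /* Exactly one of the two value groups takes part in a match. */
    g = (m[5].rm_so != -1) ? 5 : 6;
    len = (size_t)(m[g].rm_eo - m[g].rm_so);
    if (len >= outlen)
        len = outlen - 1;
    memcpy(out, html + m[g].rm_so, len);
    out[len] = '\0';
    return 0;
}
```

Calling extract_href() on a line of HTML fills the buffer with the first
link target it finds; looping over all links would mean restarting
regexec() at html + m[0].rm_eo, which the sketch leaves out.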
Perl RE's are another gap in my education ...
73 de Jeff
______________________________________________________________________
RPM Package Manager http://rpm5.org
Developer Communication List rpm-devel@rpm5.org