
List:       mozilla-documentation
Subject:    Re: Regarding my FAQ
From:       fta () oleane ! net (Fabien Tassin)
Date:       1998-05-04 15:07:47

On 03 May 1998 16:15:02 +0200, Thomas Martin Widmann <viralbus@saratoga.daimi.aau.dk> wrote:
> 
> while (<>) {
> s/<a href="(.*?)">(.*?)<\/a>/$2 ($1)/gi;
> print;
> }
> 
> This will convert
> 	You can find Perl <A HREF="wherever">here</A>,
> to
> 	You can find Perl here (wherever),
> 
> After preprocessing, you could then use the usual HTML->text
> converter (or expand it to do all conversion on its own :).

This will not work if a tag spans several lines, or if it carries extra (useless) attributes besides HREF.

What about this?

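$ perl -pe 'BEGIN { $i = 1; undef $/ }   # slurp the whole file before the first read
   # [^>]*? keeps the match inside one <a ...> tag; /s lets . match newlines
   while (s|<a\s[^>]*?href="(.*?)"[^>]*>(.*?)</a>|$2 \[$i\]|si) {
     $ref[$i++] = $1
   }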
   END {
     print "--\nReferences:\n\n";
     for ($a = 1; $a <= $#ref; $a++) {
       print "[$a]: $ref[$a]\n";
     }
   }'  your_doc.html | your_own_html2text_translator

or in a more condensed/ugly form:

$ perl -pe 'BEGIN{$i=1;undef $/} while (s|<a\s[^>]*?href="(.*?)"[^>]*>(.*?)</a>|$2 \[$i\]|si) {$ref[$i++]=$1}
  END {print "--\nReferences:\n\n"; for ($a=1;$a<=$#ref;$a++){print "[$a]: $ref[$a]\n"} }' your_doc.html

It can easily be extended to avoid duplicates and to list only full URLs.

-- 
Fabien Tassin -+- fta@oleane.net

