
List:       ruby-talk
Subject:    HTML Parsing?
From:       Martin Hart <martin () zsdfherg ! com>
Date:       2004-02-05 17:24:24
Message-ID: 200402051715.24363.martin () zsdfherg ! com


Hi all,

I need to access an HTTP server and interpret some data from the page I get 
back (basically for some minimal tests of a website).  I know that I can use 
the Net::HTTP class to connect and retrieve the page, but then I am left with 
a string full of stuff.

What do people use to parse this into something useful?  Is REXML an option 
(although the HTML is not likely to be valid XML)?  I have looked at the 
html-parser on RAA but do not seem to be able to individually access the 
components of the returned page (for example, I need to see what the contents 
of a text control are, or what the caption of the <h2> tag is).

I suppose using regexps is an option as well, but I am just wondering if I am 
missing some cool library that already does all this stuff?
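
For reference, the regexp route could look roughly like the sketch below. It 
is only an illustration of that fallback, not a real parser: the sample page, 
the field name "username", and the helper names h2_caption and input_value 
are all made up here, and it assumes quoted attribute values and no nested 
markup inside the <h2>.

```ruby
# Hypothetical sample page standing in for the string returned by Net::HTTP.
html = <<~HTML
  <html><body>
    <h2 class="title">Welcome back</h2>
    <form>
      <input type="text" name="username" value="martin">
    </form>
  </body></html>
HTML

# Returns the text between the first <h2> ... </h2> pair, or nil if none.
# (Hypothetical helper; assumes no nested tags inside the heading.)
def h2_caption(html)
  m = html.match(%r{<h2\b[^>]*>(.*?)</h2>}im)
  m && m[1].strip
end

# Returns the value attribute of the <input> whose name attribute matches,
# or nil if there is no such input or it has no value attribute.
# (Hypothetical helper; assumes attribute values are quoted.)
def input_value(html, name)
  html.scan(/<input\b[^>]*>/i).each do |tag|
    next unless tag =~ /\bname\s*=\s*["']#{Regexp.escape(name)}["']/i
    return tag[/\bvalue\s*=\s*["']([^"']*)["']/i, 1]
  end
  nil
end

puts h2_caption(html)
puts input_value(html, "username")
```

This gets brittle quickly on real-world pages (unquoted attributes, comments, 
odd nesting), which is why a proper library would be preferable.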

Thanks for any advice

Martin

-- 
Martin Hart
Arnclan Limited
53 Union Street
Dunstable, Beds
LU6 1EX
http://www.arnclanit.com


