List:       ruby-talk
Subject:    Re: HTML Parsing?
From:       Gavin Sinclair <gsinclair () soyabean ! com ! au>
Date:       2004-02-05 21:02:35
Message-ID: 1842103020413.20040206080138 () soyabean ! com ! au

On Friday, February 6, 2004, 5:39:15 AM, Dave wrote:


> Martin Hart wrote:
>> What do people use to parse this into something useful?  Is REXML an option
>> (although the HTML is not likely to be valid XML)?  I have looked at the
>> html-parser on RAA but do not seem to be able to individually access the
>> components of the returned page (for example, I need to see what the contents
>> of a text control are, or what the caption of the <h2> tag is).

> see http://ruby-htmltools.rubyforge.org/

> I used this library about a year ago, and found it pretty buggy.

For the OP: you can use the above library to convert HTML into a
REXML::Document, then pull it apart as you please.
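
A rough sketch of the "pull it apart" step, assuming the conversion has
already handed you well-formed markup. The page content below is made up
for illustration, and the htmltools conversion call itself is not shown;
only standard REXML calls are used.

```ruby
require 'rexml/document'

# Assumption: this well-formed markup stands in for whatever the
# HTML-to-REXML conversion produces; the page content is hypothetical.
xhtml = <<~XML
  <html>
    <body>
      <h2>Search results</h2>
      <form>
        <input type="text" name="query" value="ruby html parsing"/>
      </form>
    </body>
  </html>
XML

doc = REXML::Document.new(xhtml)

# Caption of the first <h2> element anywhere in the document.
h2 = REXML::XPath.first(doc, '//h2')
heading = h2.text

# Contents of the first text control, read from its "value" attribute.
input = REXML::XPath.first(doc, "//input[@type='text']")
text_value = input.attributes['value']

puts heading
puts text_value
```

If the page has several text controls, REXML::XPath.each with the same
expression walks all of them instead of stopping at the first.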

Gavin

