List:       ruby-talk
Subject:    Re: HTML parsing
From:       Emmanuel Touzery <emmanuel.touzery () wanadoo ! fr>
Date:       2004-02-02 12:48:00
Message-ID: 401E478C.1060206 () wanadoo ! fr

Gavin Sinclair wrote:

>Hi folks,
>
>I need to parse some HTML.  I've dug around the archives and so on and
>found the best solution to be Ned Konz's 'ruby-htmltools', which
>relies on 'html-parser'.  Both of these projects are not really
>maintained, so I'm wondering what other people currently use.
I was using a home-made solution, but this weekend I decided to convert
it to REXML. The plan is to run each page through HTML Tidy (which about
60% of the pages I'm parsing already need anyway) and ask Tidy to output
XHTML.

With my home-made solution, besides duplicating the work of parsing
HTML, I needed to maintain a list of tags that don't have to be closed,
and so on. With XHTML all of that is done for me, and I get the familiar
API of REXML (even though I haven't actually used REXML yet :O) ).

I think that's the best approach.
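
For illustration, here is a minimal sketch of that Tidy-then-REXML
pipeline. It assumes the `tidy` command-line tool is installed and on
the PATH, and that the Ruby in use ships `open3` and `rexml`. The helper
names `html_to_xhtml` and `extract_links` are hypothetical, made up for
this example. The `-numeric` flag is there so that Tidy emits numeric
entities instead of HTML named ones such as `&nbsp;`, which a plain XML
parser does not know about.

```ruby
require "open3"
require "rexml/document"

# Hypothetical helper: pipe raw HTML through the tidy CLI and return XHTML.
# Assumes a `tidy` executable is on the PATH.
def html_to_xhtml(html)
  xhtml, _warnings, status =
    Open3.capture3("tidy", "-quiet", "-asxhtml", "-numeric", stdin_data: html)
  # tidy exits with 0 (clean), 1 (warnings) or 2 (errors it could not fix).
  code = status.exitstatus
  raise "tidy could not repair the document" if code.nil? || code >= 2
  xhtml
end

# Hypothetical helper: collect [href, text] pairs for every <a href="...">.
# Walking the tree and comparing local names avoids having to deal with
# the XHTML default namespace in an XPath expression.
def extract_links(xhtml)
  doc = REXML::Document.new(xhtml)
  links = []
  doc.root.each_recursive do |el|
    next unless el.name == "a"
    href = el.attributes["href"]
    links << [href, el.text] if href
  end
  links
end
```

Usage would look something like
`extract_links(html_to_xhtml(File.read("page.html")))`, where
`page.html` is just a placeholder file name.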

emmanuel

