List: xml-apache-general
Subject: RE: Looking for tools/ideas for filtering HTML
From: <max () corrosive ! co ! uk>
Date: 2001-11-19 9:55:02
And this one too:
http://www.scrml.org
>You can take a look at some projects like:
>* JavaCC HTML Parser (http://www.quiotix.com/downloads/html-parser/)
>* HEX - The HTML Enabled XML Parser
>(http://www-uk.hpl.hp.com/people/sth/java/hex.html)
>
>Rgds,
>Neeme
>
>-----Original Message-----
>From: Jaquiss, Robert [mailto:RJaquiss@nfb.org]
>Sent: Friday, November 16, 2001 10:44 PM
>To: general@xml.apache.org
>Subject: Looking for tools/ideas for filtering HTML
>
>Hello:
>
> I have just joined this list, and I am also a beginning Java programmer.
>I apologize if this is not a suitable question for this list. I need to
>write a filter for HTML pages. My goal is to read an HTML page, throw
>away all the HTML markup, and keep just a block of text that occurs near the
>bottom of the page. The HTML tags are likely to be unbalanced: there will be
>a <P> but no </P>. I found a sample program that used the SAX parser, but
>the SAX parser doesn't seem to handle unbalanced tags. Ideas/comments would be
>appreciated. Thank you.
>
> Regards
> Robert Jaquiss
>
>
>---------------------------------------------------------------------
>In case of troubles, e-mail: webmaster@xml.apache.org
>To unsubscribe, e-mail: general-unsubscribe@xml.apache.org
>For additional commands, e-mail: general-help@xml.apache.org
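Another option, if you'd rather not pull in an extra library: the JDK itself ships a lenient HTML parser in javax.swing.text.html.parser (in the java.desktop module on newer JDKs). Unlike an XML SAX parser, it does not insist on well-formed markup, so a <P> without a </P> is fine. Below is a minimal sketch, not a tested tool. The class name HtmlTextFilter and the lastTextRun helper are hypothetical, and "the wanted block is the last run of text on the page" is an assumption you'd need to adjust for your real pages.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class: strips markup from (possibly unbalanced) HTML
// using the JDK's built-in Swing HTML parser and keeps only the text.
public class HtmlTextFilter {

    // Returns the text content of the page, one text run per line.
    public static String extractText(Reader html) {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for each run of character data between tags;
                // tags themselves are simply ignored (no handleStartTag override).
                text.append(data).append('\n');
            }
        };
        try {
            // 'true' = ignore any charset declared in a META tag.
            new ParserDelegator().parse(html, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Hypothetical heuristic: assume the wanted block is the last text run.
    public static String lastTextRun(String text) {
        String[] runs = text.trim().split("\n");
        return runs[runs.length - 1].trim();
    }

    public static void main(String[] args) {
        String page = "<html><body><P>Header text<P>Navigation links"
                + "<P>The block of text we want to keep</body></html>";
        String text = extractText(new StringReader(page));
        System.out.println(lastTextRun(text));
    }
}
```

To read a real page, pass a FileReader or an InputStreamReader over a URL stream instead of the StringReader. If the text you want isn't reliably the last run, you could instead override handleStartTag to watch for a marker element and start collecting only after it.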
--
------------------------------
Max Guglielmino
Corrosive
http://www.corrosive.co.uk