List: xml4lib
Subject: Re: [XML4Lib] batch conversion of HTML files to XML
From: Conal Tuohy <conal.tuohy () vuw ! ac ! nz>
Date: 2008-07-15 22:24:18
Message-ID: 1216160658.3670.13.camel () rb-501a-13-c
Chiming in with one more option: JTidy, a Java port of HTML Tidy, which
can write cleaned-up HTML out as well-formed XML:
http://jtidy.sourceforge.net/
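
For illustration only, the rules listed in the quoted question (self-closing
<br> and <input>, closing <p> and <li>, quoting attribute values) can be
sketched with nothing but Python's standard library. This is not JTidy and not
a substitute for it. The names html_to_xml and convert_folder are made up for
this sketch, and it covers only the simple cases named in the question: it
drops comments and the doctype, and it closes an open <p> or <li> only when a
sibling of the same kind starts or when an enclosing element ends.

```python
from html.parser import HTMLParser
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr

# HTML elements that never have content; emitted as <tag/> in the XML output.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}
# Elements whose end tag is often omitted (assumption: only these two).
AUTO_CLOSE = {"p", "li"}


class XmlSerializer(HTMLParser):
    """Re-serializes parsed HTML events as XML-style markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # output fragments
        self.stack = []  # currently open elements

    def handle_starttag(self, tag, attrs):
        # <p>one<p>two : the second <p> implicitly closes the first.
        if tag in AUTO_CLOSE and self.stack and self.stack[-1] == tag:
            self.out.append(f"</{self.stack.pop()}>")
        # Quote every value; a bare attribute like "checked" gets its own name.
        attr_text = "".join(
            f" {name}={quoteattr(name if value is None else value)}"
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Ignore end tags for void elements and stray end tags.
        if tag in VOID or tag not in self.stack:
            return
        # Close everything still open inside the element being ended.
        while self.stack:
            top = self.stack.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape for XML.
        self.out.append(escape(data))


def html_to_xml(html):
    """Return html re-serialized with closed tags and quoted attributes."""
    parser = XmlSerializer()
    parser.feed(html)
    parser.close()  # flush any buffered text
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)


def convert_folder(src_dir, dst_dir):
    """Convert every *.html file in src_dir to a .xml file in dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(src.glob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        (dst / path.with_suffix(".xml").name).write_text(
            html_to_xml(text), encoding="utf-8")
```

With this sketch, html_to_xml('<p>one<br><p>two<input type=text width=200>')
returns '<p>one<br/></p><p>two<input type="text" width="200"/></p>', and
convert_folder("html_in", "xml_out") (both folder names hypothetical) would
process a whole folder. The result is a well-formed XML document only if the
input has a single root element such as <html>. Real-world pages are usually
messier than this, which is where a dedicated tool such as Tidy or JTidy
earns its keep.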
On Tue, 2008-07-15 at 09:47 +0100, John Fitzgibbon wrote:
> Hi,
>
> Is it possible to convert a folder of HTML files to XML without having
> to edit each file with a text editor that supports regular
> expressions? In the past this is how I accomplished this task but I am
> hoping there is an easier way.
>
> The process would have to change tags like <br> to <br/>. Input tags
> in forms would also have to be closed.
>
> It may have to close tags like <p> and <li>.
>
> Finally, attribute values are not necessarily bounded by quotes. For
> example, width=200 will have to become width="200".
>
> Am I searching for a holy grail?
>
> Any advice would be much appreciated.
>
> Regards
> Jon
>
> w: www.galwaylibrary.ie
> e: info@galwaylibrary.ie
> p: 00 353 91 562471
> f: 00 353 91 565039
> _______________________________________________
> XML4Lib mailing list
> XML4Lib@webjunction.org
> http://lists.webjunction.org/mailman/listinfo/xml4lib
--
Conal Tuohy
New Zealand Electronic Text Centre
www.nzetc.org