
List:       xml4lib
Subject:    Re: [XML4Lib] batch conversion of HTML files to XML
From:       Conal Tuohy <conal.tuohy () vuw ! ac ! nz>
Date:       2008-07-15 22:24:18
Message-ID: 1216160658.3670.13.camel () rb-501a-13-c

Chiming in with one more option: JTidy (a Java version of Tidy)

http://jtidy.sourceforge.net/
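
To make the requirements in the quoted question concrete, here is a rough sketch of the kind of repairs involved. Note that this is not JTidy and does not use its API: it is a stand-in written against Python's standard-library html.parser, and the names html_to_xml, XmlRewriter and convert_folder are made up for illustration. It assumes UTF-8 input, drops comments and doctypes, and only closes a <p> or <li> when the next sibling of the same kind opens directly, so a real tool such as Tidy or JTidy will cope with far messier markup.

```python
from html import escape
from html.parser import HTMLParser
from pathlib import Path

# Elements that never have content in HTML; written out as self-closing tags.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}
# Elements whose end tag is often omitted; a new one closes the one still open.
AUTO_CLOSE = {"p", "li"}


class XmlRewriter(HTMLParser):
    """Re-emits parsed HTML as well-formed markup (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # output fragments
        self.stack = []  # currently open non-void elements

    def handle_starttag(self, tag, attrs):
        # Close an unclosed <p>/<li> when a sibling of the same kind starts.
        if tag in AUTO_CLOSE and self.stack and self.stack[-1] == tag:
            self.out.append(f"</{self.stack.pop()}>")
        # Always quote attribute values; bare attributes get their own name.
        attr_text = "".join(
            f' {name}="{escape(value if value is not None else name, quote=True)}"'
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Ignore end tags for void elements and stray end tags.
        if tag in VOID or tag not in self.stack:
            return
        # Close everything opened since the matching start tag.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Character references were already decoded; re-escape &, < and >.
        self.out.append(escape(data, quote=False))


def html_to_xml(html_text):
    """Return html_text rewritten with closed tags and quoted attributes."""
    parser = XmlRewriter()
    parser.feed(html_text)
    parser.close()
    # Close whatever is still open at end of input.
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)


def convert_folder(src_dir, dst_dir):
    """Convert every *.html file in src_dir to a .xml file in dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.html")):
        xml_text = html_to_xml(path.read_text(encoding="utf-8"))  # encoding is an assumption
        (dst / path.with_suffix(".xml").name).write_text(xml_text, encoding="utf-8")
```

For instance, html_to_xml('<p>one<br><p width=200>two') gives '<p>one<br/></p><p width="200">two</p>', which covers the <br>, <p> and width=200 cases from the question.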

On Tue, 2008-07-15 at 09:47 +0100, John Fitzgibbon wrote:
> Hi,
> 
> Is it possible to convert a folder of HTML files to XML without having
> to edit each file with a text editor that supports regular
> expressions? In the past this is how I accomplished this task but I am
> hoping there is an easier way.
> 
> The process would have to change tags like <br> to <br/>. Input tags
> in forms would also have to be closed.
> 
> It may have to close tags like <p> and <li>.
> 
> Finally, attribute values are not necessarily bounded by quotes. For
> example, width=200 will have to become width="200".
> 
> Am I searching for a holy grail?
> 
> Any advice would be much appreciated.
> 
> Regards
> 
> Jon
> 
> w: www.galwaylibrary.ie
> e: info@galwaylibrary.ie
> p: 00 353 91 562471
> f: 00 353 91 565039
> 
> _______________________________________________
> XML4Lib mailing list
> XML4Lib@webjunction.org
> http://lists.webjunction.org/mailman/listinfo/xml4lib
-- 
Conal Tuohy
New Zealand Electronic Text Centre
www.nzetc.org




