
List:       python-xml-sig
Subject:    [XML-SIG] A "tolerant" parser for structure-challenged HTML
From:       Alexandre.Fayolle () logilab ! fr (Alexandre Fayolle)
Date:       2001-07-20 16:09:13
Message-ID: Pine.LNX.4.21.0107201806440.3451-100000 () pisces ! logilab ! fr

On Fri, 20 Jul 2001, Rich Salz wrote:

> Detlef Lannert wrote:
> > 
> > A couple of weeks ago I was faced with the problem of processing a few
> > web pages which were generated by Microsoft Word (and post-processed
> 
> You might want to look at the "microsoft demoroniser" :)
> 	http://www.fourmilab.ch/webtools/demoroniser/

You can also use Tidy, which has a special mode for MS Word files:
http://www.w3.org/People/Raggett/tidy/
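
For anyone who wants to drive it from Python, here is a rough sketch rather than a tested recipe. It assumes a `tidy` executable is on your PATH and that your build accepts the `word-2000` configuration option on the command line; the helper names `build_tidy_command` and `clean_word_html` are made up for illustration.

```python
import shutil
import subprocess


def build_tidy_command(path, tidy="tidy"):
    """Return an argv list asking Tidy to clean up Word-generated HTML.

    Hypothetical helper: it assumes the installed Tidy understands the
    word-2000 and show-warnings options in this command-line form.
    """
    return [tidy, "-quiet", "--word-2000", "yes", "--show-warnings", "no", path]


def clean_word_html(path):
    """Run Tidy on the file at `path` and return the cleaned markup."""
    exe = shutil.which("tidy")
    if exe is None:
        raise RuntimeError("tidy executable not found on PATH")
    result = subprocess.run(
        build_tidy_command(path, exe),
        capture_output=True,
        text=True,
    )
    # Tidy exits with 0 on success, 1 if there were warnings, 2 on errors.
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout
```

Calling `clean_word_html("page.html")` should hand back markup that a stricter parser has a better chance of accepting, though how much Word-specific clutter gets stripped depends on the Tidy version.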

Alexandre Fayolle
-- 
LOGILAB, Paris (France).
http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org
Narval, the first software agent available as free software (GPL).


