List: python-xml-sig
Subject: [XML-SIG] A "tolerant" parser for structure-challenged HTML
From: Alexandre.Fayolle () logilab ! fr (Alexandre Fayolle)
Date: 2001-07-20 16:09:13
Message-ID: Pine.LNX.4.21.0107201806440.3451-100000 () pisces ! logilab ! fr
On Fri, 20 Jul 2001, Rich Salz wrote:
> Detlef Lannert wrote:
> >
> > A couple of weeks ago I was faced with the problem of processing a few
> > web pages which were generated by Microsoft Word (and post-processed
>
> You might want to look at the "microsoft demoroniser" :)
> http://www.fourmilab.ch/webtools/demoroniser/
You can also use Tidy, which has a special mode for MS Word files.
http://www.w3.org/People/Raggett/tidy/
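
For anyone who wants to drive it from Python, here is a minimal sketch. It assumes a `tidy` executable is on your PATH and that your build accepts the `word-2000` configuration option (current HTML Tidy releases do; older builds may differ). The helper names `build_tidy_command` and `tidy_word_html` are made up for illustration, and the sketch uses present-day Python library calls.

```python
import shutil
import subprocess


def build_tidy_command(path, tidy_exe="tidy"):
    """Return the argument list for cleaning a Word-generated HTML file."""
    return [
        tidy_exe,
        "-quiet",              # suppress the banner and summary text
        "-asxhtml",            # ask for well-formed XHTML output
        "--word-2000", "yes",  # assumption: option that strips Word-specific markup
        path,
    ]


def tidy_word_html(path, tidy_exe="tidy"):
    """Run Tidy on `path` and return the cleaned markup as bytes."""
    if shutil.which(tidy_exe) is None:
        raise FileNotFoundError(f"{tidy_exe!r} not found on PATH")
    result = subprocess.run(
        build_tidy_command(path, tidy_exe),
        capture_output=True,
    )
    # Tidy exits with 0 (clean), 1 (warnings only) or 2 (errors).
    if result.returncode >= 2:
        raise RuntimeError(result.stderr.decode(errors="replace"))
    return result.stdout
```

Calling `tidy_word_html("report.html")` (the file name is only an example) should hand back XHTML that an XML parser has a much better chance of accepting. The output is kept as bytes because Word pages often declare a Windows encoding, and it is safer to let the parser sort that out.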
Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).