List: perl-xml
Subject: RE: XML::Simple - ignore HTML?
From: Grant McLean <grantm () web ! co ! nz>
Date: 2001-03-19 5:13:06
From: Morbus Iff [mailto:morbus@disobey.com]
>
> I'm parsing a large number of XML documents, and very
> infrequently, I run across some unencoded HTML within
> an XML tag, like
> "<description><b>this is
> bold</b></description>".
Can I assume that:
1. <description> is one of your XML tags,
2. a <description> tag normally has plain text content, but
3. some of your <description> tags contain HTML?
If this is the case, I'm not sure how you expect XML::Simple
to know which tags it should parse as tags and which tags
it should parse as text. Even if you switched to XML::Parser
you would have the same issue.
One approach you could take is to write a script to pre-process
your XML files and escape the contents of all <description> tags.
XML::Twig would be an excellent choice of module for this
purpose.
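
As a rough sketch of that pre-processing step (not a tested, finished tool), something along these lines might do. The escape_descriptions helper and the <item> wrapper are names I have invented for the example. It assumes your files are well-formed XML, and it only touches <description> elements that actually contain child tags, so existing entities in plain-text descriptions are not escaped a second time.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns a copy of $xml in which any markup found
# inside a <description> element has been turned into escaped text.
sub escape_descriptions {
    my ($xml) = @_;

    my $twig = XML::Twig->new(
        twig_handlers => {
            description => sub {
                my ($t, $elt) = @_;

                # Leave descriptions that hold only text untouched.
                return unless grep { $_->is_elt } $elt->children;

                # inner_xml returns the content with nested tags intact;
                # set_text replaces the children with a single text node,
                # which is escaped when the document is written out.
                $elt->set_text($elt->inner_xml);
            },
        },
    );

    $twig->parse($xml);
    return $twig->sprint;
}

print escape_descriptions(
    '<item><description><b>this is bold</b></description></item>'
), "\n";
```

Once the files have been through a filter like this, XML::Simple should see every <description> as plain text, with the HTML source preserved inside it.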
Regards
Grant
=====================================================================
Grant McLean | email: grantm@web.co.nz | Lvl 6, BP House
The Web Limited | WWW: www.web.co.nz | 20 Customhouse Quay
Internet Solutions | Tel: +64 4 495 8250 | Box 1195, Wellington
Awesome service | Fax: +64 4 495 8259 | New Zealand