List:       perl-xml
Subject:    RE: XML::Simple - ignore HTML?
From:       Grant McLean <grantm () web ! co ! nz>
Date:       2001-03-19 5:13:06

From: Morbus Iff [mailto:morbus@disobey.com]
> 
> I'm parsing a large number of XML documents, and very 
> infrequently, I run across some unencoded HTML within 
> an XML tag, like 
> "<description><b>this is 
> bold</b></description>".

Can I assume that:
1. <description> is one of your XML tags and
2. a <description> tag normally has plain text content but
3. some of your <description> tags contain HTML?

If this is the case, I'm not sure how you expect XML::Simple
to know which tags it should parse as tags and which tags
it should parse as text.  Even if you switched to XML::Parser
you would have the same issue.

One approach you could take is to write a script to pre-process
your XML files and escape the contents of all <description> tags.
XML::Twig would be an excellent choice of module for this 
purpose.
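
As a rough illustration of that pre-processing idea, here is a minimal sketch using XML::Twig. The helper name escape_descriptions and the sample document are made up for this example, and it only works if the embedded HTML is itself well-formed (an unclosed tag such as <br> would make the parse fail before the handler ever runs). It leaves any <description> that contains only text untouched, so content that is already escaped is not escaped a second time.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not part of any module): returns a copy of $xml in
# which markup found inside <description> elements is turned into plain,
# escaped text.
sub escape_descriptions {
    my ($xml) = @_;

    my $twig = XML::Twig->new(
        twig_handlers => {
            description => sub {
                my ($t, $elt) = @_;

                # Skip descriptions that have no child elements.
                return unless grep { $_->is_elt } $elt->children;

                # inner_xml returns the element's content, child tags
                # included, as a string.
                my $markup = $elt->inner_xml;

                # set_text replaces the children with a single text node;
                # the text is escaped when the twig is serialised.
                $elt->set_text($markup);
            },
        },
    );

    $twig->parse($xml);
    return $twig->sprint;
}

# Assumed sample input, modelled on the example in the question.
my $sample = '<item><description><b>this is bold</b></description></item>';
print escape_descriptions($sample), "\n";
```

The output of a script like this could then be fed to XML::Simple, which would see the former <b> tags as ordinary text inside <description>.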

Regards
Grant

=====================================================================
Grant McLean       | email: grantm@web.co.nz | Lvl 6, BP House
The Web Limited    | WWW:   www.web.co.nz    | 20 Customhouse Quay
Internet Solutions | Tel:   +64 4 495 8250   | Box 1195, Wellington
Awesome service    | Fax:   +64 4 495 8259   | New Zealand
