
List:       perl-xml
Subject:    Re: New to xml, need to understand namespaces
From:       Grant McLean <grant () mclean ! net ! nz>
Date:       2003-11-12 17:44:25

Sorry, I forgot to send this reply to the list.

Gary Nielson wrote:
 > I can program in perl, but am new to object-oriented programming
 > in perl and to xml.

You may find the Perl-XML FAQ useful:

   http://perl-xml.sourceforge.net/faq/

 > I need to parse the National Weather Service's
 > experimental new xml alert file. But from my reading, it uses
 > namespaces. Right now, I am trying to learn XML::Parser

Don't do that.

If you learn XML::Parser then your code can use XML::Parser.

If you learn SAX then your code can use any SAX parser (e.g. one based
on libxml, or expat, or pure Perl) or even non-XML sources such as DBI.

If you learn DOM/XPath then your code can use XML::LibXML or XML::XPath
(with minor changes).

In short, don't use the XML::Parser API directly for any new
development.

 > but how do you get it so you can call, say, title or category
 > without using cap:title or cap:category?

As Forest pointed out, the XML::Filter::Namespace module can be used
to strip out any elements that do not match the desired namespace.
Your code can then assume any element it encounters is in the namespace
you're interested in.  However, each element's Name will still have the
namespace prefix on it (e.g. cap:title), so compare against the
LocalName instead.

Here's a sample script that prints the text content of every <headline>
element:

   #!/usr/bin/perl -w

   use strict;

   package YourFilter;  # ideally put this in its own .pm file

   use base 'XML::SAX::Base';

   sub characters {
     my($self, $data) = @_;

     $self->{text} = $data->{Data};  # assumes XML::Filter::BufferText
   }

   sub end_element {
     my($self, $data) = @_;

     return unless($data->{LocalName} eq 'headline'); # Ignore NS prefix
     print "Headline: $self->{text}\n";
   }


   package main;    # main script starts here

   use XML::SAX::Machines qw( Pipeline );
   use XML::Filter::Namespace;

   my $p = Pipeline(
     XML::Filter::Namespace->new(
       ns => 'http://www.incident.com/cap/0.9a1'
     ) =>
     XML::Filter::BufferText =>
     YourFilter->new()
   );

   $p->parse_uri(shift);   # expects filename on command-line


This demonstrates an advantage of SAX - you can pick up off-the-shelf
modules like XML::Filter::Namespace or XML::Filter::BufferText or
XML::SAX::Writer; mix in your own modules; and string them together
with XML::SAX::Machines.

It also demonstrates a disadvantage of SAX.  As with the XML::Parser
handler API, your code is structured around events rather than the
data.  So, for example, you don't know whether you have all the text
content of an element until you get its end_element event, but by that
stage the characters events have already happened, so your handler has
to save the text itself as it goes by.
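
To make that bookkeeping concrete, here is a minimal sketch of a handler
that collects the text itself rather than relying on
XML::Filter::BufferText.  The package name HeadlineCollector and the
sample document are made up for illustration; the namespace URI is the
one used in the script above.  It checks the NamespaceURI of each
element directly, so it does not need XML::Filter::Namespace either.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Namespace URI as used in the pipeline example.
my $CAP_NS = 'http://www.incident.com/cap/0.9a1';

package HeadlineCollector;   # hypothetical name, for illustration only

use base 'XML::SAX::Base';

# True if the element is a <headline> in the CAP namespace,
# whatever prefix the document happens to use.
sub is_headline {
    my ($el) = @_;
    return $el->{LocalName} eq 'headline'
        && ($el->{NamespaceURI} || '') eq $CAP_NS;
}

sub start_element {
    my ($self, $el) = @_;
    # Start a fresh buffer whenever a <headline> opens.
    $self->{buffer} = '' if is_headline($el);
}

sub characters {
    my ($self, $data) = @_;
    # A parser may deliver one run of text as several events, so append.
    $self->{buffer} .= $data->{Data} if defined $self->{buffer};
}

sub end_element {
    my ($self, $el) = @_;
    return unless is_headline($el);
    push @{ $self->{headlines} }, $self->{buffer};
    $self->{buffer} = undef;   # stop collecting until the next <headline>
}

package main;

use XML::SAX::ParserFactory;

# Made-up sample document; a real CAP file will have more in it.
my $sample = <<'XML';
<cap:alert xmlns:cap="http://www.incident.com/cap/0.9a1">
  <cap:info>
    <cap:headline>Flood warning</cap:headline>
  </cap:info>
  <cap:info>
    <cap:headline>High wind watch</cap:headline>
  </cap:info>
</cap:alert>
XML

my $handler = HeadlineCollector->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
$parser->parse_string($sample);

print "Headline: $_\n" for @{ $handler->{headlines} || [] };
```

The start_element/characters/end_element trio is the usual price of the
event model: the handler has to remember where it is in the document.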


Assuming the CAP documents are not huge, you may find it simpler
to use XML::LibXML and index into the elements you're interested
in using XPath expressions.
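
As a rough sketch of what that might look like, the following uses
XML::LibXML::XPathContext to bind a prefix to the CAP namespace URI and
then selects the <headline> elements by XPath.  The sub name
cap_headlines, the prefix 'c' and the sample document are my own
inventions for illustration; the only thing that has to match the real
file is the namespace URI.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::LibXML;
use XML::LibXML::XPathContext;

# Namespace URI as used in the SAX example.
my $CAP_NS = 'http://www.incident.com/cap/0.9a1';

# Return the text of every <headline> element in the CAP namespace.
sub cap_headlines {
    my ($doc) = @_;

    my $xpc = XML::LibXML::XPathContext->new($doc);

    # 'c' is an arbitrary prefix of our own choosing; it need not be
    # the prefix the document uses, only the URI has to match.
    $xpc->registerNs(c => $CAP_NS);

    return map { $_->textContent } $xpc->findnodes('//c:headline');
}

# Made-up sample document; a real CAP file will have more in it.
my $sample = <<'XML';
<cap:alert xmlns:cap="http://www.incident.com/cap/0.9a1">
  <cap:info>
    <cap:headline>Flood warning</cap:headline>
  </cap:info>
  <cap:info>
    <cap:headline>High wind watch</cap:headline>
  </cap:info>
</cap:alert>
XML

# Parse the file named on the command line, or fall back to the sample.
my $parser = XML::LibXML->new;
my $doc    = @ARGV ? $parser->parse_file($ARGV[0])
                   : $parser->parse_string($sample);

print "Headline: $_\n" for cap_headlines($doc);
```

Because the whole document is in memory, the code can ask for exactly
the elements it wants instead of tracking state between events.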

Unfortunately, namespaces and XPath can get a bit messy, but you could
use a hybrid approach to strip off the namespace via SAX and use
XML::LibXML::SAX::Builder at the end of the pipeline to build a DOM.

Regards
Grant


_______________________________________________
Perl-XML mailing list
Perl-XML@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs