List:       perl-xml
Subject:    Re: XML:LibXML parsing XHTML problem
From:       Petr Pajas <pajas () ufal ! ms ! mff ! cuni ! cz>
Date:       2004-09-18 9:50:39
Message-ID: 200409181150.48284.pajas () ufal ! ms ! mff ! cuni ! cz

On Saturday 18 September 2004 01:52, Ingo Weiss wrote:
> Thanks!
>
> I actually think I solved the problem. I used
>
> "$parser->parse_html_file" instead of "$parser->parse_file"
>
> and now it works as expected. I just wasn't aware that there is a
> specific method for parsing HTML documents.
>
> Ingo

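If the file is XHTML, it's XML, and you should use $parser->parse_file,
since parse_html_file is intended only for HTML, not XHTML. (The HTML
parser only appears to "work" for you because it doesn't process
namespaces: it treats xmlns as an ordinary attribute, so the elements end
up in no namespace.)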
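
A minimal sketch of the point above, assuming a current XML::LibXML. The
document is inlined and parsed with parse_string (the in-memory counterpart
of parse_file) so the example is self-contained. The DOCTYPE is left out
because, by default, XML::LibXML may try to load the external XHTML DTD
over the network.

```perl
use strict;
use warnings;
use XML::LibXML;

# A tiny XHTML document, inlined so the sketch is self-contained.
# With a real file you would call $parser->parse_file($filename) instead.
my $xhtml = <<'XHTML';
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Test</title></head>
  <body><p>Hello</p></body>
</html>
XHTML

my $parser = XML::LibXML->new;

# XHTML is XML, so use the XML parser (parse_string / parse_file),
# not the HTML parser (parse_html_string / parse_html_file).
my $doc  = $parser->parse_string($xhtml);
my $root = $doc->documentElement;

# The root element is in the XHTML namespace declared by xmlns="...".
printf "local name: %s\n", $root->localname;
printf "namespace:  %s\n", $root->namespaceURI;
```

Running it should print the local name html and the XHTML namespace URI,
which confirms that the parsed elements are namespaced.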

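You said the file parsed OK and you only had problems accessing nodes in the
parsed document. If you're using XPath (via findnodes, find, or findvalue),
you're probably making the very common but wrong assumption that expressions
like /html/body match the body element in the XHTML namespace. They don't:
in XPath 1.0 an unprefixed name such as "body" matches only elements in no
namespace, and the xmlns="http://www.w3.org/1999/xhtml" declaration puts
every element of an XHTML document into the XHTML namespace.
See, e.g., this thread for more information and several examples of how to
handle documents with a default namespace: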

http://aspn.activestate.com/ASPN/Mail/Message/perl-xml/2144190
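
A minimal sketch of the usual fix, assuming a current XML::LibXML, which
includes XML::LibXML::XPathContext. You bind a prefix of your own choosing
(x here is arbitrary and hypothetical) to the XHTML namespace URI and use
it in every step of the expression. A local-name() test is shown as a more
verbose alternative that ignores the namespace altogether.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $xhtml = <<'XHTML';
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Test</title></head>
  <body><p>Hello</p><p>World</p></body>
</html>
XHTML

my $doc = XML::LibXML->new->parse_string($xhtml);

# Unprefixed names in XPath 1.0 match only elements in no namespace,
# so this finds nothing: <body> lives in the XHTML namespace.
my @none = $doc->findnodes('/html/body');
printf "unprefixed /html/body: %d node(s)\n", scalar @none;

# Bind the arbitrary prefix "x" to the XHTML namespace URI
# and use it in the expression.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => 'http://www.w3.org/1999/xhtml');

my @body = $xpc->findnodes('/x:html/x:body');
printf "prefixed /x:html/x:body: %d node(s)\n", scalar @body;

my @paras = $xpc->findnodes('//x:p');
print $_->textContent, "\n" for @paras;

# Alternative without registering a prefix: match on local names only.
my @alt = $doc->findnodes('/*[local-name()="html"]/*[local-name()="body"]');
printf "local-name() query: %d node(s)\n", scalar @alt;
```

With this sample document, the unprefixed query should report 0 nodes,
the prefixed and local-name() queries 1 node each, and the two paragraph
texts should be printed. The prefix approach is usually preferable because
it still distinguishes elements from different namespaces.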

-- Petr

> > Hi Ingo,
> >
> > Can you provide some example code as well as a sample document that
> > isn't parsing for you?  Without that, it's really hard to say what the
> > problem may be.
> >
> > Steve Peters
> > steve@fisharerojo.org
> >
> > On Friday 17 September 2004 05:52 pm, Ingo Weiss wrote:
> > > Hi,
> > >
> > > I am trying to parse XHTML files with XML::LibXML. I am getting no
> > > errors, but then I can't access nodes or do anything else with the
> > > parsed document.
> > >
> > > It works fine if I remove the Document type declaration from the
> > > beginning and the namespace attribute from the "html" tag of the
> > > XHTML files.
> > >
> > > Do I need to tell the parser that these are XHTML files somehow for
> > > it to work?
> > >
> > > Thanks for any hint!
> > > Ingo
> > >
> > > _______________________________________________
> > > Perl-XML mailing list
> > > Perl-XML@listserv.ActiveState.com
> > > To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
