List:       xerces-j-dev
Subject:    Re: Way it ignore entity reference resolving?
From:       Michael Glavassevich <mrglavas () ca ! ibm ! com>
Date:       2012-03-30 22:47:13
Message-ID: OF03C58045.04B31454-ON852579D1.007BF77E-852579D1.007D4410 () ca ! ibm ! com

HTML != XML. Try an HTML parser like NekoHTML [1].
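
To illustrate the point, here is a minimal sketch using only the JAXP
classes that ship with the JDK. The class name EntityDemo and the helper
rootText are made up for this example, and the sample markup is an
assumption, not the poster's actual page. An XML parser only knows the five
predefined entities, so "&raquo;" is a fatal error unless the document
declares it. An HTML parser such as NekoHTML already knows the HTML entity
set, which is why it is the better fit here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EntityDemo {
    /** Returns the root element's text, or null if the input cannot be parsed as XML. */
    static String rootText(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(markup)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Includes the SAXParseException raised for an undeclared entity.
            return null;
        }
    }

    public static void main(String[] args) {
        // HTML-style input: the entity is referenced but never declared.
        String html = "<!DOCTYPE html><html><body>a &raquo; b</body></html>";
        // Same content, but the entity is declared in the internal DTD subset.
        String xml = "<!DOCTYPE html [<!ENTITY raquo \"&#187;\">]>"
                   + "<html><body>a &raquo; b</body></html>";
        System.out.println("undeclared entity: " + rootText(html));
        System.out.println("declared entity:   " + rootText(xml));
    }
}
```

Declaring entities by hand only helps when you control the input and it is
otherwise well-formed XML. Real-world HTML usually is not.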

Please note that you're not using Apache Xerces at all. 
com.sun.org.apache.* is Oracle's fork of the codebase. We have no 
influence over it.
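
If you want to check which implementation JAXP actually hands you, a rough
sketch along these lines may help. The class ParserOrigin and its method
names are hypothetical. The package prefix is the one visible in your stack
trace, and Apache Xerces itself lives under org.apache.xerces.*.

```java
import javax.xml.parsers.DocumentBuilderFactory;

class ParserOrigin {
    /** True when the class name belongs to the JDK-bundled fork of Xerces. */
    static boolean isJdkBundled(String className) {
        return className.startsWith("com.sun.org.apache.xerces.internal.");
    }

    /** Name of the factory class that JAXP selected at runtime. */
    static String factoryClassName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        String name = factoryClassName();
        // With Apache Xerces on the classpath, the name would normally be
        // org.apache.xerces.jaxp.DocumentBuilderFactoryImpl instead.
        System.out.println(name
            + (isJdkBundled(name) ? " (JDK-bundled fork)" : " (not the JDK-bundled fork)"));
    }
}
```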

Thanks.

[1] http://nekohtml.sourceforge.net/

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

laredotornado <laredotornado@gmail.com> wrote on 30/03/2012 05:16:13 PM:

> Hi,
> 
> I'm using Java 6 and the latest version of Xerces.  I'm trying to parse an
> HTML document that begins like this ...
> 
> <!DOCTYPE html>
> 
> and later references the entity "&raquo;".  Parsing dies with the exception
> ...
> 
> org.xml.sax.SAXParseException: The entity "raquo" was referenced, but not declared.
>     at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:249)
>     at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
>     at com.myco.myproject.util.XmlUtilities.getStringAsDocument(XmlUtilities.java:147)
>     at com.myco.myproject.util.NetUtilities.getUrlAsDocument(NetUtilities.java:65)
>     at com.myco.myproject.parsers.impl.AbstractMetromixParser.parsePage(AbstractMetromixParser.java:107)
>     at com.myco.myproject.parsers.impl.AbstractMetromixParser.getEvents(AbstractMetromixParser.java:76)
>     at com.myco.myproject.domain.EventFeed.refresh(EventFeed.java:81)
>     at com.myco.myproject.domain.EventFeed.getEvents(EventFeed.java:72)
>     at com.myco.myproject.parsers.impl.MetromixParserTest.testParser(MetromixParserTest.java:21)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>     at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:74)
>     at org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:83)
>     at org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:72)
>     at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:231)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>     at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
>     at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:71)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>     at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:174)
>     at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
>     at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
> 
> Is there any way to tell the parser to ignore these types of entities it
> cannot resolve?  If not, what resolver do I have to plug in?
> 
> Thanks, - Dave
> -- 
> View this message in context: http://old.nabble.com/Way-it-ignore-entity-reference-resolving--tp33544935p33544935.html
> Sent from the Xerces - J - Users mailing list archive at Nabble.com.


