
List:       xerces-j-dev
Subject:    Re: How to deal with EntityReferences in Document Fragments that are parsed
From:       Michael Glavassevich <mrglavas () ca ! ibm ! com>
Date:       2011-10-16 18:00:41
Message-ID: OFC5F1BACC.1A62F6F6-ON8525792B.00620950-8525792B.0062EFEA () ca ! ibm ! com

Hi Frank,

I believe LSParser.parseWithContext() will do what you want. It should use
the context node to resolve entity references in the document fragment.

An implementation of parseWithContext() was developed this summer by one of
our GSoC students. It's still on my TODO list to integrate it, and I would
like to include it in a future release. Until then, the solution you've come
up with might be your best option.
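For reference, a parseWithContext() call might look roughly like the sketch below. It is an illustration under assumptions, not the GSoC implementation: the class and method names are hypothetical, and it only uses the standard org.w3c.dom.ls interfaces. Parsers that lack parseWithContext() support are expected to throw a DOMException with code NOT_SUPPORTED_ERR, which the sketch reports by returning false instead of failing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.xml.sax.InputSource;

public class ParseInContextSketch {

    /** Parses a complete XML document (including its internal DTD subset) from a string. */
    public static Document parseDocument(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /**
     * Parses fragmentXml and appends the result as children of context, so that
     * entity references can be resolved against the context document's DTD.
     * Returns false if the DOM implementation reports NOT_SUPPORTED_ERR.
     */
    public static boolean appendWithContext(Element context, String fragmentXml) {
        Document owner = context.getOwnerDocument();
        // Ask the document's DOMImplementation for the DOM Level 3 Load & Save feature.
        DOMImplementationLS ls =
                (DOMImplementationLS) owner.getImplementation().getFeature("LS", "3.0");
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = ls.createLSInput();
        input.setStringData(fragmentXml);
        try {
            parser.parseWithContext(input, context, LSParser.ACTION_APPEND_AS_CHILDREN);
            return true;
        } catch (DOMException e) {
            if (e.code == DOMException.NOT_SUPPORTED_ERR) {
                return false; // parser without parseWithContext() support
            }
            throw e;
        }
    }
}
```

Usage: parse the main document with parseDocument(), then call appendWithContext(main.getDocumentElement(), "<para>Example only: &foo;</para>"). Once parseWithContext() is supported, the para element should end up as a child of article.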

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

"Frank Steimke" <f-steimke@berger-und-steimke.de> wrote on 10/16/2011
07:18:52 AM:

> Hi,
>
> I had a lot of trouble with entity references in document fragments. I
> found a solution at last, but I would like to know whether there is a
> better approach.
> My scenario is like this: there is a main document which defines some
> internal entities but does not use them. For example:
> <!DOCTYPE article [
> <!ENTITY foo "foo expanded" >
> ]>
> <article />
>
> There is a separate file with an XML document fragment. I would like to
> parse it as a fragment in the context of the main document. It makes use
> of the entity which is defined in the main document. For example:
> <?xml version='1.0' standalone='no'?>
> <para>Example only: &foo;</para>
>
> My first approach was to use the DOM Level 3 parseWithContext(), but it is
> not supported by Xerces yet. So I had to do something like "parse in
> context" on my own. I built a DOM Document from the main document using
> Xerces as the LSParser. Then I tried to parse the fragment and generate DOM
> nodes to be appended to the main document.
>
> I tried a SAX parser for the fragment file. No luck: it complains about
> the undeclared entity, because SAX knows nothing about the context.
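A minimal sketch of this failure mode, assuming the JDK's built-in non-validating SAX parser (javax.xml.parsers); the class and method names are hypothetical. Parsed on its own, the fragment raises a fatal undeclared-entity error, while the same content with foo declared in a DTD the parser can see parses cleanly.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxFragmentCheck {

    /**
     * Parses xml with a non-validating SAX parser.
     * Returns the fatal error message, or null if the input parsed cleanly.
     */
    public static String fatalErrorOf(String xml) throws Exception {
        try {
            // DefaultHandler.fatalError() rethrows, so parsing stops at the first fatal error.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getMessage() != null ? e.getMessage() : e.toString();
        }
    }
}
```

The contrast between the two inputs is the point: the fragment file alone carries no declaration for foo, and a SAX parser has no way to borrow one from the main document.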
>
> I tried a StAX XMLStreamReader for parsing the fragment file. The
> difference from SAX is the ability to set
> javax.xml.stream.isReplacingEntityReferences=false. Then, when getting an
> ENTITY_REFERENCE event, I created an appropriate EntityReference from the
> main document and appended it as a child node. E.g. (pseudo code for
> clarification):
> EntityReference er = mainDocument.createEntityReference(
>         reader.getLocalName()); // reader: the fragment's XMLStreamReader
> currentElement.appendChild(er); // currentElement: hypothetical name for the DOM element being built
> This works without any error, but not as expected. Serializing the
> main document shows the EntityReference as empty (no value).
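The StAX approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Frank's actual code: the class and method names are hypothetical, elements are created namespace-unaware, and comments and processing instructions are dropped. It relies on XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES being false, so that &foo; arrives as an ENTITY_REFERENCE event whose entity name XMLStreamReader.getLocalName() returns. Whether an undeclared entity is reported as such an event, rather than as an error, may depend on the StAX implementation in use.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class StaxFragmentToDom {

    /** Parses the main document (with its internal DTD subset) from a string. */
    public static Document parseDocument(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /**
     * Reads fragmentXml with StAX and appends equivalent DOM nodes under parent.
     * Entity references are not expanded: each becomes an EntityReference node
     * created by the parent's owner document. Namespace-unaware; comments,
     * processing instructions and SPACE events are skipped.
     */
    public static void appendFragment(String fragmentXml, Element parent)
            throws XMLStreamException {
        Document owner = parent.getOwnerDocument();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(fragmentXml));
        Node current = parent;
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        Element e = owner.createElement(reader.getLocalName());
                        for (int i = 0; i < reader.getAttributeCount(); i++) {
                            e.setAttribute(reader.getAttributeLocalName(i),
                                    reader.getAttributeValue(i));
                        }
                        current.appendChild(e);
                        current = e;
                        break;
                    }
                    case XMLStreamConstants.END_ELEMENT:
                        current = current.getParentNode();
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        current.appendChild(owner.createTextNode(reader.getText()));
                        break;
                    case XMLStreamConstants.ENTITY_REFERENCE:
                        // For this event type getLocalName() returns the entity name.
                        current.appendChild(owner.createEntityReference(reader.getLocalName()));
                        break;
                    default:
                        break; // other event types are ignored in this sketch
                }
            }
        } finally {
            reader.close();
        }
    }
}
```

As described above, an EntityReference created this way may still have no replacement children if the main document never used the entity, so this sketch reproduces the structure but not a fix for the empty value.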
>
> While debugging, I found that the entity "foo" has a null value in the
> DocumentType of the main document, because it is not used there.
> Fortunately, I found the DOM Level 3 normalizeDocument() method, whose
> description says "This method acts as if the document was going through a
> save and load cycle, putting the document in a "normal" form. As a
> consequence, this method updates the replacement tree of EntityReference
> nodes ...". However, it did not help: after normalizing and then
> serializing the document, the foo EntityReference still shows up without
> any value.
>
> The only solution that I found is an ugly hack:
> - Immediately after parsing, add a new element with a (hopefully unique)
>   name to the main document.
> - Iterate over all the entities defined in the main document and add an
>   EntityReference for each one to the newly created element.
> - Serialize the document to a byte stream. Setting the DOMConfiguration
>   parameter "entities" to "true" keeps EntityReference nodes in the
>   document, which means the reference will be serialized as "&foo;".
> - Parse the content of the byte stream. This gives a new document that is
>   essentially the same as the main document. The difference is that there
>   is an extra element containing a reference to each entity, so all
>   entities are now in use.
> - Remove the extra element.
> From then on, the StAX approach described above works well. But this is a
> lot of work for a problem that sounds very simple. Is there a simpler
> solution that I have missed?
>
> Also, I wonder whether the upcoming parseWithContext() support in Apache
> Xerces will help me in this situation. Since the entity foo *IS* defined
> in my example, I would expect an EntityReference within a fragment that is
> parsed in context to work as expected, whether or not the entity has been
> used in the context document.
>
> Thank you,
> Frank Steimke
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
> For additional commands, e-mail: j-users-help@xerces.apache.org