
List:       xmlbeans-dev
Subject:    Re: xmlbeans xml security
From:       David Waite <mass () akuma ! org>
Date:       2004-06-30 23:58:02
Message-ID: 5240306E-CAF1-11D8-B0F7-000A95C89D86 () akuma ! org

On Jun 30, 2004, at 4:59 PM, Eric Vasilik wrote:

> I need a better understanding of what canonicalization means.  But, I
> would think that one should be able to take an arbitrary store and
> persist it in a canonical manner.  If the state required to perform this
> is not on the order of the size of the document being saved, then the
> saver should be able to be modified to perform this.  The advantage of
> this would be that the store would not be modified for the sake of
> generating a canonical form.

Canonical form means (summarized from the W3C rec):
* attribute values are normalized, and default attributes are added
* attribute values are delimited by double quotes
* redundant redeclarations of a namespace/prefix are removed
* character/parsed entity references and CDATA sections are expanded
* any XML declaration or DTD is removed
* empty elements are expanded to start/end tag pairs
* whitespace outside the document element and within start/end tags is 
normalized
* output is encoded in UTF-8
* line breaks are normalized to 0x0A
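
To make the list above concrete, here is a minimal sketch. It assumes 
Python 3.8+ and its standard-library xml.etree.ElementTree.canonicalize, 
which implements Canonical XML 2.0 rather than the 1.0 rec summarized 
here; the rules it exercises (double-quoted attributes, CDATA expansion, 
dropping the XML declaration, expanding empty elements) are shared by 
both. It also sorts attributes, which the list above does not mention. 
The sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: XML declaration, single-quoted attributes in
# non-sorted order, an empty-element tag, and a CDATA section.
raw = "<?xml version='1.0'?>\n<doc b='2' a='1'><empty/><![CDATA[x < y]]></doc>"

# canonicalize() returns the canonical form as a str.
canon = ET.canonicalize(raw)
print(canon)
# <doc a="1" b="2"><empty></empty>x &lt; y</doc>
```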

You can also use XPath to select a node set, so that only those nodes 
are present in the canonicalized form, in document order:
* the tree between an ancestor and a descendant node may be omitted in 
this sort of canonical form
* namespace declarations and xml: namespace attributes (xml:space, 
xml:lang) will be emitted on an element in canonical form if they were 
declared on an ancestor node which is not in the node set.
* it's possible that the canonical output is not well-formed XML, if 
there is no common ancestor-or-self to all nodes within the node 
set.
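
The last point can be illustrated with a rough sketch. This is not the 
XPath node-set algorithm from the rec: it only approximates a document 
subset by canonicalizing each selected subtree on its own and 
concatenating the results, and it ignores the inherited xml: attribute 
handling described above. It assumes Python 3.8+; the helper name 
canonicalize_subset and the sample document are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

doc = ET.fromstring("<root><item>a</item><skip/><item>b</item></root>")

def canonicalize_subset(elements):
    """Canonicalize each selected subtree and join them in the order given."""
    parts = []
    for elem in elements:
        clone = copy.copy(elem)  # shallow copy so the source tree is untouched
        clone.tail = None        # text after the element is not in the subset
        parts.append(ET.canonicalize(ET.tostring(clone, encoding="unicode")))
    return "".join(parts)

# iter() walks the tree in document order; <root> itself is not selected.
subset = canonicalize_subset(doc.iter("item"))

# With no common ancestor in the subset, the output has two top-level
# elements, so it cannot be parsed back as a single XML document.
try:
    ET.fromstring(subset)
    well_formed = True
except ET.ParseError:
    well_formed = False

print(subset, well_formed)
```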

As long as the returned document is in canonical form, 
re-canonicalizing it will yield a byte-identical document. The 
recommendation does not seem to dictate whether the canonical 
representation is a byte array or represented in some other lossless 
manner.
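
A quick way to check the byte-identical property, sketched under the 
assumption that Python's xml.etree.ElementTree.canonicalize (3.8+, 
Canonical XML 2.0) is an acceptable stand-in for the canonicalizer under 
discussion. That function returns text, so the sketch encodes to UTF-8 
before comparing bytes. The input is made up and includes a CRLF line 
break.

```python
import xml.etree.ElementTree as ET

# Hypothetical input with a CRLF line break and unsorted attributes.
raw = "<doc b='2' a='1'>line one\r\nline two<empty/></doc>"

once = ET.canonicalize(raw)
twice = ET.canonicalize(once)

# Canonicalizing an already-canonical document should change nothing.
same_bytes = once.encode("utf-8") == twice.encode("utf-8")
print(same_bytes)
```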

"Exclusive Canonicalization" is a variant which changes these rules 
slightly:
* xml-namespaced attributes (e.g. xml:space) are not redeclared if they 
were declared on an ancestor element omitted from the node set
* a namespace is only inherited from an omitted ancestor if it is 
'visibly utilized', i.e. attributes or elements with qname-type content 
alone will not cause a namespace to be included.
* defines an InclusiveNamespaces PrefixList parameter to declare 
additional prefixes to treat as utilized on elements, for namespace 
inclusion from omitted ancestors.
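
The 'visibly utilized' rule can be seen in a small sketch. It assumes 
Python 3.8+, whose xml.etree.ElementTree.canonicalize implements 
Canonical XML 2.0; as far as I can tell its default namespace handling 
follows the same visibly-utilized idea, so it serves as a stand-in here 
rather than as an Exclusive Canonicalization implementation. The 
namespaces urn:a and urn:b are hypothetical.

```python
import xml.etree.ElementTree as ET

# Both prefixes are declared on the root. Only "a" is used in an element
# name; "b" appears solely inside an attribute value (qname-type content).
raw = (
    '<root xmlns:a="urn:a" xmlns:b="urn:b">'
    '<a:child type="b:thing"/>'
    '</root>'
)

canon = ET.canonicalize(raw)
print(canon)

# The declaration for "a" is emitted where the prefix is actually used;
# the declaration for "b" is dropped because nothing visibly uses it.
uses_a = 'xmlns:a="urn:a"' in canon
uses_b = "urn:b" in canon
```

Note that the attribute value "b:thing" survives unchanged while its 
prefix declaration does not, which is why a prefix list is needed when 
qname-valued content must stay resolvable.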

> If we go this direction, then I recommend creating a new saver option
> and modify the saver to obey it.

I would agree that trying to maintain a canonical form within a 
representation is not a good idea, but there may be ways to improve the 
speed of canonicalization with the structure of the saver.

>
> Question, is a canonical form always a byte stream, or can a DOM be in
> canonical form?

Canonicalization takes a document or node set as input, and outputs a 
byte array or stream (so that it can feed the next transformation for 
xml-dsig).
Canonicalization of a node set may yield a non-schema-valid or even 
non-well-formed document.
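
As a sketch of why the output is bytes: the canonical octets are what 
gets digested in the next step. The example below assumes Python 3.8+ 
(xml.etree.ElementTree.canonicalize, Canonical XML 2.0) and uses SHA-256 
purely as an illustrative digest; the helper name c14n_digest and both 
sample documents are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

def c14n_digest(xml_text: str) -> str:
    """Canonicalize, encode as UTF-8 bytes, and return a hex digest."""
    canonical_bytes = ET.canonicalize(xml_text).encode("utf-8")
    return hashlib.sha256(canonical_bytes).hexdigest()

# Two serializations of the same logical document: different quoting,
# attribute order, spacing, empty-element syntax, and XML declaration.
a = "<doc b='2' a='1'><empty/></doc>"
b = '<?xml version="1.0"?><doc a="1"   b="2"><empty></empty></doc>'

digests_match = c14n_digest(a) == c14n_digest(b)
print(digests_match)
```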

-David Waite


