List: xmlbeans-dev
Subject: Re: xmlbeans xml security
From: David Waite <mass () akuma ! org>
Date: 2004-06-30 23:58:02
Message-ID: 5240306E-CAF1-11D8-B0F7-000A95C89D86 () akuma ! org
On Jun 30, 2004, at 4:59 PM, Eric Vasilik wrote:
> I need a better understanding of what canonicalization means. But, I
> would think that one should be able to take an arbitrary store and
> persist it in a canonical manner. If the state required to perform
> this is not on the order of the size of the document being saved, then the
> saver should be able to be modified to perform this. The advantage of
> this would be that the store would not be modified for the sake of
> generating a canonical form.
Canonical form means (summarized from the W3C recommendation):
* attribute values are normalized, and default attributes are added
* attribute values are delimited by double quotes
* superfluous namespace declarations are removed
* character and parsed entity references are replaced, and CDATA
sections are replaced with their character content
* the XML declaration and DTD are removed
* empty elements are expanded to start/end tag pairs
* whitespace outside the document element and within start and end
tags is normalized (whitespace in character content is kept)
* the output is encoded in UTF-8
* line breaks are normalized to 0x0A
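
To make a few of those rules concrete, here is a minimal sketch. It
assumes Python's standard-library xml.etree.ElementTree.canonicalize,
which implements the later C14N 2.0 specification rather than the
recommendation summarized above; it is used here only because it is
readily available. For this small input the rules listed above
(plus the lexicographic ordering of attributes, which the
recommendation also requires) should give the same result either way.

```python
from xml.etree.ElementTree import canonicalize

# Input uses single-quoted, unsorted attributes, a redundant namespace
# redeclaration, an empty-element tag, a CDATA section, a character
# reference, and an XML declaration.
source = (
    '<?xml version="1.0"?>'
    "<a:doc xmlns:a=\"urn:a\" b='2' a='1'>"
    '<a:item xmlns:a="urn:a"/>'
    "<t><![CDATA[x < y]]> &amp; &#65;</t>"
    "</a:doc>"
)

canonical = canonicalize(source)
print(canonical)
# <a:doc xmlns:a="urn:a" a="1" b="2"><a:item></a:item><t>x &lt; y &amp; A</t></a:doc>
```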
Canonicalization can also be applied to a document subset. You can use
XPath to indicate that only a particular node set is present in the
canonicalized form, in document order.
* the tree between an ancestor and a descendant node may be omitted in
this sort of canonical form
* namespace declarations and xml: namespace attributes (xml:space,
xml:lang) will appear on an element in canonical form if they were
declared on an ancestor node which is not in the node set.
* it's possible that the canonical output is not well-formed XML, if
there is no common ancestor-or-self for all nodes within the node
set.
As long as the returned document is in canonical form,
re-canonicalizing it will yield a byte-identical document. The
recommendation does not seem to dictate whether the canonical
representation is a byte array or represented in some other lossless
manner.
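
The byte-identical property gives a cheap way to test whether a
serialized document is already canonical. A small sketch, again
assuming Python's xml.etree.ElementTree.canonicalize (a C14N 2.0
implementation) as a stand-in; the helper name is_canonical is made up
for illustration:

```python
from xml.etree.ElementTree import canonicalize


def is_canonical(xml_text):
    """Return True if canonicalizing xml_text leaves its bytes unchanged."""
    before = xml_text.encode("utf-8")
    after = canonicalize(xml_text).encode("utf-8")
    return before == after


raw = "<a b='1'/>"
once = canonicalize(raw)

print(is_canonical(raw))   # False: quoting and empty-element form change
print(is_canonical(once))  # True: a second pass changes nothing
```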
"Exclusive Canonicalization" is a variant which changes these rules
slightly:
* xml-namespaced attributes (e.g. xml:space) are not copied onto an
element if they were declared on an ancestor element omitted from the
node set
* a namespace is only inherited from an omitted ancestor if it is
'visibly utilized' by an element or attribute name; attributes or
elements with QName-typed content alone will not cause a namespace to
be included.
* it defines an InclusiveNamespaces PrefixList parameter to declare
additional prefixes to treat as utilized on elements, for namespace
inclusion from omitted ancestors.
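
The 'visibly utilized' test can be sketched roughly as follows. This is
only an illustration using Python's xml.dom.minidom; the function name
visibly_utilized_prefixes is hypothetical, and the sketch does not
special-case the xml prefix or the additional prefix list. The point is
that the xsd prefix below appears only inside an attribute value, so it
does not count.

```python
from xml.dom import minidom

XMLNS_NS = "http://www.w3.org/2000/xmlns/"


def visibly_utilized_prefixes(element):
    """Prefixes used by the element name or its attribute names.

    The empty string stands for the default namespace, which only an
    unprefixed element name (never an unprefixed attribute) utilizes.
    """
    used = {element.prefix or ""}
    attrs = element.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        if attr.namespaceURI == XMLNS_NS:
            continue  # a namespace declaration is not itself a use
        if attr.prefix:
            used.add(attr.prefix)
    return used


doc = minidom.parseString(
    '<root xmlns:a="urn:a" '
    'xmlns:xsd="http://www.w3.org/2001/XMLSchema" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<a:item xsi:type="xsd:string">text</a:item>'
    "</root>"
)
item = doc.documentElement.firstChild

print(sorted(visibly_utilized_prefixes(item)))  # ['a', 'xsi']
```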
> If we go this direction, then I recommend creating a new saver option
> and modify the saver to obey it.
I would agree that trying to maintain a canonical form within a
representation is not a good idea, but there may be ways to improve the
speed of canonicalization with the structure of the saver.
>
> Question, is a canonical form always a byte stream, or can a DOM be in
> canonical form?
Canonicalization takes a document or node set as input, and outputs a
byte array or stream (so that the result can be fed into the next
transformation for xml-dsig).
Canonicalization of a node-set may yield a non-schema-valid or even
non-well-formed document.
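
As a rough illustration of why the output is bytes: the next step in a
signature pipeline is typically a digest over the canonical octets. The
sketch below assumes Python's xml.etree.ElementTree.canonicalize (a
C14N 2.0 implementation) and an arbitrarily chosen SHA-256 digest; the
helper name c14n_digest is invented for the example.

```python
import hashlib
from xml.etree.ElementTree import canonicalize


def c14n_digest(xml_text):
    """Canonicalize, encode as UTF-8, and hash the resulting bytes."""
    canonical_bytes = canonicalize(xml_text).encode("utf-8")
    return hashlib.sha256(canonical_bytes).hexdigest()


# Two serializations of the same logical document.
first = "<doc b='2' a='1'><empty/></doc>"
second = '<?xml version="1.0"?><doc a="1" b="2"><empty></empty></doc>'

print(c14n_digest(first) == c14n_digest(second))  # True
```

Hashing the raw text of the two inputs instead would give different
digests, which is the problem canonicalization is there to solve.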
-David Waite