List:       xerces-j-user
Subject:    RE: Validation error in xerces -j 2.6.2 and 2.7.0 for schema types xsd:anyURI
From:       Michael Glavassevich <mrglavas () ca ! ibm ! com>
Date:       2005-07-18 3:02:57
Message-ID: OF630D4E23.1AE2C45D-ON85257042.000C4527-85257042.0010BE89 () ca ! ibm ! com

Literal space characters are allowed by the anyURI type. I imagine one 
reason their use is discouraged is because (some of) these characters may 
be removed from the value during white space normalization [1]. It's 
possible to use spaces in anyURI values without encoding them. You just 
have to be careful.
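
To illustrate the normalization concern, here is a minimal Java sketch of the "collapse" white space processing that XML Schema applies to anyURI values. The class and method names are made up for illustration; this is not Xerces code.

```java
// Sketch of XML Schema whiteSpace="collapse" processing (hypothetical helper,
// not part of Xerces): tab, LF and CR are treated as spaces, runs of spaces
// shrink to a single space, and leading/trailing spaces are dropped.
final class WhiteSpaceCollapse {

    static String collapse(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        boolean pendingSpace = false;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                // Remember a separator only if something was already emitted,
                // so leading white space never reaches the output.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                }
                pendingSpace = false;
                out.append(c);
            }
        }
        // A trailing separator is still pending here and is simply discarded.
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("  tel:123   124 ") + "]");
    }
}
```

So a value written with several consecutive spaces, or with leading or trailing spaces, does not survive unchanged, which is one reason to prefer %20.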

Xerces 2.5.0's behaviour was incorrect. It completely ignored the 
following section of the spec:

3.2.17.1 Lexical representation
The ·lexical space· of anyURI is finite-length character sequences 
which, when the algorithm defined in Section 5.4 of [XML Linking Language]
is applied to them, result in strings which are legal URIs according to 
[RFC 2396], as amended by [RFC 2732].

As a result, it rejected all lexical values which are not legal URIs before 
the escaping algorithm is applied (such as ones containing spaces).
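
The order of operations described in that section can be sketched roughly as follows: escape first, then check the result. This is only an approximation under stated assumptions, not the actual Xerces implementation. The class and method names are invented, the excluded-character set is my reading of RFC 2396 section 2.4.3 (leaving '#', '%', '[' and ']' alone, per XLink 5.4 and RFC 2732), and java.net.URI stands in for the "legal URI" check even though it is not an exact RFC 2396 validator.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: apply XLink 5.4 style escaping, then test for a legal URI.
final class AnyUriCheck {

    // ASCII characters excluded by RFC 2396 section 2.4.3, except '#', '%',
    // '[' and ']', which the escaping step leaves untouched.
    private static final String EXCLUDED = " <>\"{}|\\^`";

    // Replace each disallowed character by its UTF-8 bytes written as %HH.
    static String escape(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            boolean disallowed = cp <= 0x1F || cp >= 0x7F || EXCLUDED.indexOf(cp) >= 0;
            if (!disallowed) {
                out.append((char) cp);
                continue;
            }
            byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            for (byte b : utf8) {
                out.append(String.format("%%%02X", b & 0xFF));
            }
        }
        return out.toString();
    }

    // Approximate "legal URI" test; java.net.URI is used as a stand-in here.
    static boolean isLegalUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Check the value only after escaping (the order the spec text describes).
    static boolean isValidAnyUri(String lexical) {
        return isLegalUri(escape(lexical));
    }

    public static void main(String[] args) {
        String raw = "tel:123 124";
        System.out.println(escape(raw));        // tel:123%20124
        System.out.println(isLegalUri(raw));    // checked before escaping
        System.out.println(isValidAnyUri(raw)); // checked after escaping
    }
}
```

Checking the raw value directly corresponds to what 2.5.0 did; checking the escaped value is what the quoted section asks for.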

Thanks.

[1] http://www.w3.org/TR/xmlschema-2/#rf-whiteSpace

"Natarajan Ravi" <Ravi.Natarajan@cfh.nhs.uk> wrote on 07/15/2005 06:13:25 AM:

> Is there an update on this issue, or do I have to force encoding of spaces?
> 
> -----Original Message-----
> From: Rancier, Jeff [mailto:Jeff.Rancier@Sensis.com] 
> Sent: 07 July 2005 17:52
> To: j-users@xerces.apache.org
> Subject: RE: Validation error in xerces -j 2.6.2 and 2.7.0 for 
> schema types xsd:anyURI

> I see.  I just saw the following note:
> 
> 3.2.17.1 Lexical representation
> The ·lexical space· of anyURI is finite-length character sequences 
> which, when the algorithm defined in Section 5.4 of [XML Linking Language]
> is applied to them, result in strings which are legal URIs according to 
> [RFC 2396], as amended by [RFC 2732]. 
> NOTE: Spaces are, in principle, allowed in the ·lexical space· of anyURI,
> however, their use is highly discouraged (unless they are encoded by %20). 
> 
> 
> 
> -----Original Message-----
> From: Natarajan Ravi [mailto:Ravi.Natarajan@cfh.nhs.uk] 
> Sent: Tuesday, July 05, 2005 11:58 AM
> To: j-users@xerces.apache.org
> Subject: RE: Validation error in xerces -j 2.6.2 and 2.7.0 for 
> schema types xsd:anyURI

>     If that's the case, one would expect warnings during validation 
> indicating the presence of spaces which are not encoded. 
> Xerces 2.5.0 reports these situations, but 2.6.2 and higher 
> versions do not.
> 
> 
> NOTE: Spaces are, in principle, allowed in the ·lexical space· of anyURI,
> however, their use is highly discouraged (unless they are encoded by %20). 
> 
> -----Original Message-----
> From: Rancier, Jeff [mailto:Jeff.Rancier@Sensis.com] 
> Sent: 05 July 2005 16:29
> To: j-users@xerces.apache.org
> Subject: RE: Validation error in xerces -j 2.6.2 and 2.7.0 for 
> schema types xsd:anyURI

> http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#anyURI 
> 3.2.17.1 Lexical representation
> The ·lexical space· of anyURI is finite-length character sequences 
> which, when the algorithm defined in Section 5.4 of [XML Linking Language]
> is applied to them, result in strings which are legal URIs according to 
> [RFC 2396], as amended by [RFC 2732]. 
> NOTE: Spaces are, in principle, allowed in the ·lexical space· of anyURI,
> however, their use is highly discouraged (unless they are encoded by %20). 
> -----Original Message-----
> From: Natarajan Ravi [mailto:Ravi.Natarajan@cfh.nhs.uk] 
> Sent: Tuesday, July 05, 2005 7:36 AM
> To: j-users@xerces.apache.org
> Subject: FW: Validation error in xerces -j 2.6.2 and 2.7.0 for 
> schema types xsd:anyURI

> I am trying to validate a simple XML file against a given schema. The 
> XML file and the schema are shown below. Both Xerces 2.6.2 and 2.7.0
> report that the XML file is valid against the schema, whereas 
> 2.5.0 reports errors on the invalid URI value "tel:123 124" (marked 
> bold in the original message).
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <Sample xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
> xsi:noNamespaceSchemaLocation="C:\Download\Test\Sample.xsd">
> <tel value="fax:123%20124"/>
> <tel value="tel:123 124"/>
> <tel value="tel:123%20124"/>
> <tel value="tel:123124%2012312"/>
> </Sample>
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
> elementFormDefault="qualified" attributeFormDefault="unqualified">
> <xs:element name="Sample" type="ElementType">
> <xs:annotation>
> <xs:documentation>Comment describing your root element</xs:documentation>
> </xs:annotation>
> </xs:element>
> <xs:complexType name="ElementType">
> <xs:sequence>
> <xs:element name="tel" type="TelType" maxOccurs="unbounded"/>
> </xs:sequence>
> </xs:complexType>
> <xs:complexType name="TelType">
> <xs:attribute name="value" type="xs:anyURI"/>
> </xs:complexType>
> </xs:schema>
>     It looks like the latest versions of Xerces have problems 
> validating xsd:anyURI types. 
> 

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org


