List:       xerces-j-user
Subject:    RE: AW: unparsed data in XML
From:       "WATKIN-JONES,ADAM (HP-UnitedKingdom,ex1)" <adam_watkin-jones () hp ! com>
Date:       2002-04-29 17:00:59

Hi!

From the horse's mouth, as it were:

http://www.w3.org/TR/2000/REC-xml-20001006#charsets

2.2 Characters
[Definition: A parsed entity contains text, a sequence of characters, which
may represent markup or character data.] [Definition: A character is an
atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646] (see also
[ISO/IEC 10646-2000]). Legal characters are tab, carriage return, line feed,
and the legal characters of Unicode and ISO/IEC 10646. The versions of these
standards cited in A.1 Normative References were current at the time this
document was prepared. New characters may be added to these standards by
amendments or new editions. Consequently, XML processors must accept any
character in the range specified for Char. The use of "compatibility
characters", as defined in section 6.8 of [Unicode] (see also D21 in section
3.6 of [Unicode3]), is discouraged.]

Character Range
[2]    Char    ::=    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
                      | [#x10000-#x10FFFF]
       /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

The mechanism for encoding character code points into bit patterns may vary
from entity to entity. All XML processors must accept the UTF-8 and UTF-16
encodings of 10646; the mechanisms for signaling which of the two is in use,
or for bringing other encodings into play, are discussed later, in 4.3.3
Character Encoding in Entities.
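
To make production [2] concrete, here is a rough Java sketch of the range
check. The class and method names (XmlCharCheck, isXmlChar,
firstIllegalIndex) are made up for illustration and are not part of the
Xerces API. Note that the production excludes more than control characters:
the surrogate blocks, #xFFFE and #xFFFF are rejected as well.

```java
// Hypothetical helper, not part of Xerces: checks text against
// production [2] Char of XML 1.0 (Second Edition).
class XmlCharCheck {

    /** Returns true if the code point falls in one of the ranges listed for Char. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns the index (in UTF-16 code units) of the first character
     * that is not a legal XML Char, or -1 if every character is legal.
     */
    static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            // codePointAt combines a valid surrogate pair into one code point;
            // an unpaired surrogate comes back as itself and is rejected.
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

Running firstIllegalIndex over a string before writing it into a document is
one way to find the offending character rather than waiting for the parser
to complain when the document is read back.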


HTH
Adam


-----Original Message-----
From: Cliff Rowley

<snip/>

Oh I see, I mean I realise that.  I was just wondering if 'especially 
control characters' implies that there are other characters, aside from 
control characters, that are not allowed in an XML document.

<snip/>

