
List:       xml-dev
Subject:    Re: [xml-dev] What to escape when serializing XML
From:       richard () inf ! ed ! ac ! uk (Richard Tobin)
Date:       2007-01-03 12:22:44
Message-ID: 20070103122244.E751A188C4F () macpro ! inf ! ed ! ac ! uk

In article <200701021413.20468.frans.englich@telia.com> you write:

>These paragraphs give good hints to the complexity in this, but it's
>not very exact ("Specifically, CR, NEL ...").

I'm not sure what you find inexact about it.  It lists the three
characters that must be escaped in text to avoid their being
normalised when re-read, and the five that must be escaped in
attributes for the same reason.
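The normalisation in question can be observed with any conforming parser. Below is a minimal sketch using Python's standard xml.etree.ElementTree (an XML 1.0 parser, so it only shows the CR, TAB and NL cases, not NEL or LSEP). The helper names roundtrip_text and roundtrip_attr are made up for this illustration.

```python
import xml.etree.ElementTree as ET

def roundtrip_text(body: str) -> str:
    """Parse `body` as the content of an element and return what the parser reports."""
    return ET.fromstring(f"<e>{body}</e>").text

def roundtrip_attr(value: str) -> str:
    """Parse `value` as a double-quoted attribute value and return what the parser reports."""
    return ET.fromstring(f'<e a="{value}"/>').get("a")

# A literal CR in text is normalised to NL when re-read...
print(repr(roundtrip_text("x\ry")))       # 'x\ny'
# ...but a CR written as a character reference survives.
print(repr(roundtrip_text("x&#xD;y")))    # 'x\ry'

# Literal TAB and NL in an attribute value are normalised to spaces...
print(repr(roundtrip_attr("x\ty")))       # 'x y'
print(repr(roundtrip_attr("x\ny")))       # 'x y'
# ...but survive when written as character references.
print(repr(roundtrip_attr("x&#x9;y")))    # 'x\ty'
```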

If you're serialising as XML 1.0 you don't need to bother escaping NEL
and LSEP (because they don't get normalised when read).  But as the
text you quoted notes, a 1.0 external entity included in a 1.1
document is parsed as XML 1.1, so if your output might be used as an
external entity in that way - rather than as a complete XML document -
you will need to escape them.  You might as well escape them anyway.

I'll try to summarise:

#x1-#x1F except CR, TAB, NL:
Can't occur in XML 1.0.  Can occur in XML 1.1 and must be escaped.

CR:
Always escape.

NL, TAB:
Escape in attribute values.

NEL (#x85), LSEP (#x2028):
Always escape (only essential if serialising as XML 1.1).

#x7F-#x9F except NEL:
Always escape (only essential if serialising as XML 1.1).

less-than, ampersand:
Always escape.

greater-than:
Escape in text if it immediately follows two close-square-brackets, as
that sequence is only allowed as the end of a CDATA marked section.

single-quote, double-quote:
Escape in attribute values quoted with the same kind of quote.

I think it's safe to always escape all of these, but always escaping
NL would make things unreadable.
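As a rough illustration, the summary above could be turned into an escaping routine along the following lines. This is only a sketch: the function name escape_xml and its parameters (in_attribute, quote, xml11) are invented for the example, and it does not check for other characters that XML forbids outright (unpaired surrogates, #xFFFE, #xFFFF).

```python
def escape_xml(s: str, in_attribute: bool = False, quote: str = '"',
               xml11: bool = False) -> str:
    """Escape `s` for use as element text, or as an attribute value
    delimited by `quote` when in_attribute is true (hypothetical helper)."""
    out = []
    for i, ch in enumerate(s):
        cp = ord(ch)
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            # Only needed in text, directly after two close-square-brackets.
            needs_escape = not in_attribute and s.endswith("]]", 0, i)
            out.append("&gt;" if needs_escape else ">")
        elif ch in "\"'":
            # Only needed when it matches the attribute's own delimiter.
            if in_attribute and ch == quote:
                out.append("&quot;" if ch == '"' else "&apos;")
            else:
                out.append(ch)
        elif ch in "\t\n":
            # Left alone in text so the output stays readable.
            out.append(f"&#x{cp:X};" if in_attribute else ch)
        elif ch == "\r" or cp in (0x85, 0x2028) or 0x7F <= cp <= 0x9F:
            # CR, NEL, LSEP and the rest of #x7F-#x9F: always escape.
            out.append(f"&#x{cp:X};")
        elif cp < 0x20:
            # Remaining C0 controls: not representable in XML 1.0 at all;
            # #x0 is not allowed in either version.
            if cp == 0 or not xml11:
                raise ValueError(f"U+{cp:04X} cannot be serialised")
            out.append(f"&#x{cp:X};")
        else:
            out.append(ch)
    return "".join(out)

print(escape_xml("a<b&c ]]> \r\n"))                 # a&lt;b&amp;c ]]&gt; &#xD; + newline
print(escape_xml('say "hi"\t', in_attribute=True))  # say &quot;hi&quot;&#x9;
```

The xml11 flag here only controls whether the #x1-#x1F controls are written as character references or rejected; NEL, LSEP and #x7F-#x9F are escaped unconditionally, since that is harmless in XML 1.0.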

-- Richard
