List: xml-dev
Subject: Re: [xml-dev] Many different syntaxes in XML - is that good language design?
From: Norman Gray <norman.gray () glasgow ! ac ! uk>
Date: 2022-03-07 16:57:16
Message-ID: 42D561E6-1002-4CF7-BB7B-917A11C5EB74 () glasgow ! ac ! uk
Pete, hello.
On 7 Mar 2022, at 15:52, Pete Cordell wrote:
> Viewed like that it seems a fairly minimal and efficient syntax.
Indeed. SGML is a thing of some beauty, viewed through the right
(rather special) spectacles.
> (It does make me wonder why the CDATA section 'directive' wasn't just
> <!CDATA[...]>. Even more curious, given all the SGML things that got
> dropped, is how it got included in XML. It creates just as many
> problems as it solves.)
It's certainly pretty orthogonal, in all sorts of directions.
Regarding <![CDATA[...]]> vs <!CDATA[ ... ]>, the sequence of tokens
here *in SGML* is
    <!      : markup declaration open
    [       : declaration subset open
    CDATA   : status-keyword
    [       : dso again
    ...data
    ]]      : marked section close
    >       : markup declaration close (which happens to be the same
              character, by default, as element start-tag-close, and
              a few others)
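The token sequence above can be sketched as a small tokenizer. This is only an illustration of the lexing described in this message, not a real SGML parser: the function name `tokenize_marked_section` and the short token labels are made up for the example, it handles a single un-nested marked section, and it assumes the default delimiter strings. It also allows whitespace on either side of the status keyword, which the message notes SGML permitted.

```python
import re

# Default delimiter strings, as named in the message.
MDO, DSO, MSC, MDC = "<!", "[", "]]", ">"

# Status keywords listed in the message.
STATUS_KEYWORDS = {"CDATA", "RCDATA", "INCLUDE", "IGNORE", "TEMP"}

_MARKED_SECTION = re.compile(
    re.escape(MDO) + re.escape(DSO)
    + r"\s*(?P<keyword>[A-Z]+)\s*"   # whitespace allowed around the keyword
    + re.escape(DSO)
    + r"(?P<data>.*?)"
    + re.escape(MSC) + re.escape(MDC),
    re.DOTALL,
)


def tokenize_marked_section(text):
    """Split one un-nested marked section into (label, text) pairs."""
    m = _MARKED_SECTION.fullmatch(text)
    if m is None:
        raise ValueError("not a marked section")
    keyword, data = m.group("keyword"), m.group("data")
    if keyword not in STATUS_KEYWORDS:
        raise ValueError(f"unknown status keyword: {keyword}")
    if MSC + MDC in data:
        # Simplification: this sketch does not handle nesting.
        raise ValueError("data contains a marked-section close")
    return [
        ("mdo", MDO),
        ("dso", DSO),
        ("status-keyword", keyword),
        ("dso", DSO),
        ("data", data),
        ("msc", MSC),
        ("mdc", MDC),
    ]
```

Calling `tokenize_marked_section("<![ CDATA [a < b]]>")` yields the same seven tokens as the unspaced form, with `"a < b"` as the data.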
I'm not 100% clear why the 'declaration subset open' is so-called.
This token is also used to introduce the DTD declaration, full or
partial, at the very top of a document (which is written in the 'other'
syntax, in the terms of this thread), and it seems to have been reused
here _partly_ as a sort-of gesture towards the declaration language of
DTDs -- ie, the first '[' is effectively signalling an escape inside
the escape, in a different direction. The SGML status keywords,
alongside CDATA, were/are INCLUDE and IGNORE (which respectively
include and ignore the text inside the construct), RCDATA (which is
like CDATA except that entities (only) are recognised and expanded
(have I got that right?)), and TEMP (which did nothing other than mark
the contained text as temporary). I presume the duplication of the ']'
in the marked-section-close is partly to keep the brackets balanced,
and partly because it's a string that's unlikely to appear in normal
text. It was possible to have whitespace either side of the
status-keyword terms, so that '<![ CDATA [...]]>' would be a legal
SGML declaration.
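As a rough sketch of what those status keywords do to the enclosed text, following the descriptions in this message (including the hedged account of RCDATA): the names `apply_status`, `expand_entities` and the `ENTITIES` table are hypothetical, invented for the illustration, and a real SGML parser would go on to parse markup inside INCLUDE and TEMP sections rather than return the text unchanged.

```python
import re

# Hypothetical entity table, for illustration only.
ENTITIES = {"amp": "&", "lt": "<", "gt": ">"}


def expand_entities(text, entities=ENTITIES):
    """Replace &name; references found in the table; leave others as-is."""
    return re.sub(
        r"&([A-Za-z][A-Za-z0-9.-]*);",
        lambda m: entities.get(m.group(1), m.group(0)),
        text,
    )


def apply_status(keyword, data):
    """Return the text a marked section contributes, per its status keyword."""
    if keyword == "IGNORE":
        return ""                      # contents are dropped
    if keyword == "CDATA":
        return data                    # contents are taken literally
    if keyword == "RCDATA":
        return expand_entities(data)   # only entity references are recognised
    if keyword in ("INCLUDE", "TEMP"):
        return data                    # simplification: markup is not parsed here
    raise ValueError(f"unknown status keyword: {keyword}")
```

For example, `apply_status("CDATA", "a &amp; <b>")` returns the text untouched, while the same data under RCDATA comes back as `a & <b>`.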
I think that all of these except CDATA were dropped in XML, at least in
document content (INCLUDE and IGNORE survive as conditional sections in
the external DTD subset), along with the different lexical classes, so
that (*checks*...) the start of a CDATA section is just '<![CDATA[' as
an otherwise unintelligible magic string. Why that particular magic
string and not a saner one? Purely, I think, to retain the status of
XML documents as being also parseable as SGML. That is, SGML would lex
this string differently, but react in the same way.
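The 'magic string' behaviour is easy to see with an XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper names `root_text` and `is_well_formed` and the sample documents are made up for the illustration. It shows that a CDATA section yields the same text as the escaped equivalent, and that the whitespace-padded form that SGML would accept is not well-formed XML.

```python
import xml.etree.ElementTree as ET


def root_text(doc):
    """Return the text content of the root element of an XML string."""
    return ET.fromstring(doc).text


def is_well_formed(doc):
    """Report whether the parser accepts the string as XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


CDATA_DOC = "<p><![CDATA[if (a < b && c > d) {}]]></p>"
ESCAPED_DOC = "<p>if (a &lt; b &amp;&amp; c &gt; d) {}</p>"
SPACED_DOC = "<p><![ CDATA [x]]></p>"   # fine in SGML, per the message

same_text = root_text(CDATA_DOC) == root_text(ESCAPED_DOC)
spaced_ok = is_well_formed(SPACED_DOC)
```

Here `same_text` is true and `spaced_ok` is false: in XML the opening delimiter has to be the exact nine-character string.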
The other gasp-worthy thing about SGML was that all of these lexical
items, such as '<', '<!', and so on and very much on, were
configurable, so you could prefix your document with declarations (in
the 'other' syntax) which changed these, and have different character
sequences open and close start-tags, processing instructions, and so
on. The angle brackets and ampersands we're familiar with are just the
SGML defaults.
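To give a flavour of what configurable delimiters mean for a lexer, here is a toy sketch in which the delimiter strings are data rather than hard-coded. The table keys borrow the SGML delimiter-role names, but the function `start_tag_names`, the brace-based alternative set, and the sample inputs are all invented for the illustration; a real SGML system would read the replacements from the document's declarations rather than from a Python dictionary.

```python
import re

# The familiar defaults: start-tag open, end-tag open, tag close.
DEFAULT_DELIMS = {"stago": "<", "etago": "</", "tagc": ">"}

# A hypothetical alternative set, purely for illustration.
BRACE_DELIMS = {"stago": "{", "etago": "{/", "tagc": "}"}


def start_tag_names(text, delims=DEFAULT_DELIMS):
    """List element names of attribute-less start-tags under the given delimiters."""
    pattern = re.compile(
        re.escape(delims["stago"])
        + r"([A-Za-z][A-Za-z0-9.-]*)"
        + re.escape(delims["tagc"])
    )
    return pattern.findall(text)


default_names = start_tag_names("<p>hi <em>x</em></p>")
brace_names = start_tag_names("{p}hi {em}x{/em}{/p}", BRACE_DELIMS)
```

Both calls find the same two element names, `p` and `em`, even though the two inputs share no markup characters.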
Enough (slightly deranged) nostalgia!
Best wishes,
Norman
--
Norman Gray : https://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK