
List:       xml-dev
Subject:    Re: [xml-dev] Many different syntaxes in XML - is that good language design?
From:       Norman Gray <norman.gray () glasgow ! ac ! uk>
Date:       2022-03-07 16:57:16
Message-ID: 42D561E6-1002-4CF7-BB7B-917A11C5EB74 () glasgow ! ac ! uk


Pete, hello.

On 7 Mar 2022, at 15:52, Pete Cordell wrote:

> Viewed like that it seems a fairly minimal and efficient syntax.

Indeed.  SGML is a thing of some beauty, viewed through the right (rather special) spectacles.

> (It does make me wonder why the CDATA section 'directive' wasn't just <!CDATA[...]>.  Even more curious, given all the SGML things that got dropped, is how it got included in XML.  It creates just as many problems as it solves.)

It's certainly pretty orthogonal, in all sorts of directions.

Regarding <![CDATA[...]]> vs <!CDATA[ ... ]>, the sequence of tokens here *in SGML* is

  <!     : markup declaration open (mdo)
  [      : declaration subset open (dso)
  CDATA  : status-keyword
  [      : dso again
  ...    : data
  ]]     : marked section close (msc)
  >      : markup declaration close (mdc), which by default happens to be the same character as the element start-tag close, and a few others
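
A minimal Python sketch of that token sequence, assuming SGML's default delimiters, a single status keyword, and no nested marked sections or parameter entities.  `MARKED_SECTION` and `tokenize_marked_section` are hypothetical names for illustration, not part of any SGML toolkit; the optional whitespace around the keyword mirrors what SGML allows there.

```python
import re

# Hypothetical sketch: one SGML marked section split into the tokens listed
# above, using the default delimiters.  Assumes a single status keyword,
# no nesting and no parameter entities.
MARKED_SECTION = re.compile(
    r"(?P<mdo><!)"                   # markup declaration open
    r"(?P<dso1>\[)"                  # declaration subset open
    r"\s*(?P<keyword>[A-Za-z]+)\s*"  # status keyword; whitespace allowed either side
    r"(?P<dso2>\[)"                  # declaration subset open again
    r"(?P<data>.*?)"                 # marked section content
    r"(?P<msc>\]\])"                 # marked section close
    r"(?P<mdc>>)",                   # markup declaration close
    re.DOTALL,
)


def tokenize_marked_section(text):
    """Return (token-name, text) pairs for one marked section."""
    m = MARKED_SECTION.fullmatch(text)
    if m is None:
        raise ValueError(f"not a marked section: {text!r}")
    return [
        ("mdo", m["mdo"]),
        ("dso", m["dso1"]),
        ("status-keyword", m["keyword"]),
        ("dso", m["dso2"]),
        ("data", m["data"]),
        ("msc", m["msc"]),
        ("mdc", m["mdc"]),
    ]


for name, value in tokenize_marked_section("<![ CDATA   [x < y]]>"):
    print(f"{name:15} {value!r}")
```

Note that the simpler `<!CDATA[x]>` spelling does not match at all: after `<!` the sketch, like SGML, expects the dso `[`.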

I'm not 100% clear why the 'declaration subset open' is so-called.  This token is also used to introduce the DTD declarations, full or partial, at the very top of a document (which are written in the 'other' syntax, in the terms of this thread), and it seems to have been reused here _partly_ as a sort-of gesture towards the declaration language of DTDs -- ie, the first '[' is effectively signalling an escape inside the escape, in a different direction.  The SGML status keywords, alongside CDATA, were/are INCLUDE and IGNORE (which respectively include and ignore the text inside the construct), RCDATA (which is like CDATA except that entities (only) are recognised and expanded (have I got that right?)), and TEMP (which did nothing other than mark the contained text as temporary).  I presume the duplication of the ']' in the marked-section close is partly to keep the brackets balanced, and partly because it's a string that's unlikely to appear in normal text.  Whitespace was allowed on either side of the status keyword, so '<![ CDATA   [...]]>' would be a legal SGML declaration.
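
The effect of each status keyword described above can be sketched as a simple dispatch.  `apply_status_keyword` is a hypothetical helper, and the sketch makes several simplifying assumptions: keywords are matched in upper case only, RCDATA expands only `&name;` entity references from a caller-supplied table (leaving unknown names untouched rather than reporting an error), and the normal parsing that INCLUDE and TEMP content would then undergo is not modelled.

```python
import re

# Matches a general entity reference such as &name; (simplified name syntax).
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def apply_status_keyword(keyword, content, entities):
    """Hypothetical sketch of how a marked section's status keyword
    affects its content.  `entities` maps entity names to replacement text."""
    if keyword == "IGNORE":
        return ""  # the content is discarded
    if keyword == "CDATA":
        return content  # nothing inside is recognised as markup
    if keyword == "RCDATA":
        # Entity references are expanded; everything else stays literal.
        # Unknown names are left as-is here (a real parser would report them).
        return ENTITY_REF.sub(lambda m: entities.get(m.group(1), m.group(0)), content)
    if keyword in ("INCLUDE", "TEMP"):
        # Kept; a real parser would go on to parse tags etc. inside it.
        return content
    raise ValueError(f"unknown status keyword: {keyword}")


entities = {"lt": "<", "co": "ACME Ltd"}
sample = "&co; says a &lt; b <b>"
for kw in ("INCLUDE", "IGNORE", "CDATA", "RCDATA", "TEMP"):
    print(f"{kw:8} {apply_status_keyword(kw, sample, entities)!r}")
```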

I think that all of these except CDATA were dropped from XML document content (INCLUDE and IGNORE survive only as conditional sections in the external DTD subset), along with the different lexical classes, so that (*checks*...) the start of a CDATA section is just '<![CDATA[' as an otherwise unintelligible magic string.  Why that particular magic string and not a saner one?  Purely, I think, to retain the status of XML documents as being also parseable as SGML.  That is, SGML would lex this string differently, but react in the same way.
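
On the XML side, Python's standard `xml.etree.ElementTree` parser illustrates the point: '<![CDATA[' is recognised only as that exact string, and the section's content comes through as literal text.  The helper names `root_text` and `is_well_formed` are illustrative; the two rejected spellings are the simpler `<!CDATA[...]>` asked about above and the whitespace-padded form that SGML permits.

```python
import xml.etree.ElementTree as ET


def root_text(document):
    """Return the text content of the root element of a small XML document."""
    return ET.fromstring(document).text


def is_well_formed(document):
    """True if the standard-library XML parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# '<![CDATA[' is a fixed magic string; the section's content comes through literally.
print(root_text("<a><![CDATA[x < y && z]]></a>"))  # x < y && z

# Neither the simpler spelling nor SGML's whitespace-padded form is XML.
print(is_well_formed("<a><!CDATA[x]></a>"))      # False
print(is_well_formed("<a><![ CDATA [x]]></a>"))  # False
```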


The other gasp-worthy thing about SGML was that all of these lexical items, such as '<', '<!', and so on and very much on, were configurable, so you could prefix your document with declarations (in the 'other' syntax) which changed these, and have different character sequences open and close start-tags, processing instructions, and so on.  The angle brackets and ampersands we're familiar with are just the SGML defaults.
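
A toy Python illustration of configurable delimiters: the role names and default strings follow SGML's reference concrete syntax, while `start_tag_names` and the `{`/`}` replacements are hypothetical, chosen only to show that a recogniser driven by a delimiter table finds different markup when the table changes.  It handles only bare start-tags without attributes and is not an SGML parser.

```python
import re

# Delimiter roles and their defaults in SGML's reference concrete syntax.
DEFAULT_DELIMS = {
    "STAGO": "<",   # start-tag open
    "ETAGO": "</",  # end-tag open
    "TAGC": ">",    # tag close
    "MDO": "<!",    # markup declaration open
    "MDC": ">",     # markup declaration close
    "PIO": "<?",    # processing instruction open
    "PIC": ">",     # processing instruction close
    "ERO": "&",     # entity reference open
    "REFC": ";",    # reference close
}


def start_tag_names(text, delims):
    """Hypothetical helper: names in simple start-tags (STAGO name TAGC)
    under a given delimiter table.  No attributes or other markup."""
    pattern = (re.escape(delims["STAGO"])
               + r"([A-Za-z][A-Za-z0-9]*)"
               + re.escape(delims["TAGC"]))
    return re.findall(pattern, text)


# Hypothetical alternative delimiters for start-tags and end-tags.
custom = dict(DEFAULT_DELIMS, STAGO="{", ETAGO="{/", TAGC="}")

print(start_tag_names("<p>hello</p>", DEFAULT_DELIMS))  # ['p']
print(start_tag_names("{p}hello{/p}", custom))          # ['p']
print(start_tag_names("<p>hello</p>", custom))          # []
```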

Enough (slightly deranged) nostalgia!

Best wishes,

Norman


--

Norman Gray  :  https://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
