
List:       9fans
Subject:    Re: [9fans] utf-8 handling oddities
From:       LdBeth <andpuke () foxmail ! com>
Date:       2023-10-14 4:56:29
Message-ID: tencent_52F608630CE8D45815FDE9BB4D750B3B820A () qq ! com

>>>>> In <1597A7B3-09D5-443F-B372-8B28F5F2B059@aaoth.xyz> 
>>>>>   la-ninpre <aaoth@aaoth.xyz> wrote:

la-ninpre> if i understand it correctly, unicode extended past the BMP
la-ninpre> in 1996 with the release of unicode 2.0. plan 9 had two
la-ninpre> editions released after that, but, assuming that the
la-ninpre> archives on p9f are correct, the implementation
la-ninpre> didn't reflect the change in the code until 2013 (and
la-ninpre> that's why that old code propagated to both plan9port and
la-ninpre> 9front). so, does anyone know why that is the case? i'd
la-ninpre> appreciate any input on this or some pointers to
la-ninpre> information resources that you may know of.

Fun fact: "the underlying Xerces parser used by most systems never
implemented XML 1.0 fifth edition" (which was released in 2008).

It is not uncommon for implementors to decide not to cover new
features that are of less interest to them.

Also, the Unicode standard does **not require** UTF-8 implementations
to handle surrogates: the surrogate code points U+D800 to U+DFFF are
ill-formed when encoded in UTF-8. Rob Pike said in a relevant golang
thread:

> It's correct to reject them

https://golang-dev.narkive.com/4Zves5rC/surrogate-halves-and-utf-8

which also explains the rationale behind the Plan 9 code.
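To make the surrogate point concrete, here is a minimal sketch in C of a strict UTF-8 decoder. The helper `decode_utf8` is hypothetical (it is not Plan 9's `chartorune` or any library API). It accepts 4-byte sequences beyond the BMP but rejects encoded surrogates, overlong forms, and values above U+10FFFF, following the well-formed byte sequence rules in the Unicode standard.

```c
#include <stddef.h>

/* Hypothetical strict UTF-8 decoder, for illustration only
 * (not Plan 9's chartorune). Decodes one code point from
 * s[0..n-1] into *cp. Returns the number of bytes consumed,
 * or 0 if the sequence is ill-formed. */
static size_t decode_utf8(const unsigned char *s, size_t n, unsigned long *cp)
{
	unsigned long c;
	size_t len, i;

	if (n == 0)
		return 0;
	if (s[0] < 0x80) {			/* 1 byte: U+0000..U+007F */
		*cp = s[0];
		return 1;
	} else if (s[0] >= 0xC2 && s[0] <= 0xDF) {	/* 2 bytes (C0, C1 are overlong) */
		len = 2;
		c = s[0] & 0x1F;
	} else if (s[0] >= 0xE0 && s[0] <= 0xEF) {	/* 3 bytes: rest of the BMP */
		len = 3;
		c = s[0] & 0x0F;
	} else if (s[0] >= 0xF0 && s[0] <= 0xF4) {	/* 4 bytes: beyond the BMP */
		len = 4;
		c = s[0] & 0x07;
	} else
		return 0;	/* stray continuation byte or F5..FF */

	if (n < len)
		return 0;	/* truncated sequence */
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;	/* expected a continuation byte */
		c = (c << 6) | (s[i] & 0x3F);
	}

	/* Reject overlong 3- and 4-byte forms. */
	if ((len == 3 && c < 0x800) || (len == 4 && c < 0x10000))
		return 0;
	/* Encoded surrogates are ill-formed UTF-8: reject them. */
	if (c >= 0xD800 && c <= 0xDFFF)
		return 0;
	/* Nothing above U+10FFFF is a valid code point. */
	if (c > 0x10FFFF)
		return 0;
	*cp = c;
	return len;
}
```

For example, the bytes E2 82 AC decode to U+20AC (3 bytes consumed) and F0 9F 98 80 decode to U+1F600 (4 bytes), while ED A0 80, which would be U+D800, is rejected with a return value of 0.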

la-ninpre> best regards,
la-ninpre> la ninpre.


---
ldbeth


------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T8384b8174eb88096-M50e3a04b5272c6334c10d2af