List:       kde-core-devel
Subject:    Re: [Kde-pim] Fwd: Re: KDE 4.4.98 (4.4 RC3)
From:       Thiago Macieira <thiago () kde ! org>
Date:       2010-02-09 15:02:58
Message-ID: 201002091702.59443.thiago () kde ! org

On Tuesday, 9 February 2010, at 10:13:52, Johannes Sixt wrote:
> Thiago Macieira wrote:
> > While I agree with you, I have to ask: why?
> > 
> > Why are they valid UTF-16 and valid UCS-4 but not valid UTF-8?
> 
> It is not valid UTF-8 to write the surrogate pair 0xD83F 0xDFFF as two
> separately UTF-8-encoded byte sequences. The correct way is to encode
> U+1FFFF as a single UTF-8-encoded byte sequence 0xF0 0x9F 0xBF 0xBF.
> 
> http://en.wikipedia.org/wiki/CESU-8

QString correctly encodes each UTF-16 surrogate pair as a single UTF-8 
sequence, as you described above.
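
To make the arithmetic concrete, here is a minimal Python sketch (not Qt 
code; the helper name combine_surrogates is made up for illustration) that 
combines the pair into one code point, encodes it as UTF-8, and contrasts 
that with the CESU-8 style of encoding each surrogate separately:

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The pair from this thread.
code_point = combine_surrogates(0xD83F, 0xDFFF)

# Proper UTF-8: one four-byte sequence for the combined code point.
utf8 = chr(code_point).encode("utf-8")

# CESU-8 style: each surrogate encoded on its own as three bytes.
# Python only allows this with the "surrogatepass" error handler.
cesu8 = b"".join(
    chr(unit).encode("utf-8", "surrogatepass") for unit in (0xD83F, 0xDFFF)
)

print(f"U+{code_point:X}")   # U+1FFFF
print(utf8.hex(" "))         # f0 9f bf bf
print(cesu8.hex(" "))        # ed a0 bf ed bf bf
```

The six-byte form is what a strict UTF-8 decoder rejects; the four-byte 
form is the one under discussion here.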

But that was not the question. The question was whether the surrogate pair 
0xD83F 0xDFFF should be considered improper for UTF-8 encoding and dropped. 
And conversely: should the UTF-8 sequence 0xF0 0x9F 0xBF 0xBF be considered 
incorrect and dropped?
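
As a rough illustration of why the two questions are the same question, 
the Python sketch below decodes both representations and compares them. 
The is_noncharacter helper is hypothetical, and the link to noncharacters 
is an assumption: U+1FFFF is one of the Unicode noncharacters, which is 
presumably why dropping it comes up at all. Python's own codecs accept it 
in both directions, which says nothing about what QString should do.

```python
def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for U+nFFFE / U+nFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


# The surrogate pair 0xD83F 0xDFFF as little-endian UTF-16 bytes.
from_utf16 = b"\x3f\xd8\xff\xdf".decode("utf-16-le")

# The four-byte UTF-8 sequence from the message.
from_utf8 = b"\xf0\x9f\xbf\xbf".decode("utf-8")

same_code_point = from_utf16 == from_utf8
flagged = is_noncharacter(ord(from_utf8))

print(same_code_point, flagged)
```

Both inputs end up as the same single code point, so a policy that drops 
one form and keeps the other would be inconsistent.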

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Senior Product Manager - Nokia, Qt Development Frameworks
      PGP/GPG: 0x6EF45358; fingerprint:
      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358
