List: kde-core-devel
Subject: Re: [Kde-pim] Fwd: Re: KDE 4.4.98 (4.4 RC3)
From: Thiago Macieira <thiago () kde ! org>
Date: 2010-02-09 15:02:58
Message-ID: 201002091702.59443.thiago () kde ! org
On Tuesday, 9 February 2010, at 10:13:52, Johannes Sixt wrote:
> Thiago Macieira wrote:
> > While I agree with you, I have to ask: why?
> >
> > Why are they valid UTF-16 and valid UCS-4 but not valid UTF-8?
>
> It is not valid UTF-8 to write the surrogate pair 0xD83F 0xDFFF as two
> separately UTF-8-encoded byte sequences. The correct way is to encode
> U+1FFFF as a single UTF-8-encoded byte sequence 0xF0 0x9F 0xBF 0xBF.
>
> http://en.wikipedia.org/wiki/CESU-8
QString correctly encodes UTF-16 surrogate pairs as their UTF-8 sequences,
like you said above.
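
For reference, the arithmetic behind the two encodings can be sketched as
below. This is only an illustration in Python, not QString's actual code; the
function names are made up for the example, and the encoder deliberately does
not reject surrogates so that the CESU-8 style output can be shown next to the
proper UTF-8 one.

```python
def surrogate_pair_to_code_point(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def encode_utf8_unchecked(cp: int) -> bytes:
    """Apply the UTF-8 bit layout to a value, without validating it.

    A strict encoder would refuse 0xD800..0xDFFF; this one does not,
    which is exactly what produces CESU-8 style output.
    """
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


high, low = 0xD83F, 0xDFFF

# Proper UTF-8: combine the pair first, then encode once (4 bytes).
code_point = surrogate_pair_to_code_point(high, low)
proper = encode_utf8_unchecked(code_point)

# CESU-8 style: encode each surrogate on its own (3 + 3 bytes).
cesu8_style = encode_utf8_unchecked(high) + encode_utf8_unchecked(low)

print(hex(code_point))        # 0x1ffff
print(proper.hex(" "))        # f0 9f bf bf
print(cesu8_style.hex(" "))   # ed a0 bf ed bf bf
```
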
But that was not the question. The question was: should the surrogate pair
0xD83F 0xDFFF be considered improper for UTF-8 encoding and dropped?
And the opposite: should the UTF-8 sequence 0xF0 0x9F 0xBF 0xBF be considered
incorrect and dropped?
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Senior Product Manager - Nokia, Qt Development Frameworks
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358