List:       kde-core-devel
Subject:    Re: [Kde-pim] Fwd: Re: KDE 4.4.98 (4.4 RC3)
From:       Johannes Sixt <j.sixt () viscovery ! net>
Date:       2010-02-09 8:13:52
Message-ID: 4B711940.9030108 () viscovery ! net

Thiago Macieira wrote:
> On Monday, 8 February 2010, at 21:15:51, Albert Astals Cid wrote:
>> On Monday, 8 February 2010, Thiago Macieira wrote:
>>> But QString can handle UTF-16 surrogate pairs and does it just fine. The
>>> sequence 0xD83F 0xDFFF is the U+1FFFF non-character.
>>>
>>> The question is: should those be allowed to exist in a QString? (I think
>>> the answer is yes)
>>>
>>> Should QString::toUtf8 and fromUtf8 accept those?
>> From what I understand, they are not valid UTF-8 (just valid UTF-16), so I
>> think the obvious answer (from the "I have no idea what I'm talking about"
>> position) is "No".
> 
> While I agree with you, I have to ask: why?
> 
> Why are they valid UTF-16 and valid UCS-4 but not valid UTF-8?

It is not valid UTF-8 to write the surrogate pair 0xD83F 0xDFFF as two
separately UTF-8-encoded byte sequences. The correct way is to encode
U+1FFFF as a single UTF-8-encoded byte sequence 0xF0 0x9F 0xBF 0xBF.

http://en.wikipedia.org/wiki/CESU-8
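
To make the arithmetic concrete, here is a minimal sketch in plain C++
(standard library only, no Qt). The function names are made up for
illustration and are not part of any Qt API; the code assumes well-formed
input and does no validation.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
// Assumes high is in 0xD800..0xDBFF and low is in 0xDC00..0xDFFF.
std::uint32_t combineSurrogates(std::uint16_t high, std::uint16_t low)
{
    return 0x10000u
         + ((std::uint32_t(high) - 0xD800u) << 10)
         + (std::uint32_t(low) - 0xDC00u);
}

// Hypothetical helper: encode a code point in 0x10000..0x10FFFF as the
// single four-byte UTF-8 sequence.
std::vector<std::uint8_t> encodeUtf8Supplementary(std::uint32_t cp)
{
    return {
        std::uint8_t(0xF0 | (cp >> 18)),
        std::uint8_t(0x80 | ((cp >> 12) & 0x3F)),
        std::uint8_t(0x80 | ((cp >> 6) & 0x3F)),
        std::uint8_t(0x80 | (cp & 0x3F)),
    };
}

// Hypothetical helper: encode one 16-bit unit with the three-byte UTF-8 bit
// layout. Applying this to each surrogate separately produces the six-byte
// CESU-8 form, which is not valid UTF-8.
std::vector<std::uint8_t> encodeUnitAsThreeBytes(std::uint16_t unit)
{
    return {
        std::uint8_t(0xE0 | (unit >> 12)),
        std::uint8_t(0x80 | ((unit >> 6) & 0x3F)),
        std::uint8_t(0x80 | (unit & 0x3F)),
    };
}
```

With the pair from this thread, combineSurrogates(0xD83F, 0xDFFF) gives
0x1FFFF, and encodeUtf8Supplementary(0x1FFFF) gives 0xF0 0x9F 0xBF 0xBF.
Encoding the two surrogates separately gives 0xED 0xA0 0xBF followed by
0xED 0xBF 0xBF, which is the CESU-8 form.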

-- Hannes
