
List:       kde-core-devel
Subject:    Re: [Kde-pim] Fwd: Re: KDE 4.4.98 (4.4 RC3)
From:       argonel <argonel () gmail ! com>
Date:       2010-02-08 23:43:12
Message-ID: 28d9390d1002081543ud9fe14bg657125b4e308dc78 () mail ! gmail ! com

On Mon, Feb 8, 2010 at 5:10 PM, Thiago Macieira <thiago@kde.org> wrote:

> On Monday, 8 February 2010, at 21:15:51, Albert Astals Cid wrote:
> > On Monday, 8 February 2010, Thiago Macieira wrote:
> > > On Sunday, 7 February 2010, at 16:33:34, argonel wrote:
> > > > On Sun, Feb 7, 2010 at 3:58 AM, Thiago Macieira <thiago@kde.org> wrote:
> > > > > The protection has to happen somewhere. Technically, it's
> > > > > Konversation's fault for passing unfiltered network data into
> > > > > an API.
> > > > >
> > > > > But it could also be a QString issue, for allowing those invalid
> > > > > UTF-8 strings to be converted to UTF-16 in the first place.
> > > > >
> > > > > Note that changing the D-Bus behaviour may likely introduce bugs
> > > > > in Glib-based applications, where conversions from UTF-8 do
> > > > > implement this check. (Which, in my opinion, is incomplete)
> > > >
> > > > If you're referring to dbus's lack of checks for 0x1FFFE and so on, I
> > > > found that I was unable to create a QChar > 0xFFFF, so perhaps not
> > > > checking those is reasonable.
> > >
> > > Of course you can't create a QChar > 0xFFFF.
> > >
> > > But QString can handle UTF-16 surrogate pairs and does it just fine.
> > > The sequence 0xD83F 0xDFFF is the U+1FFFF non-character.
> > >
> > > The question is: should those be allowed to exist in a QString? (I
> > > think the answer is yes)
> > >
> > > Should QString::toUtf8 and fromUtf8 accept those?
> >
> > From what I understand, they are not valid UTF-8 (just valid UTF-16), so
> > I think the obvious answer (from the "I have no idea what I'm talking
> > about" position) is "No".
>
> While I agree with you, I have to ask: why?
>
> Why are they valid UTF-16 and valid UCS-4 but not valid UTF-8?
>
>
RFC 3629, section 3, says:

  "The definition of UTF-8 prohibits encoding character numbers between
  U+D800 and U+DFFF, which are reserved for use with the UTF-16
  encoding form (as surrogate pairs) and do not directly represent
  characters. When encoding in UTF-8 from UTF-16 data, it is necessary
  to first decode the UTF-16 data to obtain character numbers which
  are then encoded in UTF-8 as described above."

If QString uses UTF-16 internally, QString::toUtf8 should decode each
surrogate pair to its code point and encode that as valid UTF-8, and
QString::fromUtf8 should reject UTF-8 input that encodes surrogates
directly and provide some kind of error feedback.
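
To make the two halves of that concrete, here is a rough sketch in plain
C++17. It is not Qt's implementation and makes no claim about what
QString currently does. The names decodeUtf16, appendUtf8, utf16ToUtf8 and
utf8ToCodePoints are made up for illustration. It only enforces what the
RFC text above prohibits (surrogates, plus overlong forms and values above
U+10FFFF); non-characters such as U+1FFFF pass through on purpose, since
whether to filter those is the open question in this thread.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: UTF-16 code units -> code points.
// Returns std::nullopt on an unpaired surrogate.
std::optional<std::u32string> decodeUtf16(const std::u16string &in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF) {              // high surrogate
            if (i + 1 >= in.size()) return std::nullopt;
            const char32_t lo = in[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF) return std::nullopt;
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
            ++i;                                        // consumed the low half
        } else if (u >= 0xDC00 && u <= 0xDFFF) {        // stray low surrogate
            return std::nullopt;
        }
        out.push_back(u);
    }
    return out;
}

// Hypothetical helper: append one code point as UTF-8.
// Refuses surrogates and anything above U+10FFFF.
bool appendUtf8(char32_t cp, std::string &out)
{
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return false;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// The "toUtf8" direction: decode the pairs first, then encode.
std::optional<std::string> utf16ToUtf8(const std::u16string &in)
{
    const auto cps = decodeUtf16(in);
    if (!cps) return std::nullopt;
    std::string out;
    for (char32_t cp : *cps)
        if (!appendUtf8(cp, out)) return std::nullopt;
    return out;
}

// The "fromUtf8" direction: reject surrogates, overlong forms,
// truncated sequences and values above U+10FFFF instead of passing them on.
std::optional<std::u32string> utf8ToCodePoints(const std::string &in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp = 0, min = 0;
        std::size_t extra = 0;
        if (b < 0x80)                { cp = b; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; min = 0x10000; }
        else return std::nullopt;                       // invalid lead byte
        if (in.size() - i <= extra) return std::nullopt; // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) return std::nullopt;
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return std::nullopt;
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

With this, the pair 0xD83F 0xDFFF from earlier in the thread comes out as
the four bytes F0 9F BF BF, a lone 0xD83F is refused, and the three-byte
sequence ED A0 BF (a surrogate encoded directly) is refused on the way in.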


> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>  Senior Product Manager - Nokia, Qt Development Frameworks
>      PGP/GPG: 0x6EF45358; fingerprint:
>      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358
>
