From kopete-devel  Fri Jan 23 20:49:03 2004
From: Thiago Macieira <thiago.macieira () kdemail ! net>
Date: Fri, 23 Jan 2004 20:49:03 +0000
To: kopete-devel
Subject: Re: [Kopete-devel] [Bug 72917] UTF8 and other cause XML parsing
Message-Id: <200401231849.03909.thiago.macieira () kdemail ! net>
X-MARC-Message: https://marc.info/?l=kopete-devel&m=107489095120312
MIME-Version: 1
Content-Type: multipart/mixed; boundary="--===============0112494106=="


--===============0112494106==
Content-Type: multipart/signed; protocol="application/pgp-signature";
	micalg=pgp-sha1; boundary="Boundary-02=_/iYEADIUqwgCQAE";
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit


--Boundary-02=_/iYEADIUqwgCQAE
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Jason Keirstead wrote:
>The point of this code block is not to determine if a string is UTF..
> we pretty much *know* it isn't. Its purpose is to take a non-UTF
> string for which the codec is totally unknown and make it displayable
> as UTF, by replacing undecodable characters with ?

All the QTextCodec decoders do that, except for the UTF-8 one. That's
the only reason why we have to special-case UTF-8 (because Qt
special-cases it too).
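
To make "replace undecodable characters with ?" concrete, here is a
rough sketch in Python rather than Qt. The helper name to_displayable
is made up for illustration, and Python's codecs only stand in for
QTextCodec here; this is not how Qt implements it.

```python
def to_displayable(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw, turning every undecodable byte sequence into '?'."""
    # errors="replace" substitutes U+FFFD for bad input instead of raising.
    text = raw.decode(encoding, errors="replace")
    # Swap the replacement character for a plain question mark.
    return text.replace("\ufffd", "?")


print(to_displayable(b"caf\xe9"))  # a lone Latin-1 byte is not valid UTF-8
```

Note that a U+FFFD that was genuinely present in the input would also
come out as '?' with this approach.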

>I had assumed that QChar took care of the double byte stuff for me.

QChar does, but your code reads one 'char' at a time. It also uses a
variable 'utf' that is never defined, so I can't really tell what you
intended.
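
The difference between bytes and characters can be shown with a small
sketch (in Python, not the code under discussion; the variable names
are only illustrative):

```python
# U+00E9 (e with acute accent) is one character but two bytes in UTF-8.
raw = "\u00e9".encode("utf-8")  # b"\xc3\xa9"

# Reading one byte at a time and treating each byte as a character
# produces two unrelated characters.
per_byte = "".join(chr(b) for b in raw)

# Handing the whole buffer to a decoder produces the single character.
decoded = raw.decode("utf-8")

print(len(raw), ascii(per_byte), ascii(decoded))
```

The per-byte loop produces U+00C3 followed by U+00A9, not the single
U+00E9 that the sender typed.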

>> - if it isn't given, try the user's locale codec. If that fails,
>> try
>
>Why bother with the locale codec? What does my locale have to do with
> the sender's?

Because there's a high probability that the person you're talking to
speaks the same language you do. Therefore, it's quite possible that
the encoding is the same as well.

Think of text-mode IRC clients, for instance.
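
As a sketch of what such a fallback could look like (Python, not
Kopete code): the function name, its parameters and the Latin-1 last
resort are assumptions of mine for illustration, not something already
agreed on in this thread.

```python
import locale


def decode_with_locale_fallback(raw, declared=None, locale_codec=None):
    """Try the declared codec, then the locale codec, then Latin-1."""
    if locale_codec is None:
        # The codec of the local user's environment, e.g. "UTF-8".
        locale_codec = locale.getpreferredencoding(False)
    candidates = [name for name in (declared, locale_codec) if name]
    for name in candidates:
        try:
            return raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown codec: move on to the next guess
    # Assumed last resort: Latin-1 accepts any byte sequence.
    return raw.decode("latin-1")
```

For example, two users who both run a KOI8-R locale and send no
charset information would still read each other's Cyrillic text
correctly, because the locale guess happens to match.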

>UTF-8 should be last, because latin1 will indicate failure if it's not
> latin1. UTF is the only codec in Qt that *always* decodes. So we
> should try that last, then clean it up.

No, most codecs can safely decode most strings. All the ISO-8859 codecs
do so, at least. That's not because the codec is broken: it's just
because any 8-bit string is valid ISO-8859-X (except for NULs).

So, fromLatin1 will never fail. Again, note that "will never fail" is
not the same as "does not report failure".
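
The same property can be checked outside Qt. The sketch below uses
Python's codecs as a stand-in (is_valid_utf8 is a hypothetical helper,
and NUL is left out only to mirror the caveat above):

```python
# Every non-NUL byte value, 0x01 through 0xFF.
every_byte = bytes(range(1, 256))

# Latin-1 maps each byte to the code point with the same number, so
# there is nothing for the decoder to reject.
as_latin1 = every_byte.decode("latin-1")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True only if raw is well-formed UTF-8."""
    try:
        raw.decode("utf-8")  # strict by default: raises on bad input
    except UnicodeDecodeError:
        return False
    return True


# UTF-8 text decoded as Latin-1 "succeeds" but yields the wrong text.
mojibake = "\u00e9".encode("utf-8").decode("latin-1")
```

A strict UTF-8 decode can therefore serve as a test, while a Latin-1
decode cannot: it accepts everything, including UTF-8 input that it
then renders as garbage.
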
-- 
  Thiago Macieira  -  Registered Linux user #65028
   thiagom (AT) mail (dot) com
    ICQ UIN: 1967141   PGP/GPG: 0x6EF45358; fingerprint:
    E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358

--Boundary-02=_/iYEADIUqwgCQAE
Content-Type: application/pgp-signature
Content-Description: signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQBAEYi/M/XwBW70U1gRAn46AJ9uCPFUTO7sYsUDy7hVMlCRM5UyqACgu5d9
BnHldsM8+x6YeDtlrvxTjas=
=kjcz
-----END PGP SIGNATURE-----

--Boundary-02=_/iYEADIUqwgCQAE--

--===============0112494106==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Kopete-devel mailing list
Kopete-devel@kde.org
https://mail.kde.org/mailman/listinfo/kopete-devel

--===============0112494106==--