From kopete-devel Fri Jan 23 20:49:03 2004
From: Thiago Macieira
Date: Fri, 23 Jan 2004 20:49:03 +0000
To: kopete-devel
Subject: Re: [Kopete-devel] [Bug 72917] UTF8 and other cause XML parsing
Message-Id: <200401231849.03909.thiago.macieira () kdemail ! net>
X-MARC-Message: https://marc.info/?l=kopete-devel&m=107489095120312

Jason Keirstead wrote:
> The point of this code block is not to determine if a string is UTF..
> we pretty much *know* it isn't. Its purpose is to take a non-UTF
> string for which the codec is totally unknown and make it displayable
> as UTF, by replacing undecodable characters with ?

All the QTextCodec decoders do that, except for the UTF-8 one. That's
the only reason why we have to special-case UTF-8 (because Qt
special-cases it too).

> I had assumed that QChar took care of the double byte stuff for me.

QChar does. But your code reads one 'char' at a time. And it uses a
variable 'utf' you defined nowhere, so I can't really tell what you
wanted.

>> - if it isn't given, try the user's locale codec. If that fails,
>> try
>
> Why bother with the locale codec? What does my locale have to do with
> the sender's?

Because there's a high probability that the person you're talking to
speaks the same language as you. Therefore, it's quite possible
the encoding is the same. Think of text-mode IRC clients, for instance.

> UTF-8 should be last, because latin1 will indicate failure if it's not
> latin1. UTF is the only codec in Qt that *always* decodes. So we
> should try that last, then clean it up.

No, most codecs can safely decode most strings. All the ISO-8859 codecs
do so, at least. That's not because the codec is broken: it's just
because any 8-bit string is valid ISO-8859-X (except for NULs).

So, fromLatin1 will never fail. Again, note "will never fail" is not the
same as "does not report failure".

-- 
  Thiago Macieira  -  Registered Linux user #65028
   thiagom (AT) mail (dot) com
    ICQ UIN: 1967141   PGP/GPG: 0x6EF45358; fingerprint:
    E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358
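
The fallback order argued for in this message can be sketched roughly as
follows. This is a Python illustration only, not the Kopete or Qt code:
decode_with_fallback and its parameters are hypothetical names, and placing
UTF-8 before Latin-1 is an assumption drawn from the argument that a codec
which accepts every byte string has to go last, because it can never signal
a wrong guess.

```python
from typing import Optional, Tuple


def decode_with_fallback(raw: bytes,
                         declared: Optional[str] = None,
                         locale_codec: Optional[str] = None) -> Tuple[str, str]:
    """Return (text, codec_name) for the first codec that decodes `raw`.

    Hypothetical helper: tries the declared codec, then the user's locale
    codec, then UTF-8, all in strict mode so a wrong guess is detectable.
    """
    candidates = [name for name in (declared, locale_codec, "utf-8") if name]
    for name in candidates:
        try:
            return raw.decode(name, errors="strict"), name
        except (UnicodeDecodeError, LookupError):
            # Wrong codec, or a codec name Python does not know: try the next.
            continue
    # Latin-1 maps every byte value to a character, so this cannot raise.
    # That is exactly why it is the last resort and not an early candidate.
    return raw.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    utf8_bytes = "caf\u00e9".encode("utf-8")
    print(decode_with_fallback(utf8_bytes, locale_codec="ascii"))
    print(decode_with_fallback(b"caf\xe9", locale_codec="ascii"))
    # "Never fails" is not "correct": UTF-8 bytes read as Latin-1 decode
    # without error but come out as the wrong characters.
    print(utf8_bytes.decode("latin-1"))
```

If Latin-1 were tried before UTF-8 in this sketch, the UTF-8 branch would
never be reached, since the Latin-1 step accepts any input.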