List:       kopete-devel
Subject:    Re: [Kopete-devel] [Bug 72917] UTF8 and other cause XML parsing
From:       Thiago Macieira <thiago.macieira () kdemail ! net>
Date:       2004-01-23 18:48:45
Message-ID: 200401231648.46174.thiago.macieira () kdemail ! net


Jason Keirstead wrote:
>	if( !preferredCodec->canDecode( string ) )
>	{
>		QTextCodec *utfCodec = QTextCodec::codecForName( "utf8" );
>		QString resultString;
>
>		for( uint i = 0; i < utf.length(); i++ )
>		{
>			QChar thisChar = utf[i];
>			if( utfCodec->canDecode( thisChar ) )
>				resultString += utfCodec->toUnicode( thisChar );
>			else
>				resultString += QChar('?');
>		}
>
>		return resultString;
>	}

Hmm... I don't like this block of code. It decodes the UTF-8 data one 
character at a time, which the decoder can't cope with: each byte of a 
multi-byte sequence gets passed in separately, so the decoder never sees 
a complete sequence. Actually, I can't find a toUnicode(QChar) function 
anywhere...
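
To illustrate the point, here is a minimal sketch in plain C++ (standard 
library only, not Qt or KDE code) of a whole-buffer validity check. The 
name looksLikeUtf8 is hypothetical, and this is not the 
KStringHandler::isUtf8 implementation; it only shows why the bytes of a 
multi-byte sequence have to be examined together.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8 when
// examined as a whole buffer. Not the KDE implementation.
bool looksLikeUtf8(const std::string &bytes)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const unsigned long minimum[] = {0, 0x80, 0x800, 0x10000};

    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;   // number of continuation bytes expected
        unsigned long cp;    // code point being assembled

        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else return false;   // stray continuation byte or invalid lead byte

        if (bytes.size() - i <= extra)
            return false;    // sequence is cut off at the end of the buffer

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                return false; // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong forms, surrogates and values beyond U+10FFFF.
        if (cp < minimum[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += extra + 1;
    }
    return true;
}
```

With this check, the two bytes 0xC3 0xA9 (the letter e with an acute 
accent) are accepted together, but each of them is rejected when passed 
on its own, which is what a character-by-character loop ends up doing.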

KStringHandler::isUtf8 is preferable, except it's marked @since 3.2. 
Maybe Kopete should copy that function into its own library then.

IMO, the order of the codecs to be tried should be:
- if the preferred codec is given, use it
- if it isn't given, try the user's locale codec. If that fails, try 
UTF-8
- if all of the above fail, decode as Latin1
- finally, clean up U+0000 to U+001F (excepting, maybe, newlines, etc.)
(note: neither the locale codec nor UTF-8 is tried if a preferred codec 
is given)
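
The order above could be sketched roughly as follows, again in plain 
C++ rather than Qt. Everything here is an assumption made for 
illustration: the Decoder function-pointer type stands in for a 
QTextCodec, decodeForXml is a hypothetical name, and "clean up" is 
interpreted as replacing control characters (other than tab, newline 
and carriage return) with U+FFFD, the code point behind 
QChar::replacement.

```cpp
#include <cstddef>
#include <string>

// Stand-in for a codec: returns false if the bytes are invalid in its
// encoding, and only writes to the output on success. (Assumption.)
using Decoder = bool (*)(const std::string &, std::u32string &);

const char32_t kReplacement = 0xFFFD; // same code point as QChar::replacement

// Strict UTF-8 decoder that always works on the whole buffer.
bool decodeUtf8(const std::string &in, std::u32string &out)
{
    static const char32_t minimum[] = {0, 0x80, 0x800, 0x10000};
    std::u32string result;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t extra;
        char32_t cp;
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else return false;
        if (in.size() - i <= extra)
            return false;               // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minimum[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;               // overlong, out of range, surrogate
        result.push_back(cp);
        i += extra + 1;
    }
    out = result;
    return true;
}

// Latin1 cannot fail: every byte maps to the code point of the same value.
std::u32string decodeLatin1(const std::string &in)
{
    std::u32string out;
    for (unsigned char c : in)
        out.push_back(c);
    return out;
}

// Hypothetical entry point following the order proposed in this message.
std::u32string decodeForXml(const std::string &bytes,
                            Decoder preferred = nullptr,
                            Decoder locale = nullptr)
{
    std::u32string text;
    bool ok;
    if (preferred)
        ok = preferred(bytes, text);    // no locale, no UTF-8 in this case
    else
        ok = (locale && locale(bytes, text)) || decodeUtf8(bytes, text);

    if (!ok)
        text = decodeLatin1(bytes);     // last resort, always succeeds

    // Clean up U+0000..U+001F, keeping tab, newline and carriage return.
    for (char32_t &c : text)
        if (c < 0x20 && c != U'\t' && c != U'\n' && c != U'\r')
            c = kReplacement;
    return text;
}
```

Calling decodeForXml("caf\xE9") with no codecs falls through UTF-8 (the 
lone 0xE9 is a truncated sequence) and ends up decoded as Latin1, while 
passing a preferred decoder that fails goes straight to Latin1 without 
trying UTF-8.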

I believe that would generate clean input for the XML parser, as long 
as special care is taken around the UTF-8 decoder to make sure the 
broken decoder isn't triggered. (Oddly enough, I can't find it in my Qt 
source code now.)

Finally, instead of QChar('?'), I'd recommend QChar::replacement.

-- 
  Thiago Macieira  -  Registered Linux user #65028
   thiagom (AT) mail (dot) com
    ICQ UIN: 1967141   PGP/GPG: 0x6EF45358; fingerprint:
    E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358
