
List:       kde-bugs-dist
Subject:    [Bug 72917] UTF8 and other cause XML parsing errors,
From:       Jason Keirstead <jason () keirstead ! org>
Date:       2004-01-23 17:48:50
Message-ID: 20040123174850.14826.qmail () ktown ! kde ! org

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
      
http://bugs.kde.org/show_bug.cgi?id=72917      




------- Additional Comments From jason@keirstead.org  2004-01-23 18:48 -------
Subject: Re: [Kopete-devel]  UTF8 and other cause XML parsing errors, only in IRC conversations

On January 23, 2004 12:39 pm, Martijn Klingens wrote:
> That, too. But I was thinking of a much earlier stage: when you are parsing
> incoming IRC data and when Oscar is parsing incoming ICQ data.

It would be useless here. The incoming IRC data is almost never going to be 
UTF-8; no one uses it (I wish they did, but they don't).

> If isUtf8() fails, it can try ::fromLatin1, because that one AFAIK can be
> reliably autodetected (unlike utf8(), it doesn't accept invalid chars
> AFAIK), followed by your fallback.

The IRC engine just uses the contact's codec, which defaults to UTF-8 if none 
is defined. Basically, what you are saying is: use UTF-8 if no codec is 
defined; if that craps out, try Latin-1; and if that fails too, just send the 
incorrect UTF-8 data to the XML parser, where it will fail and print the warning.
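A rough, portable sketch of that chain for the case where no contact codec is set might look like the following. It uses only the C++ standard library, not Qt or any Kopete API, and the names `looksLikeUtf8`, `latin1ToUtf8` and `decodeIncoming` are hypothetical. One thing it makes visible: every byte value is a valid Latin-1 character, so the Latin-1 step can never fail, and the chain always yields well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `bytes` is structurally valid UTF-8.
// Simplified: checks lead/continuation bytes only, not overlong forms.
bool looksLikeUtf8(const std::string &bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t len = c < 0x80 ? 1
                        : (c & 0xE0) == 0xC0 ? 2
                        : (c & 0xF0) == 0xE0 ? 3
                        : (c & 0xF8) == 0xF0 ? 4 : 0;
        if (len == 0 || i + len > bytes.size())
            return false;
        // Bytes after the lead byte must all be continuation bytes (10xxxxxx)
        for (std::size_t j = 1; j < len; ++j)
            if ((static_cast<unsigned char>(bytes[i + j]) & 0xC0) != 0x80)
                return false;
        i += len;
    }
    return true;
}

// Every byte maps to U+0000..U+00FF in Latin-1, so this conversion never fails.
std::string latin1ToUtf8(const std::string &bytes)
{
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical decode chain: keep valid UTF-8 as is, otherwise
// reinterpret the bytes as Latin-1. The result is always valid UTF-8.
std::string decodeIncoming(const std::string &bytes)
{
    return looksLikeUtf8(bytes) ? bytes : latin1ToUtf8(bytes);
}
```

For example, the Latin-1 byte string `"caf\xE9"` fails the UTF-8 check and comes back as `"caf\xC3\xA9"` ("café" in UTF-8).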

> A simple static in libkopete (QString
> KopeteMessage::detectEncoding( char * ) ?) could handle it, and avoid the
> problem altogether.

Only if you can pass a preferred codec to this static. I think the best option 
would be a combination of your suggestion and my previous function:

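#include <qcstring.h>
#include <qstring.h>
#include <qtextcodec.h>

// Declaration (inside the KopeteMessage class):
//   static QString decodeString( const QCString &string, QTextCodec *preferredCodec = 0L );
QString KopeteMessage::decodeString( const QCString &string, QTextCodec *preferredCodec )
{
	// No contact codec set: default to Latin-1
	if( !preferredCodec )
		preferredCodec = QTextCodec::codecForName( "ISO 8859-1" );

	// QTextCodec has no canDecode(); heuristicContentMatch() returns a negative
	// value when the bytes are detectably not in the codec's encoding
	if( preferredCodec->heuristicContentMatch( string.data(), string.length() ) >= 0 )
		return preferredCodec->toUnicode( string );

	// Fallback: keep each well-formed UTF-8 sequence and replace every stray byte
	// with '?' (decoding byte by byte would turn every multi-byte character into '?').
	// Simplified check: overlong forms and surrogates are not rejected.
	QTextCodec *utfCodec = QTextCodec::codecForName( "UTF-8" );
	const char *data = string.data();
	const uint len = string.length();
	QString resultString;

	uint i = 0;
	while( i < len )
	{
		const uchar c = (uchar)data[i];
		const uint seqLen = c < 0x80 ? 1 : ( c & 0xE0 ) == 0xC0 ? 2
			: ( c & 0xF0 ) == 0xE0 ? 3 : ( c & 0xF8 ) == 0xF0 ? 4 : 0;

		// Every byte after the lead byte must be a continuation byte (10xxxxxx)
		bool valid = seqLen > 0 && i + seqLen <= len;
		for( uint j = 1; valid && j < seqLen; j++ )
			valid = ( (uchar)data[i + j] & 0xC0 ) == 0x80;

		if( valid )
		{
			resultString += utfCodec->toUnicode( data + i, seqLen );
			i += seqLen;
		}
		else
		{
			resultString += QChar( '?' );
			i++;
		}
	}

	return resultString;
}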
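To try the '?'-substitution fallback of decodeString outside Qt, here is a standard-library-only sketch. The name `sanitizeUtf8` is hypothetical, and it works on raw bytes rather than QCString/QString. It keeps every well-formed UTF-8 sequence and replaces each stray byte with '?', so its output is always safe to hand to the XML parser. Unlike reinterpreting the bytes as Latin-1, though, accented characters in a mis-encoded message are lost rather than recovered.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch of the '?' fallback: copy well-formed UTF-8 sequences
// unchanged and replace every byte that does not start one with '?'.
// Simplified: overlong forms and surrogates are not rejected.
std::string sanitizeUtf8(const std::string &bytes)
{
    std::string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t len = c < 0x80 ? 1
                        : (c & 0xE0) == 0xC0 ? 2
                        : (c & 0xF0) == 0xE0 ? 3
                        : (c & 0xF8) == 0xF0 ? 4 : 0;

        // All bytes after the lead byte must be continuation bytes (10xxxxxx)
        bool valid = len > 0 && i + len <= bytes.size();
        for (std::size_t j = 1; valid && j < len; ++j)
            valid = (static_cast<unsigned char>(bytes[i + j]) & 0xC0) == 0x80;

        if (valid) {
            out.append(bytes, i, len);  // keep the whole sequence
            i += len;
        } else {
            out += '?';                 // drop one bad byte, resync on the next
            ++i;
        }
    }
    return out;
}
```

For instance, the Latin-1 bytes `"caf\xE9"` come out as `"caf?"`, while the UTF-8 bytes `"caf\xC3\xA9"` pass through unchanged.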