List:       kopete-devel
Subject:    Re: [Kopete-devel] [Bug 72917] UTF8 and other cause XML parsing
From:       Jason Keirstead <jason () keirstead ! org>
Date:       2004-01-23 20:40:52
Message-ID: 200401231640.52963.jason () keirstead ! org

On January 23, 2004 4:20 pm, Martijn Klingens wrote:
> I want first and foremost to have accurate and autodetected conversion.

This is impossible :P

> The user's setting should be TRIED first, but not FORCED. If it is broken
> utf8 we know it will break the parser, it makes no sense to obey the user
> at all.

So you mean: if the user chose UTF-8, we check isUTF8(), and if the check
fails, we replace the bad bytes with ? characters wherever needed?

I would go for that I guess.

> - make sure that whenever Utf8 is being used isUtf8() is called first and
> if it fails forget about using Utf8

No. See, this is the problem. You are assuming that you should try UTF-8
first, and that if UTF-8 fails you'll be able to guess something else.

This is backwards. UTF-8 is the only codec that gives no failure indication;
it's also the only one we have to scan over *twice* (isUTF8() and then the
conversion), so it's the most expensive. And on top of all this, hardly anyone
uses it. So it's the most error prone, the most expensive, and no one uses it.
It *definitely* should be the last check.

*UNLESS* the user chose it. If the user chose UTF-8, then attempt isUTF8(); if
that fails, then *maybe* try latin1; if that fails, just clean up wherever
possible. There's no point trying local8bit, it's bound to fail.
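
To make the "user chose UTF-8" path concrete, here is a minimal sketch in plain
C++ (no Qt). The names isValidUtf8, latin1ToUtf8 and decodeAsUserUtf8 are
hypothetical stand-ins, not Kopete or Qt functions, and the result is kept as a
UTF-8 byte string rather than a QString:

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for isUTF8(): true if `bytes` is well-formed UTF-8.
bool isValidUtf8(const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;      // continuation bytes expected after the lead
        unsigned long minValue = 0; // smallest code point legal for this length
        unsigned long cp = lead;    // code point accumulated so far
        if (lead < 0x80)                { /* plain ASCII, nothing more to read */ }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; minValue = 0x80;    cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; minValue = 0x800;   cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; minValue = 0x10000; cp = lead & 0x07; }
        else return false;                   // stray continuation or invalid lead byte
        if (extra > n - i - 1) return false; // sequence truncated at end of input
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minValue) return false;                // overlong encoding
        if (cp > 0x10FFFF) return false;                // beyond the Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
        i += extra + 1;
    }
    return true;
}

// Reinterpret every byte as a Latin-1 character and re-encode it as UTF-8.
std::string latin1ToUtf8(const std::string& bytes)
{
    std::string out;
    out.reserve(bytes.size() * 2);
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b < 0x80) {
            out += static_cast<char>(b);
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}

// The "user chose UTF-8" branch: validate first, fall back to Latin-1.
std::string decodeAsUserUtf8(const std::string& bytes)
{
    return isValidUtf8(bytes) ? bytes : latin1ToUtf8(bytes);
}
```

In this sketch the Latin-1 step cannot fail, because Latin-1 assigns a
character to every byte value, so a separate cleanup step would only be reached
with a stricter fallback codec.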

> Not really. Generally contact lists tend to consist of people from mostly
> the same country. 

Eh huh? Not from my experience... I have people from here, from Europe, from 
Asia. Anyways, contact lists don't really have much to do with it, especially 
on IRC. Anyone could message you from anywhere out of the blue.

My new proposed ordering in pseudo code:

if( userCodec == QTextCodec::codecForName("UTF-8") )
{
	// UTF-8 gives no failure on decode, so validate before converting
	if( isUTF8( string ) )
		return userCodec->decode( string )
	else
	{
		try QTextCodec::codecForName("latin1")->decode( string )
		if( success )
		{
			return
		}
		else
		{
			return cleanString( string );
		}
	}
}
else
{
	// any other user codec: just try it, then fall back the same way
	if( userCodec && userCodec->decode( string ) )
		return;
	else
	{
		try QTextCodec::codecForName("latin1")->decode( string )
		if( success )
		{
			return
		}
		else
		{
			return cleanString( string );
		}
	}
}

... where cleanString() strips every byte sequence that cannot be decoded as
UTF-8 from the string somehow.
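
One possible shape for cleanString(), as a rough sketch in plain C++ rather
than Qt. It replaces each undecodable byte with '?' (as suggested earlier in
this message) instead of dropping it; that choice, the helper
utf8SequenceLength and the `replacement` parameter are assumptions made for
illustration:

```cpp
#include <cstddef>
#include <string>

// Length of the well-formed UTF-8 sequence starting at bytes[i],
// or 0 if no valid sequence starts there. (Hypothetical helper.)
std::size_t utf8SequenceLength(const std::string& bytes, std::size_t i)
{
    const unsigned char lead = static_cast<unsigned char>(bytes[i]);
    if (lead < 0x80) return 1; // plain ASCII

    std::size_t len = 0;
    unsigned long cp = 0;       // code point accumulated so far
    unsigned long minValue = 0; // smallest code point legal for this length
    if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; minValue = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; minValue = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; minValue = 0x10000; }
    else return 0;                       // stray continuation or invalid lead byte

    if (len > bytes.size() - i) return 0; // truncated at end of input
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
        if ((c & 0xC0) != 0x80) return 0; // not a continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }
    // Reject overlong forms, out-of-range values and UTF-16 surrogates.
    if (cp < minValue || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    return len;
}

// Copy valid UTF-8 sequences through unchanged and substitute `replacement`
// for every byte that does not start a valid sequence.
std::string cleanString(const std::string& bytes, char replacement = '?')
{
    std::string out;
    out.reserve(bytes.size());
    std::size_t i = 0;
    while (i < bytes.size()) {
        const std::size_t len = utf8SequenceLength(bytes, i);
        if (len == 0) {
            out += replacement;
            ++i;
        } else {
            out.append(bytes, i, len);
            i += len;
        }
    }
    return out;
}
```

Because the output is valid UTF-8 by construction, it should be safe to hand to
an XML parser; stripping instead of replacing would only mean leaving out the
`out += replacement` line.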

-- 
There's no place like 127.0.0.1

http://www.keirstead.org
_______________________________________________
Kopete-devel mailing list
Kopete-devel@kde.org
https://mail.kde.org/mailman/listinfo/kopete-devel