'Re: [Kopete-devel] [Bug 72917] UTF8 and other cause XML parsing'

List:       kopete-devel
Subject:    Re: [Kopete-devel] [Bug 72917] UTF8 and other cause XML parsing
From:       Jason Keirstead <jason () keirstead ! org>
Date:       2004-01-23 20:08:18
Message-ID: 200401231608.18551.jason () keirstead ! org

On January 23, 2004 2:55 pm, Martijn Klingens wrote:
> What I was saying is to try (in this order)
>
> In the plugin (IRC or ICQ):
> - Decode as utf8. If isUtf8() is available, use it and continue if it
> fails. Otherwise we have to assume it's utf8 and continue at the XSLT part
> below.

This will only work in something like 0.01% of the cases in IRC, which is why
I was saying it's a waste of time. I'm not sure about ICQ, but I suspect it is
the same there; not many people use UTF-8. That's why I want to leave that
check until last.
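
For reference, the kind of check being debated can be sketched like this. It
is written in Python purely for illustration, not against the Qt/KDE API, and
the function name is_utf8 is a hypothetical stand-in, not KStringHandler's
actual implementation. It only asks whether the raw bytes form a valid UTF-8
sequence:

```python
def is_utf8(raw: bytes) -> bool:
    """Return True if raw is a valid UTF-8 byte sequence (hypothetical helper)."""
    try:
        # Strict decoding raises on any malformed or truncated sequence.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Note that pure ASCII text also passes such a check, so a positive result by
itself says little about which 8-bit codec the sender was really using.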

> - If both utf8 and latin1 fail, try local8Bit IF AND ONLY IF the local
>   encoding is neither utf8 nor latin1.
>
> - When all these failed, use your code that replaces invalid chars with
>   question marks. Since we're doing it *here* that means the whole dreaded
>   'should never happen' XML error indeed no longer happens at all.

But in all this, where is the user's chosen codec? The user's selected codec
should *always* be tried first.

> Your code has the tremendous advantage that it allows a custom codec
> selection, moving even more code duplication from the plugins. I like that.
> Some things I miss in your code though:
>
> - if preferredCode is UTF-8 we're at square one, because canDecode() will
>   always return true. 

I don't really see this as much of a problem. If the default codec for all 
contacts is Latin1, then the user has to manually change to UTF-8. If they 
manually do this I don't have a problem with it mis-detecting and failing 
with an error / warning; they are the ones who chose that.

>   Therefore UTF-8 should be special cased and use 
>   KStringHandler when available.

Use it for what? As I said, isUtf8() is pretty much useless, since the text
is hardly *ever* UTF-8.

That's why my code tries everything else *first*, then falls back on UTF-8 
with the ? replacement if needed.
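
Roughly, the ordering I mean looks like the sketch below. It is Python for
illustration only, not the actual Kopete code; decode_with_fallbacks,
user_codec and other_codecs are hypothetical names, and Python's strict
decoding stands in for a canDecode()-style test:

```python
def decode_with_fallbacks(raw: bytes, user_codec: str, other_codecs=()) -> str:
    """Try the user's codec, then other candidates, then lossy UTF-8."""
    for name in (user_codec, *other_codecs):
        try:
            # Strict decode: fails if the bytes are invalid for this codec.
            return raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            # LookupError covers an unknown codec name; move to the next one.
            continue
    # Last resort: UTF-8, with every undecodable byte sequence turned into '?'.
    # (Assumption: a genuine U+FFFD in the input is also shown as '?'.)
    return raw.decode("utf-8", errors="replace").replace("\ufffd", "?")
```

The point of the final step is that it can never fail, so the caller always
gets a string that is safe to hand on to the XML/XSLT layer.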

> - You use less fallbacks than I had in mind. See the above heuristics.

Other than the local8Bit() fallback (which is also useless... what does my
local codec have to do with the sender's? There's really no correlation, it
would just be random luck if it worked), the only difference is that I move
the UTF-8 check to the end.

-- 
There's no place like 127.0.0.1

http://www.keirstead.org