List: kopete-devel
Subject: [Kopete-devel] [Bug 72917] UTF8 and other cause XML parsing errors,
From: Martijn Klingens <klingens () kde ! org>
Date: 2004-01-23 18:56:01
Message-ID: 20040123185601.3474.qmail () ktown ! kde ! org
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
http://bugs.kde.org/show_bug.cgi?id=72917
------- Additional Comments From klingens@kde.org 2004-01-23 19:55 -------
Subject: Re: [Kopete-devel] UTF8 and other cause XML parsing errors, only in IRC conversations
On Friday 23 January 2004 18:48, Jason Keirstead wrote:
> The IRC engine just uses the contact codec, which if not defined is UTF.
> Basically what you are saying is use latin1 if it is undefined, and if that
> craps out then try latin1, and if that fails just send the incorrect utf
> data to the XML parser, where it will fail and print the warning.
Nope.
What I was saying is to try (in this order)
In the plugin (IRC or ICQ):
- Decode as utf8. If isUtf8() is available, use it to validate the data and
move on to the next step if the check fails. Otherwise we have to assume it's
utf8 and continue at the XSLT part below.
- Decode as latin1.
- If both utf8 and latin1 fail, try local8Bit IF AND ONLY IF the local
encoding is neither utf8 nor latin1.
- When all of these have failed, use your code that replaces invalid chars with
question marks. Since we're doing it *here* that means the whole dreaded
'should never happen' XML error indeed no longer happens at all.
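
The plugin-side order above can be sketched roughly as follows. This is only an illustration in Python, not Kopete or Qt code: the names decode_for_xml, is_acceptable and canonical are made up for the example. It also assumes that a decode "fails" when the bytes are rejected by the codec or when the result contains characters that XML 1.0 does not allow or C1 control characters (U+0080 to U+009F). That second rule is an assumption of the sketch, needed because latin1 maps every byte to some character and so never fails on its own.

```python
import codecs


def is_acceptable(text: str) -> bool:
    """True if every character is an XML 1.0 Char and not a C1 control.

    Rejecting C1 controls is an assumption of this sketch, not an XML rule.
    """
    for ch in text:
        cp = ord(ch)
        xml_char = (cp in (0x09, 0x0A, 0x0D) or 0x20 <= cp <= 0xD7FF
                    or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)
        if not xml_char or 0x80 <= cp <= 0x9F:
            return False
    return True


def canonical(encoding: str):
    """Return Python's canonical codec name, or None if it is unknown."""
    try:
        return codecs.lookup(encoding).name
    except LookupError:
        return None


def decode_for_xml(raw: bytes, local_encoding: str = "utf-8") -> str:
    """Try utf8, then latin1, then the local 8-bit codec, then give up."""
    candidates = ["utf-8", "iso8859-1"]
    local = canonical(local_encoding)
    # Only try the local encoding if it is neither utf8 nor latin1.
    if local is not None and local not in candidates:
        candidates.append(local)

    for enc in candidates:
        try:
            text = raw.decode(enc, errors="strict")
        except UnicodeDecodeError:
            continue
        if is_acceptable(text):
            return text

    # Last resort: replace everything undecodable or unacceptable with '?'.
    text = raw.decode("utf-8", errors="replace")
    return "".join(ch if ch != "\ufffd" and is_acceptable(ch) else "?"
                   for ch in text)
```

With this in place the XML parser only ever sees text that passed the check or was sanitised, for example decode_for_xml(b"caf\xe9") falls through to latin1, and decode_for_xml(b"\x93hi\x94", "cp1252") only succeeds at the local 8-bit step.
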
In the XML/XSLT code:
- Use the code that we have now
- If the decoding fails, use a more verbose error. With the above changes
this should however become an almost unused code path.
> Only if you can pass a preferred codec to this static. I think this would
> be the best, a combination of your suggestion and my previous function:
>
> (snip)
Your code has the tremendous advantage that it allows a custom codec
selection, removing even more code duplication from the plugins. I like that.
Some things I miss in your code though:
- if preferredCode is UTF-8 we're at square one, because canDecode() will
always return true. Therefore UTF-8 should be special cased and use
KStringHandler when available.
- You use fewer fallbacks than I had in mind. See the above heuristics.
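
To illustrate the first point, here is a small Python sketch of a preferred-codec selection that special-cases UTF-8 with an explicit validity check. It is not the snipped function and not Qt API: is_utf8 and choose_codec are hypothetical names, and is_utf8 merely plays the role that an isUtf8() helper would play. The default fallbacks of utf8 followed by latin1 are also an assumption of the example.

```python
import codecs


def is_utf8(raw: bytes) -> bool:
    """Strict check: True only if raw is well-formed UTF-8."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def choose_codec(raw: bytes, preferred: str, fallbacks=("utf-8", "latin-1")):
    """Return the first codec that decodes raw, trying 'preferred' first.

    Returns Python's canonical codec name, or None if nothing fits.
    """
    seen = []
    for enc in (preferred, *fallbacks):
        try:
            name = codecs.lookup(enc).name
        except LookupError:
            continue  # unknown codec name, skip it
        if name in seen:
            continue  # do not try the same codec twice
        seen.append(name)

        if name == "utf-8":
            # Special case: validate explicitly instead of trusting a
            # decoder that might accept anything.
            ok = is_utf8(raw)
        else:
            try:
                raw.decode(name, errors="strict")
                ok = True
            except UnicodeDecodeError:
                ok = False
        if ok:
            return name
    return None
```

The contrast with a lenient decoder is what matters here: b"caf\xe9".decode("utf-8", errors="replace") always "succeeds" and silently produces a replacement character, whereas choose_codec(b"caf\xe9", "utf-8") rejects UTF-8 and moves on to latin1.
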
(The encoding discussion is finally getting interesting again BTW after months
of frustration :)
_______________________________________________
Kopete-devel mailing list
Kopete-devel@kde.org
https://mail.kde.org/mailman/listinfo/kopete-devel