'[Kopete-devel] [Bug 72917] UTF8 and other cause XML parsing errors,'

List:       kopete-devel
Subject:    [Kopete-devel] [Bug 72917] UTF8 and other cause XML parsing errors,
From:       Martijn Klingens <klingens () kde ! org>
Date:       2004-01-23 18:56:01
Message-ID: 20040123185601.3474.qmail () ktown ! kde ! org

http://bugs.kde.org/show_bug.cgi?id=72917      

------- Additional Comments From klingens@kde.org  2004-01-23 19:55 -------
Subject: Re: [Kopete-devel]  UTF8 and other cause XML parsing errors, only in IRC conversations

On Friday 23 January 2004 18:48, Jason Keirstead wrote:
> The IRC engine just uses the contact codec, which if not defined is UTF.
> Basically what you are saying is use latin1 if it is undefined, and if that
> craps out then try latin1, and if that fails just send the incorrect utf
> data to the XML parser, where it will fail and print the warning.

Nope.

What I was saying is to try (in this order)

In the plugin (IRC or ICQ):
- Decode as utf8. If isUtf8() is available, use it to check the data and
  continue with the next step if the check fails. Otherwise we have to
  assume it's utf8 and skip straight to the XSLT part below.

- Decode as latin1.

- If both utf8 and latin1 fail, try local8Bit IF AND ONLY IF the local
  encoding is neither utf8 nor latin1.

- When all of these fail, use your code that replaces invalid chars with
  question marks. Because we're doing it *here*, the whole dreaded 'should
  never happen' XML error no longer happens at all.

In the XML/XSLT code:
- Use the code that we have now

- If the decoding fails, use a more verbose error. With the above changes
  this should however become an almost unused code path.
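A minimal standard-C++ sketch of the plugin-side chain above, under stated assumptions: `decodeUtf8`, `decodeLatin1`, `decodeIncoming` and `Decoded` are hypothetical names, not Kopete, Qt or KDE API. `decodeUtf8` plays the role of isUtf8(). Latin-1 maps every byte to a character and so never fails on its own; the sketch *assumes* that C1 control bytes (0x80-0x9F) count as a Latin-1 "failure" so the later steps are reachable. The local 8-bit codec is passed in as a callback because the standard library cannot query it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

// Which step of the (hypothetical) chain succeeded, plus the decoded text.
struct Decoded {
    std::string method;  // "utf8", "latin1", "local8bit" or "replaced"
    std::u32string text;
};

using Decoder = std::function<std::optional<std::u32string>(const std::string&)>;

// Strict UTF-8 decoding: any malformed sequence (bad lead or continuation
// byte, truncation, overlong form, surrogate, > U+10FFFF) means failure.
std::optional<std::u32string> decodeUtf8(const std::string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        std::size_t len;
        char32_t cp;
        if (c < 0x80)                    { len = 1; cp = c; }
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return std::nullopt;                      // invalid lead byte
        if (i + len > in.size()) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return std::nullopt;  // overlong, surrogate or out of range
        out.push_back(cp);
        i += len;
    }
    return out;
}

// ASSUMPTION: C1 control bytes (0x80-0x9F) are treated as "not really Latin-1".
std::optional<std::u32string> decodeLatin1(const std::string& in) {
    std::u32string out;
    for (unsigned char c : in) {
        if (c >= 0x80 && c <= 0x9F) return std::nullopt;
        out.push_back(c);  // Latin-1 byte value == Unicode code point
    }
    return out;
}

// UTF-8, then Latin-1, then the local 8-bit codec (only when the locale is
// neither UTF-8 nor Latin-1), then '?' replacement as the last resort.
Decoded decodeIncoming(const std::string& raw, bool localIsUtf8OrLatin1,
                       const Decoder& decodeLocal8Bit) {
    if (auto s = decodeUtf8(raw)) return {"utf8", *s};
    if (auto s = decodeLatin1(raw)) return {"latin1", *s};
    if (!localIsUtf8OrLatin1 && decodeLocal8Bit) {
        if (auto s = decodeLocal8Bit(raw)) return {"local8bit", *s};
    }
    // Keep printable ASCII plus tab/LF/CR and turn every other byte into '?',
    // so this step never hands invalid characters to the XML code.
    std::u32string out;
    for (unsigned char c : raw) {
        const bool keep = (c >= 0x20 && c < 0x80) || c == '\t' || c == '\n' || c == '\r';
        out.push_back(keep ? char32_t(c) : U'?');
    }
    return {"replaced", out};
}
```

With this ordering, bytes that are invalid UTF-8, such as a lone `0xE9`, are picked up by the Latin-1 branch, and only data that no step accepts reaches the replacement step, which mirrors the "almost unused code path" expectation for the XML side.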

> Only if you can pass a preferred codec to this static. I think this would
> be the best, a combination of your suggestion and my previous function:
>
> (snip)

Your code has the tremendous advantage that it allows a custom codec 
selection, removing even more code duplication from the plugins. I like that. 
Some things I miss in your code, though:

- If preferredCodec is UTF-8 we're back at square one, because canDecode()
  will always return true. Therefore UTF-8 should be special-cased and use
  KStringHandler when available.

- You use fewer fallbacks than I had in mind. See the heuristics above.
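To make the UTF-8 special case concrete, here is a hedged sketch. `Codec`, `Validator` and `usePreferredCodec` are hypothetical names, not the signature of the function under discussion or of any Qt/KDE class. The strict validator is passed in so that any isUtf8()-style check can be plugged in; a `false` result means the caller should fall through to the usual fallback chain.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for the codec handed to the shared helper;
// this is not the Qt or KDE codec API.
struct Codec {
    std::string name;
    // Whether the codec claims it can decode the bytes. A lenient UTF-8
    // codec answers true for everything, which is the problem described above.
    std::function<bool(const std::string&)> canDecode;
};

using Validator = std::function<bool(const std::string&)>;

// Decide whether the preferred codec may be used for this message.
// UTF-8 is special-cased: its own canDecode() is ignored and a strict check
// (the role KStringHandler's isUtf8() would play) decides instead.
bool usePreferredCodec(const Codec& preferred, const std::string& raw,
                       const Validator& strictIsUtf8) {
    if (preferred.name == "UTF-8") {
        assert(strictIsUtf8 && "UTF-8 needs a real validity check");
        return strictIsUtf8(raw);
    }
    // Any other codec is trusted to answer for itself.
    return preferred.canDecode && preferred.canDecode(raw);
}
```

Codec names are compared literally here; real code would have to normalize spellings such as `utf8` and `UTF-8` before special-casing them.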

(BTW, the encoding discussion is finally getting interesting again after months 
of frustration :)
_______________________________________________
Kopete-devel mailing list
Kopete-devel@kde.org
https://mail.kde.org/mailman/listinfo/kopete-devel