List: kopete-devel
Subject: [Kopete-devel] [Bug 72917] UTF8 and other cause XML parsing errors,
From: Robin Rosenberg <robin.rosenberg () dewire ! com>
Date: 2004-01-26 15:25:35
Message-ID: 20040126152535.1957.qmail () ktown ! kde ! org
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
http://bugs.kde.org/show_bug.cgi?id=72917
------- Additional Comments From robin.rosenberg@dewire.com 2004-01-26 \
16:25 -------
Subject: Re: [Kopete-devel] UTF8 and other cause XML parsing errors, only \
in IRC conversations
On Friday, 23 January 2004 21.40, Jason Keirstead wrote:
> On January 23, 2004 4:20 pm, Martijn Klingens wrote:
[....]
> > - make sure that whenever Utf8 is being used isUtf8() is called first \
> > and if it fails forget about using Utf8
> No. See, this is the problem. You are assuming that you should try UTF \
> then if UTF fails then you'll be able to guess something.
> This is backwards. UTF is the only codec that gives no failure, also it's \
> the only one we have to scan over *twice* (isUTF8() and then conversion) \
> so it's the most expensive. And on top of all this, hardly anyone uses \
> it. So it's most error prone, most expensive, and no one uses it. It \
> *definitely* should be the last check.
I don't know KDE/Qt that well. How come UTF-8 cannot fail? UTF-8 is designed \
so that a non-UTF-8 string is unlikely to be recognized as UTF-8. If \
the UTF-8 decoder cannot fail, then what does it do when it encounters an \
illegal sequence?
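To illustrate the point about illegal sequences, here is a minimal sketch in Python (not Qt; how the KDE/Qt decoder treats bad input is exactly the open question above, and may differ). The helper name is_valid_utf8 and the sample word are made up for the example. A strict decoder rejects typical Latin-1 text, while a lenient one silently substitutes U+FFFD and so never "fails":

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# The same word encoded two ways (sample data, not taken from the bug report).
latin1_bytes = "smörgåsbord".encode("latin-1")
utf8_bytes = "smörgåsbord".encode("utf-8")

print(is_valid_utf8(utf8_bytes))    # genuine UTF-8 passes
print(is_valid_utf8(latin1_bytes))  # Latin-1 bytes contain illegal sequences
print(is_valid_utf8(b"plain ascii"))  # ASCII is a subset of UTF-8

# A lenient decoder never raises; it replaces bad bytes with U+FFFD instead.
lenient = latin1_bytes.decode("utf-8", errors="replace")
print(repr(lenient))
```

If the real decoder behaves like the lenient variant, "no failure" only means the damage is hidden, which is why a separate validity check would be needed first.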
On the other hand, how could an attempt to decode a string of bytes as \
ISO Latin-1 fail? A human user can say that something isn't Latin-1, but the \
computer cannot unless we add a user-specified blacklist, which is IMHO overkill.
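A small sketch of why Latin-1 cannot fail, again in Python and only as an illustration: every one of the 256 byte values maps to a character, so Latin-1 works as a last-resort fallback after a strict UTF-8 attempt. The function name decode_irc_line is hypothetical, and this is just one possible ordering (the "check UTF-8 first" one suggested earlier in the thread), not the ordering proposed in the quoted pseudo code.

```python
def decode_irc_line(data: bytes, fallback: str = "latin-1") -> str:
    """Try strict UTF-8 first, then fall back to a codec that cannot fail.

    Hypothetical helper for illustration; not Kopete's actual code.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)


# Every possible byte value decodes under Latin-1, so it never raises.
all_bytes = bytes(range(256))
print(len(all_bytes.decode("latin-1")))

# The same character arrives intact whichever encoding the sender used.
print(decode_irc_line("å".encode("utf-8")))
print(decode_irc_line("å".encode("latin-1")))
```

The flip side is the one mentioned above: because Latin-1 accepts anything, the computer cannot tell when the result is wrong; only a human reader can.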
> > Not really. Generally contact lists tend to consist of people from \
> > mostly the same country.
>
> Eh huh? Not from my experience... I have people from here, from Europe, \
> from Asia. Anyways, contact lists don't really have much to do with it, \
> especially on IRC. Anyone could message you from anywhere out of the \
> blue.
I suppose experience can vary here. To me it's either ISO Latin-1 or ASCII \
that comes over the wire. With Latin-1 it's usually the same country or \
countries that use the same character set. Nevertheless, UTF-8 will \
become more and more common in the future.
> My new proposed ordering in pseudo code:
sounds reasonable.
Perhaps Latin-9 (ISO-8859-15) should be attempted instead of Latin-1. The \
difference is that a few characters that were "never" used were replaced by \
some that actually are used.
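For reference, the differences can be listed with a short Python sketch that compares the two codecs byte by byte (using Python's built-in "latin-1" and "iso8859-15" codecs; the variable names are only for the example):

```python
# Decode every byte value under both codecs and keep the positions that differ.
all_bytes = bytes(range(256))
latin1 = all_bytes.decode("latin-1")
latin9 = all_bytes.decode("iso8859-15")

differences = [
    (byte, latin1[byte], latin9[byte])
    for byte in range(256)
    if latin1[byte] != latin9[byte]
]

for byte, old, new in differences:
    print(f"0x{byte:02X}: Latin-1 {old!r} -> Latin-9 {new!r}")
```

Only eight positions change, the most visible being 0xA4, which becomes the euro sign in Latin-9. Everything else decodes identically, so trying Latin-9 in place of Latin-1 would affect very little existing text.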
-- robin
_______________________________________________
Kopete-devel mailing list
Kopete-devel@kde.org
https://mail.kde.org/mailman/listinfo/kopete-devel