'[Kopete-devel] [Bug 72917] UTF8 and other cause XML parsing errors,'


List:       kopete-devel
Subject:    [Kopete-devel] [Bug 72917] UTF8 and other cause XML parsing errors,
From:       Robin Rosenberg <robin.rosenberg () dewire ! com>
Date:       2004-01-26 15:25:35
Message-ID: 20040126152535.1957.qmail () ktown ! kde ! org

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

http://bugs.kde.org/show_bug.cgi?id=72917      

------- Additional Comments From robin.rosenberg@dewire.com  2004-01-26 16:25 -------
Subject: Re: [Kopete-devel]  UTF8 and other cause XML parsing errors, only in IRC conversations

On Friday 23 January 2004 21:40, Jason Keirstead wrote:
> On January 23, 2004 4:20 pm, Martijn Klingens wrote:
[....]
> > - make sure that whenever Utf8 is being used isUtf8() is called first
> > and if it fails forget about using Utf8
> No. See, this is the problem. You are assuming that you should try UTF
> then if UTF fails then you'll be able to guess something.
> This is backwards. UTF is the only codec that gives no failure, also it's
> the only one we have to scan over *twice* (isUTF8() and then conversion)
> so it's the most expensive. And on top of all this, hardly anyone uses
> it. So it's most error prone, most expensive, and no one uses it. It
> *definitely* should be the last check.

I don't know KDE/Qt that well. How come UTF-8 cannot fail? UTF-8 is designed
so that a non-UTF-8 string is unlikely to be recognized as UTF-8. If the
UTF-8 decoder cannot fail, then what does it do when it encounters an
illegal sequence?

On the other hand, how could an attempt to decode a string of bytes as
ISO Latin-1 fail? A human user can say that something isn't Latin-1, but the
computer cannot, unless we add a user-specified blacklist, which is IMHO
overkill.
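
To make the asymmetry concrete, here is a minimal sketch in Python (not
Qt, so it says nothing about what QTextCodec or Kopete's isUtf8() actually
do). The helper names try_utf8, lenient_utf8 and decode_latin1 are made up
for this illustration. It shows a strict UTF-8 decoder rejecting an illegal
sequence, a lenient one substituting U+FFFD, and Latin-1 accepting any bytes:

```python
def try_utf8(raw: bytes):
    """Return the decoded text, or None if raw is not well-formed UTF-8."""
    try:
        # Python's default error handling for decode() is "strict".
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def lenient_utf8(raw: bytes) -> str:
    """Never fails: each illegal sequence becomes U+FFFD instead."""
    return raw.decode("utf-8", errors="replace")


def decode_latin1(raw: bytes) -> str:
    """Never fails: every byte value 0x00-0xFF maps to one code point."""
    return raw.decode("latin-1")


latin1_bytes = "blåbär".encode("latin-1")   # b'bl\xe5b\xe4r'
utf8_bytes = "blåbär".encode("utf-8")       # b'bl\xc3\xa5b\xc3\xa4r'

print(try_utf8(latin1_bytes))        # None: 0xE5 is not followed by continuation bytes
print(lenient_utf8(latin1_bytes))    # replacement characters where the text was damaged
print(decode_latin1(utf8_bytes))     # "succeeds", but the result is mojibake
```

So a decoder that "cannot fail" on UTF-8 has to be of the lenient kind, and
a Latin-1 decode succeeding tells you nothing about whether the input really
was Latin-1.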

> > Not really. Generally contact lists tend to consist of people from
> > mostly the same country.
>
> Eh huh? Not from my experience... I have people from here, from Europe,
> from Asia. Anyways, contact lists don't really have much to do with it,
> especially on IRC. Anyone could message you from anywhere out of the
> blue.

I suppose experience can vary here. To me it's either ISO Latin-1 or ASCII
that comes over the wire. With Latin-1 it's usually the same country, or
countries that use the same character set. Nevertheless, the future will
become more and more UTF-8-ized.

> My new proposed ordering in pseudo code:

Sounds reasonable.

Perhaps Latin-9 (ISO-8859-15) should be attempted instead of Latin-1. The
difference is that a few characters that were "never" used were replaced by
some that actually are used.
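
For reference, the positions where the two charsets disagree can be listed
with a short Python sketch (again only an illustration, using Python's
built-in codecs rather than anything from Qt; the function name
latin1_vs_latin9 is made up here):

```python
def latin1_vs_latin9():
    """Map each byte value that decodes differently to (latin1, latin9)."""
    diffs = {}
    for value in range(256):
        raw = bytes([value])
        in_latin1 = raw.decode("iso-8859-1")
        in_latin9 = raw.decode("iso-8859-15")
        if in_latin1 != in_latin9:
            diffs[value] = (in_latin1, in_latin9)
    return diffs


for value, (old, new) in sorted(latin1_vs_latin9().items()):
    print(f"0x{value:02X}: {old!r} in Latin-1, {new!r} in Latin-9")
```

Only eight byte values differ, among them 0xA4, which is the generic
currency sign in Latin-1 and the euro sign in Latin-9. Everything else
decodes identically, so trying Latin-9 instead of Latin-1 changes the result
only for those eight bytes.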

-- robin
_______________________________________________
Kopete-devel mailing list
Kopete-devel@kde.org
https://mail.kde.org/mailman/listinfo/kopete-devel
