From kopete-devel Mon Jan 26 15:18:02 2004
From: Robin Rosenberg
Date: Mon, 26 Jan 2004 15:18:02 +0000
To: kopete-devel
Subject: Re: [Kopete-devel] [Bug 72917] UTF8 and other cause XML parsing
Message-Id: <200401261618.03003.robin.rosenberg () dewire ! com>
X-MARC-Message: https://marc.info/?l=kopete-devel&m=107513070529714

On Friday, 23 January 2004 at 21:40, Jason Keirstead wrote:
> On January 23, 2004 4:20 pm, Martijn Klingens wrote:
[....]
> > - make sure that whenever Utf8 is being used isUtf8() is called first and
> > if it fails forget about using Utf8
> No. See, this is the problem. You are assuming that you should try UTF then if
> UTF fails then you'll be able to guess something.
> This is backwards. UTF is the only codec that gives no failure, also it's the
> only one we have to scan over *twice (isUTF8() and then conversion ) so its
> the most expensive. And on top of all this, hardly anyone uses it. So it's
> most error prone, most expensive, and no one uses it. It *definitely* should
> be the last check.

I don't know KDE/Qt that well. How come UTF cannot fail? UTF-8 is
designed so that a non-UTF-8 string is unlikely to be recognized as
UTF-8. If the UTF decoder cannot fail, then what does it do when it
encounters an illegal sequence?

On the other hand, how could an attempt to decode a string of bytes as
ISO Latin-1 fail? A human user can say that something isn't Latin-1, but
the computer cannot, unless we add a user-specified blacklist, which is
IMHO overkill.

> > Not really. Generally contact lists tend to consist of people from mostly
> > the same country.
>
> Eh huh? Not from my experience... I have people from here, from Europe, from
> Asia. Anyways, contact lists don't really have much to do with it, especially
> on IRC. Anyone could message you from anywhere out of the blue.

I suppose experience can vary here. To me it's either ISO Latin-1 or
ASCII that comes over the wire.
With ISO Latin-1 it's usually the same country, or countries that use
the same character set. Nevertheless, the future will become more and
more UTF-8-ized.

> My new proposed ordering in pseudo code:

Sounds reasonable. Perhaps Latin-9 (ISO-8859-15) should be attempted
instead of Latin-1. The difference is that a few characters that were
"never" used were replaced by some that actually are used.

-- robin
_______________________________________________
Kopete-devel mailing list
Kopete-devel@kde.org
https://mail.kde.org/mailman/listinfo/kopete-devel
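
P.S. A minimal sketch of the decoder question raised above, in Python
rather than Qt, so none of this is Kopete or Qt API. It assumes a strict
UTF-8 decoder that rejects illegal sequences, and a Latin-9 fallback that
accepts any byte string. The function name decode_guess is hypothetical,
and trying UTF-8 first is only one of the orderings discussed in this
thread.

```python
def decode_guess(raw):
    """Return (text, codec_name) for a byte string of unknown encoding.

    Hypothetical helper: try strict UTF-8 first, then fall back to
    ISO-8859-15 (Latin-9).
    """
    try:
        # A strict UTF-8 decoder raises on any illegal byte sequence,
        # so typical Latin-1/Latin-9 text with accented letters is rejected.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Python's ISO-8859-15 codec maps every byte value 0x00-0xFF,
        # so this fallback cannot raise; it can only guess wrong.
        return raw.decode("iso8859-15"), "iso8859-15"


if __name__ == "__main__":
    word = "r\u00e4ksm\u00f6rg\u00e5s"
    # The same word sent as Latin-1 bytes and as UTF-8 bytes.
    print(decode_guess(word.encode("latin-1")))
    print(decode_guess(word.encode("utf-8")))
    # Byte 0xA4 is the euro sign in Latin-9 but the currency sign in Latin-1.
    print(decode_guess(b"\xa4"))
```

Pure ASCII input decodes as UTF-8 here, which is harmless because ASCII
is identical in all three encodings.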