http://bugs.kde.org/show_bug.cgi?id=72917

------- Additional Comments From klingens@kde.org 2004-01-26 20:56 -------
Subject: Re: [Kopete-devel] UTF8 and other cause XML parsing errors, only in IRC conversations

On Monday 26 January 2004 15:08, Jason Keirstead wrote:
> > I want first and foremost to have accurate and autodetected conversion.
>
> This is impossible :P

True. But with the right order (User Pref, Local Encoding, UTF-8, Latin1)
you can at least make sure the chances of it failing are minimized.

> > The user's setting should be TRIED first, but not FORCED. If it is broken
> > utf8 we know it will break the parser, it makes no sense to obey the user
> > at all.
>
> So you mean, if the user chose UTF8 then we check isUTF8, and if it is
> not, then replace with ? characters wherever needed?

Close. First, I would use QChar::replacement, as Thiago mentioned, instead
of '?'. Second, instead of doing the replacement when isUtf8 fails, I would
use Thiago's order, which means that Latin1 is used after a UTF-8 failure.
Arguably it would be better to try UTF-8 BEFORE the local encoding, because
a UTF-8 failure can be detected, while a local-encoding failure cannot be
detected in all cases (for example when the local encoding is Latin1).

> No. See, this is the problem. You are assuming that you should try UTF then
> if UTF fails then you'll be able to guess something.

Exactly.

> This is backwards. UTF is the only codec that gives no failure,

Yes. HOWEVER, Latin1 is even worse, because it CANNOT FAIL. Whatever you
feed it as Latin1 is BY DEFINITION LEGAL. Thus, you can't do UTF-8 after
Latin1; it _HAS_ to be done before Latin1.

> also it's the only one we have to scan over *twice (isUTF8() and then
> conversion ) so its the most expensive.

Like Thiago said, isUtf8() doesn't copy data and should be fairly
inexpensive.
Also, I would like to see figures for the additional load; I think it is in
fact pretty much negligible for most uses. After all, QString is one of the
most heavily optimized Qt classes. Do you have any KCacheGrind logs proving
me wrong?

> And on top of all this, hardly anyone uses it.

More and more people are starting to use it, especially with ICQ, which also
needs this code. And, again, UTF-8 HAS to be checked before Latin1, because
after trying Latin1 you cannot POSSIBLY get a failure. So whether it
"should" be the last check for performance reasons or not, it CANNOT be the
last check, no matter how much you'd want it to be.

> There's no point trying local8bit, it's bound to fail.

This too is wrong for most non-western locales. In fact, with ICQ in Russia
it would be VERY IMPORTANT to have.

> Eh huh? Not from my experience... I have people from here, from Europe,
> from Asia. Anyways, contact lists don't really have much to do with it,
> especially on IRC. Anyone could message you from anywhere out of the blue.

Try thinking outside the IRC box :)
(With IRC I tend to agree that the people on channels are diverse, although
many people I know are only on Dutch-language IRC channels, and almost all
people I know have exclusively Dutch people on their contact list. We open
source people are quite a different breed from the average user base.)
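
To make the ordering argument concrete, here is a rough sketch of the fallback chain in Python rather than Qt. It is not Kopete code. The function name decode_message and its parameters are made up for illustration, and it assumes the variant argued for above: user preference first, then UTF-8, then the local encoding, with Latin1 as the last resort.

```python
from typing import Optional, Tuple


def decode_message(raw: bytes,
                   local_encoding: str,
                   user_pref: Optional[str] = None) -> Tuple[str, str]:
    """Decode raw bytes, returning (text, name of the codec that succeeded).

    Hypothetical helper: the name and signature are assumptions made for
    this sketch, not an existing Kopete or Qt API.
    """
    # Strict candidates, tried in order. UTF-8 comes before the local
    # encoding because malformed UTF-8 is detectable, while many 8-bit
    # codecs accept any byte sequence.
    candidates = ([user_pref] if user_pref else []) + ["utf-8", local_encoding]
    for codec in candidates:
        try:
            return raw.decode(codec, errors="strict"), codec
        except (UnicodeDecodeError, LookupError):
            continue  # invalid for this codec, or codec name unknown
    # Latin1 maps every byte 0x00-0xFF to a code point, so it cannot fail.
    # That is exactly why it has to be the last resort.
    return raw.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    print(decode_message("héllo".encode("utf-8"), "ascii"))
    print(decode_message("привет".encode("koi8_r"), "koi8_r"))
    print(decode_message(b"caf\xe9", "ascii"))
```

The user's preference is tried but not forced: if the bytes are not valid in that codec, the chain simply moves on. If a lossy decode is preferred over the Latin1 fallback, Python's errors="replace" mode substitutes the Unicode replacement character U+FFFD for each undecodable sequence, which is the same idea as using QChar::replacement instead of '?'.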