'Re: [kopete-devel] Patch to fix encoding problems in both incoming'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       kopete-devel
Subject:    Re: [kopete-devel] Patch to fix encoding problems in both incoming
From:       Matt Rogers <mattr () kde ! org>
Date:       2006-01-04 5:21:21
Message-ID: 200601032321.24154.mattr () kde ! org
[Download RAW message or body]

[Attachment #2 (multipart/signed)]

On Tuesday 03 January 2006 22:21, Oleg Girko wrote:
> Hello!
>
> Sorry for delay of my answer, it was New Year holidays, lots of drinking
> and than lots of sleeping... :-)
>
> On Sunday 01 January 2006 18:35, Matt Rogers wrote:
> > And now, on to the patch! :)
> >
> > First, could you seperate the user details merging from the encoding
> > stuff and after you've made modifications to the patch, send two patches
> > instead of one? (you can send them in the same email message, of course)
>
> Of course, I can (but it will require editing patches manually, because
> "oscarcontact.cpp" file is involved in both patches).
>
> I will send fixed patch splitted into two parts shortly after this message.
>

I committed the UserDetails merging stuff already, since it was quite fine the 
way it is and i was being too picky. I've already sent a modified version of 
the encodings patch because I got a bit impatient and wanted to make sure we 
didn't lose such good work.

> > For the UserDetails merging, I think it would be better to use an enum as
> > a bitmask rather than using a bunch of bools. Something like this is what
> > I have in mind:
> >
> > public:
> > enum { UserClass = 0x01,
> >        MemberSince = 0x02,
> >        OnlineSince = 0x04,
> >        IdleTime = 0x05
> >       }
> >
> > private:
> > int m_updatedElements;
> >
> > and in UserDetails::merge( UserDetails& ud ):
> > {
> > 	if ( m_updatedElements & 0x01 )
> > 		// do stuff
> >
> > ...
> > }
> >
> > Hopefully, you get the idea about what I'm going for here. It saves
> > allocating a bunch of bools at least, and probably makes for cleaner
> > code.
>
> I don't think so. In contrary, it makes code less clean: I think that
>
>     if ( m_updatedElements & 0x01 )
>
> and even
>
>     if ( m_updatedElements & UserClass )
>
> is more difficult to read than
>
>     if ( m_userClassSpecified )
>
> Moreover, using bitmask is more error-prone. Look at your code example, it
>
> already contains an error, which is not so easy to find:
> >        IdleTime = 0x05
>
> In this case I agree with Thiago Macieira that the best solution is to pack
> boolean attributes using bitfields. This improves memory consumption
> without sacrificing code readability.
>

agreed, although i'm not sure i took that route when I committed that part of 
the patch. Have you applied for a KDE SVN account yet? If not, apply for one, 
and then commit when we have the bitfields. :) 

> > I think overloading an operator so that something like "m_details +=
> > details" in the appropriate place to combine the old details with the new
> > details would be cool.
>
> Looks like a beautiful solution on the first glance, but think about the
> following consideration.
>
> People who read code have intuitive perception that "a += b" is
> conceptually equivalent to "a = a + b". First, there is no operator+ for
> UserDetails. Second, even if we implement operator+ for UserDetails, it
> won't behave symmetric way people expect from operator+, so that "a + b"
> and "b + a" will yield different results!
>
> Hence, operator+= for UserDetails behaves intuitively only on the surface,
> but in depth it does not meet several intuitive expectations people have.
>
> May be, I'm overcautious in this case, but I think that operators should be
> overloaded very carefully, after examining all possible pitfalls, to be
> sure it really improves readability and understandability of code, and does
> not give Java proponents another cause to say that operator overloading is
> evil. :-)
>

yeah, well, i realized that my idea about having an operator += wasn't such a 
good idea after reading some bits about operator overloading from the C++ 
FAQ. 

> > As for the rest of it, most of it looks ok. The one glaring problem is
> > the introduction of OscarAccount into the liboscar code. I'd like to keep
> > the liboscar code as free of Kopete and KDE specific parts as possible
> > (yes, i know i use kdebug) so i'm wondering if it's possible to do all of
> > this w/o relying on OscarAccount in the liboscar code and then submit a
> > new patch.
>
> I have one idea, but it seems little bit tricky.
>
> First, let's have an abstract class inside Client class defined like this:
>
> class CodecFactory
> {
> public:
>     virtual void codecForContact( const QString& contactName ) const = 0;
> };
>
> Second, we modify Client class so that CodecFactory to be used by Client
> instance can be specified either by Client::setCodecFactory( const
> CodecFactory* ) method or by an argument to Client's constructor.
>
> Third, we either use multiple inheritance to inherit OscarAccount from
> Client::CodecFactory and redefine codecForContact() inside OscarAccount, or
> define class derived from Client::CodecFactory which redefines
> codecForContact() using backpointer to OscarAccount inside.
>
> What do you think about this idea?
>

this would be a good idea, indeed and would keep the seperation. Much better 
than my using Oscar::Settings to store everything.

> > Other various comments:
> >
> > OscarAccount::defaultCodec should return the native encoding if an
> > account specific encoding has not been set by the user. For example, if
> > I'm using a german system, the default codec should be that for german
> > rather than for latin1.
>
> Unfortunately, most modern Linux distributions use UTF-8 as the default
> encoding independent of whether default language is German or Russian, so
> QTextCodec::codecForLocale() is useless. There is no point to specify UTF-8
> as the encoding for non-Unicode messages. MS Windows has "Language for
> non-Unicode programs" setting, and I presume that this setting is used by
> official ICQ client for non-Unicode messages. Unfortunately, there is no
> such setting for Linux and other Unix-like systems.
>
> Moreover, some languages have several different popular non-Unicode
> encodings. For example, most popular 8-bit Russian encoding in Unix world
> is KOI-8, but using it as a default for ICQ is meaningless, because most
> ICQ users use Windows, and default Windows encoding for non-Unicode
> applications is CP1251 for Russian language.
>
> Hence, I don't see simple straightforward way to find best (most popular)
> ICQ encoding for specified language.
>

True, and hopefully with the per-account and the per-contact settings, we can 
at least work around it enough so that things work once all the settings are 
correct.

> > The change in icqaccount.cpp results in a very very long line. Could you
> > split that up so the lines are shorter? (It will also make it easier to
> > read)
>
> Actually, I was avoiding splitting arguments of function calls because I
> don't see such splitting in the existing code and as a result I don't
> understand rules of splitting. For example, I don't know line length limit
> (I see many lines which exceed 80-character line length), and I don't know
> in which position the next argument of function call should start on the
> next line.
>

haha, yeah, maintainer's perogative, i guess. i can break my own coding style 
whenever i want. ;)

> > I don't understand why you split the encodings out from the rest of the
> > message properties in Oscar::Message other than to distinguish between
> > encodings and the rest of the possible message properties. Please
> > explain.
>
> I don't see the reason to pack several unrelated enums and booleans inside
> one bitmask. It sacrifices readability for saving several bytes of memory.
> Oscar::Message is short-lived object, which exists only for passing
> messages between highlevel and lowlevel parts. Considering single-threaded
> nature of Kopete, I presume that there are only few instances of
> Oscar::Message exist at any given moment of time, so small extra number of
> bytes is not an issue in this case. Anyway, in order to save bytes,
> bitfields can be used.
>
> Also, using non-anonymous enum for encoding increases reliability through
> type-safety. It's impossible to use by mistake an arbitrary integer as an
> encoding: this error will be detected at compile time.
>
> Concerning code readability, it's unclear from reading code which bits of
> bitmask stored in m_properties member are conceptially enums and which are
> independent boolean properties. For example, is it valid to have
> ( Normal | AutoResponse ) or ( EMail | WWP ) mask?
>

good points.

> I suggest to split m_properties member into several booleans and enums
> (packed as bitfields for efficiency), but this change is unrelated to
> encoding issues I fix with my patch, so I don't touch them.
>
> As a sidenote, I presume that frequent use of bitmasks instead of bitfields
> to pack boolean values arises from misunderstanding the reasons bitmasks
> are widely used inside Unix system calls API. Unix system calls API was
> developed when C compiler was very primitive and bitfields was not
> supported. In modern times C (and C++) compilers which do not support
> bitfields became completely extinct, but inertia of thought remained, and
> use (and abuse) of bitmasks became popular development pattern.
>
> > in chatservicetask.cpp, language and encoding are only there for debug
> > output and can remain QStrings.
>
> They are processed the same way as message, so I've just made it uniform
> way.
>
> Anyway, if these strings are assumed to have latin1 encoding, they should
> be converted to QString using QString::fromLatin1() to specify explicitly
> that latin1 encoding is used. All conversions to QString which implicitly
> assume latin1 encoding must be eliminated.
>
> > The conversion from the using of Buffer::getWordBlock for the charset
> > 0x02 handling in messagereceivertask.cpp is wrong as is the appropriate
> > code to do the conversion in oscarmessage.cpp. I suggest a setText method
> > in Oscar::Message that takes a word pointer and a length so that the
> > conversion can be done with QString::fromUcs2.
>
> First, the code used before my patch
>
>   msg.setText( QString::fromUcs2( message.getWordBlock( messageLength ) )
> );
>
> is incorrect because it leads to memory leak: getWordBlock() method
> allocates array of short integers which is never deallocated after that.
>

oops. :(

> Second, equivalent code with memory leak fixed, like
>
>   WORD *ucs2 = message.getWordBlock( messageLength );
>   msg.setText( QString::fromUcs2( ucs2 ) );
>   delete[] ucs2;
>
> is incorrect as well, because it has memory leak in case comething in line
>
>   msg.setText( QString::fromUcs2( ucs2 ) );
>
> throws an exception.
>

normally, i would agree with the above (about setText throwing an exception), 
but KDE doesn't use exceptions, and if we have problems allocating memory, 
well, then we're pretty much hopeless as it is. :)

> Third, if Oscar::Message stores message in raw format alongside with
> information about encoding, it should do it always. Using Oscar::Message as
> dual-purpose class to store either raw or decoded messages is evil.
> Lowlevel part should not care about conversions between encodings at all.
>
> And if using Oscar::Message as a raw message storage (like it should be),
> I see no point to convert raw data from QByteArray to array of short
> integers using Buffer::getWordBlock(), than convert it to QString using
> QString::fromUcs2, than convert it again to QByteArray inside
> Oscar::Message using Oscar::Message::setText(). It's equivalent to just
> copy part of buffer into Oscar::Message like I do, but with more overhead.
>
> Thanks for rewiewing my patch,
> -- Oleg Girko, http://www.infoserver.ru/~ol/

yeah, after writing my email, I realized all my points are pretty much moot, 
other than the seperation of the lowlevel and the highlevel code, hence the 
reason I committed some things like the UserDetails merging, and sent you a 
revised patch for the encoding stuff. :)

Thanks again for taking on this beast of a problem.
-- 
Matt

[Attachment #5 (application/pgp-signature)]

_______________________________________________
kopete-devel mailing list
kopete-devel@kde.org
https://mail.kde.org/mailman/listinfo/kopete-devel

[prev in list] [next in list] [prev in thread] [next in thread]