'HOWTO: Deal properly with internationalisation in Qt 2 and KDE 2 code'


List:       kde-core-devel
Subject:    HOWTO: Deal properly with internationalisation in Qt 2 and KDE 2 code
From:       David Faure <david () mandrakesoft ! com>
Date:       2000-11-27 22:35:37

"Sergey A. Sukiyazov" <ssukiyazov@freemail.ru> wrote this nice summary
of common problems with the way US or European developers write
Qt/KDE code, leading to many problems for users of other encodings.
Those who already know most of this stuff should still have a look at
the bit about QTextStream (section 3): I was quite surprised to learn that
its default conversion depends on the underlying device...

Glossary
=========
For those a bit confused by the terms used below, I'll try to define the 
most common ones, as an introduction. If you know those terms, skip
to the second part of the document.

* encoding: the way character codes (usually 0-255) are understood.
Due to ancient limitations, the same codes between 0 and 255 mean
different things from one country to another. Which character each of
those codes maps to is determined by the "encoding".
The difference from "charset" is that "charset" is mostly used for fonts,
but the concept is more or less the same.

* ascii: encoding covering only the range 0-127, with 'American'
unaccented letters, digits, punctuation marks, etc.

* latin1: the encoding used by the Western European countries,
which adds some accented letters etc. to ascii. It happens to be the
default encoding in many places of Qt and KDE, which is why
non-latin1 users have more trouble than latin1 users :(
Latin1 is also known as ISO-8859-1.

* "local8bit": the correct encoding for a given country (Qt chooses
what local8bit does depending on $LANG and $LC_CTYPE).
This encoding defines other characters (esp. in the range 128-255),
that are needed by a given country/language.
Sergey also calls them the "National characters".

* utf8/utf16: Encodings for "Unicode", i.e. finally, an encoding that doesn't
depend on the country. In Unicode, each code means a single character,
which is why unicode is the real long term solution to this mess... if only 
there were Unicode fonts... :)
QString stores everything in Unicode, so for any text displayed in the GUI
there's no need to take any special action. The tricky part is knowing how to
convert this to the right encoding when interfacing with the file system, when
storing data somewhere, etc.
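
To make the "same code, different character" point concrete, here is a small
sketch in plain standard C++ (not Qt code). The function names are made up for
this illustration, and the ISO-8859-5 decoder deliberately covers only the
contiguous range of Cyrillic letters; treat it as an illustration rather than
a complete decoder.

```cpp
#include <cstdint>

// Latin-1 (ISO-8859-1): each byte maps to the Unicode code point of the
// same numeric value.
inline char32_t decodeLatin1(std::uint8_t byte) {
    return byte;
}

// ISO-8859-5 (Cyrillic), hypothetical partial decoder: bytes below 0x80 are
// ASCII, and the letter range 0xB0..0xEF maps to U+0410..U+044F.
// Every other byte is outside this sketch and yields U+FFFD.
inline char32_t decodeIso8859_5Letters(std::uint8_t byte) {
    if (byte < 0x80) return byte;
    if (byte >= 0xB0 && byte <= 0xEF) return 0x0360 + byte;
    return 0xFFFD;
}
```

The byte 0xE4 decodes to U+00E4 (a with diaeresis) under Latin-1 but to
U+0444 (Cyrillic ef) under ISO-8859-5, while a byte in the ascii range such
as 'A' decodes to the same character under both.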

Anyway, up to you Sergey (I took the liberty to rephrase some sentences) :

HOW TO DEAL PROPERLY WITH INTERNATIONALISATION
==============================================

    1. A properly localized system must permit the use of national characters
       everywhere, including in filenames.

    2. For national UNICODE strings, the methods QString::latin1() and
       QString::ascii(), and the implicit cast QString => (const char *),
       return an empty or truncated string, because the translation only runs
       until the first UNICODE character with a non-zero high byte occurs in
       the input string.

       Nevertheless, most programs translate QString to char* through the
       .latin1() method. Consider using (const char *)str.local8Bit()
       (where str is a QString) or (const char *)QFile::encodeName(fileName)
       instead, so that national characters can be used everywhere (for
       example, in system calls).
       
       In general, the most frequently encountered problem is the translation
       from QString to char* through the .latin1() or .ascii() method
       (latin1() is used by default). These methods translate a UNICODE
       string only until they encounter the first character with a non-zero
       high byte, so the translation terminates at the first national
       character. Russian UNICODE characters are therefore not translated,
       and these methods return empty strings.

       Furthermore, programs using Qt-2.X.X should be compiled with the
       option -DQT_NO_ASCII_CAST, which prevents the automatic use of the
       .latin1() method to translate a QString into (const char*). With this
       option, compilation aborts with an error on code such as:
       
----C++ Sample Code : Cut from here ------------------------
QString fileName = QString::fromLocal8Bit("Файл");
FILE *f;
....

// Implicit QString => (const char *) cast: rejected under -DQT_NO_ASCII_CAST.
f = ::fopen( fileName, "r+" );
.....
----C++ Sample Code : Cut to here --------------------------

       If the option -DQT_NO_ASCII_CAST is not given, the automatic
       translation is done through QString::latin1(), which results in an
       empty string. To avoid the compilation error when the option
       -DQT_NO_ASCII_CAST is given, use the following instead:
 
----C++ Sample Code : Cut from here ------------------------
QString fileName = QString::fromLocal8Bit("Файл");
FILE *f;
....

// encodeName() converts the file name explicitly, so no implicit cast occurs.
f = ::fopen( (const char *)QFile::encodeName(fileName), "r+" );
.....
----C++ Sample Code : Cut to here --------------------------

       [Note: KDE's compilation system defines -DQT_NO_ASCII_CAST by default]
       The above example applies to all translations from UNICODE to 8-bit
       strings. In general, you should use the methods QString::local8Bit() and 
       QString::fromLocal8Bit(...) instead of QString::latin1() and 
       QString::QString(const char *) and QString::fromLatin1(...).
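
       The truncation described in this section can be imitated without Qt.
       The sketch below is plain standard C++, not Qt's implementation:
       narrowLikeLatin1() and cStringLength() are hypothetical helpers that
       assume, as the text above describes, that a character with a non-zero
       high byte ends up as a terminating zero in the 8-bit string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the narrowing described above (not Qt code):
// a UTF-16 code unit whose high byte is non-zero becomes '\0', every other
// unit keeps its low byte.
inline std::string narrowLikeLatin1(const std::u16string &text) {
    std::string out;
    for (char16_t unit : text)
        out.push_back(unit > 0xFF ? '\0' : static_cast<char>(unit));
    return out;
}

// Length as seen by a C API such as fopen(), which stops at the first '\0'.
inline std::size_t cStringLength(const std::string &bytes) {
    return std::char_traits<char>::length(bytes.c_str());
}
```

       Under that assumption, a purely Russian file name has a C-string length
       of 0, a plain ascii name keeps its full length, and a mixed name is cut
       off at the first national character.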

    3. A second error that leads to incorrect conversion of national
       characters from UNICODE into a one-byte string arises when using the
       class QTextStream bound to a file or to an 8-bit string (QByteArray
       or QCString).

       A text stream (class QTextStream) bound to a file (class QIODevice)
       uses the local8Bit conversion by default
       (Encoding == QTextStream::Locale), but a text stream bound to a
       one-byte string (class QByteArray or class QCString) uses the Latin1
       conversion (Encoding == QTextStream::Latin1)!
       If Encoding == QTextStream::Latin1, no conversion is made, so use this
       to write char *s or QCStrings to a stream without conversion.

       You can verify this by looking at the Qt source code
       (file qt-2.2.2/src/tools/qtextstream.cpp, line 556).
       If we insert a UNICODE string (class QString) containing national
       characters into a stream bound to a one-byte string, the method
       QTextStream::operator<<(const QString& s) is executed, and if
       Encoding == QTextStream::Latin1, it converts 'Russian characters' to '?'!

       For characters in the ISO-8859-1 or us-ascii encodings, the conversion
       has no effect: the codes of these characters do not change. In UNICODE
       strings made of ISO-8859-1 or us-ascii characters, the high byte is 0
       and the low byte is equal to the character code, which is the same as
       in the one-byte encoding. Strings in the UTF8 encoding that contain
       only us-ascii characters also remain unchanged.

       For national characters in UNICODE, the high byte is different from 0
       and the low byte does not coincide with the one-byte character code.
       Such strings are therefore changed by the conversion
       QString ==> (const char *).

       If programs are tested using only strings in the encodings ISO-8859-1 
       or us-ascii, the errors of conversion QString <==> (const char *) will 
       not be visible, because no conversion happens.

       When using the class QTextStream, always set the character conversion
       mode explicitly by calling the method QTextStream::setEncoding(...).
       This will prevent incorrect conversion of national characters when
       using QTextStream in the future.
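
       To see why the choice of encoding matters, here is a Qt-free sketch in
       plain standard C++. The Encoding enum and encode() function are
       invented for this illustration and are not the QTextStream API; the
       Latin1 branch only mimics the '?' substitution described above, and
       the Utf8 branch handles the Basic Multilingual Plane only (no
       surrogate pairs).

```cpp
#include <string>

// Hypothetical encodings for this sketch; not QTextStream's enum.
enum class Encoding { Latin1, Utf8 };

// Encode UTF-16 code units into bytes using the explicitly chosen encoding.
inline std::string encode(const std::u16string &text, Encoding enc) {
    std::string out;
    for (char16_t unit : text) {
        if (enc == Encoding::Latin1) {
            // Anything outside 0..255 cannot be represented: substitute '?'.
            out.push_back(unit > 0xFF ? '?' : static_cast<char>(unit));
        } else if (unit < 0x80) {
            // us-ascii: one byte, unchanged.
            out.push_back(static_cast<char>(unit));
        } else if (unit < 0x800) {
            // Two-byte UTF-8 sequence.
            out.push_back(static_cast<char>(0xC0 | (unit >> 6)));
            out.push_back(static_cast<char>(0x80 | (unit & 0x3F)));
        } else {
            // Three-byte UTF-8 sequence (surrogates are not handled here).
            out.push_back(static_cast<char>(0xE0 | (unit >> 12)));
            out.push_back(static_cast<char>(0x80 | ((unit >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (unit & 0x3F)));
        }
    }
    return out;
}
```

       With the Latin1 choice a Russian word comes out as a run of '?', while
       the same word survives in Utf8; a pure us-ascii string is identical in
       both, which is exactly why tests with us-ascii data hide the problem.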

       Thank you for spreading this information among all programmers
       writing programs for KDE or simply Qt, so that in the future we can
       minimize the occurrence of these problems.

Best regards
Sukiyazov Sergey <corwin@dstu.rnd.runnet.ru>
                 <ssukiyazov@freemail.ru>


-------------------------------------------------------

-- 
David FAURE, david@mandrakesoft.com, faure@kde.org
http://www.mandrakesoft.com/~david/, http://www.konqueror.org/
KDE, Making The Future of Computing Available Today
See http://www.kde.org/kde1-and-kde2.html for how to set up KDE 2
