
List:       kde-core-devel
Subject:    Re: Problem with encodings in several places in KDE
From:       Thiago Macieira <thiago.macieira () kdemail ! net>
Date:       2003-11-17 18:39:03


Waldo Bastian wrote:
>Keep in mind that N23835 fixes (part of) the problem now, your proposal will
>not be available before Qt 4.

I am aware of that. But my proposal could be included in Qt 3.3 (if there is 
such a release) without too much damage -- marshalling formats notwithstanding. 
See below.

>> Next, (and here's what I am proposing to TT) is that both QString and
>> QCString hold a QTextCodec* pointer to the codec that can be used to
>> convert the string back to its original form. QFile::encodeName and decode
>> would be a special QTextCodec in this regard and they have to work for
>> every encoding, not just UTF-8. One solution would be to break the
>> filename into its components and encode each one separately; if any fail,
>> the same "broken UTF-8" decoding of the current solution can be applied.
>
>??? You want to register the encoding for each of the segments and keep them
>around, even under transformation?

No, that's not what I want. I want a pathname to be broken into components 
(separated by QDir::separator()) and each component inspected separately for 
invalid sequences.

The idea is that one could have a file with undecodable filename but whose 
leading directory names are legal. The current solution only solves the UTF-8 
case, not any other cases that might arise (as you yourself pointed out in 
July).

The idea is to decode the legal components properly, but use the "broken 
UTF-8" method for those that can't be decoded. Since nothing else should be 
generating those surrogate pairs, it's safe to assume that when they are 
present in a path component, they indicate undecoded 8-bit sequences.
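
As a rough illustration of the per-component inspection described above, here 
is a minimal sketch in standard C++ (deliberately not Qt code). It assumes the 
expected encoding is UTF-8 and the separator is '/'; the names isValidUtf8, 
Component and classifyPath are hypothetical. It only classifies components; 
the actual decoding, or the "broken UTF-8" fallback, would then be applied per 
component according to the flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns true if `s` is well-formed UTF-8. Rejects truncated sequences,
// overlong forms, encoded surrogates and values above U+10FFFF.
bool isValidUtf8(const std::string& s) {
    static const std::uint32_t minForLen[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        std::uint32_t cp;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                       // stray continuation or invalid lead byte
        if (i + len > n) return false;           // sequence cut short
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < minForLen[len] || cp > 0x10FFFF) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        i += len;
    }
    return true;
}

// One path component and whether it can be decoded in the expected encoding.
struct Component {
    std::string bytes;
    bool decodable;
};

// Splits `path` on `sep` and inspects every component on its own, so that a
// single undecodable name does not affect its (legal) leading directories.
std::vector<Component> classifyPath(const std::string& path, char sep = '/') {
    std::vector<Component> out;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = path.find(sep, start);
        const std::size_t count =
            (end == std::string::npos) ? std::string::npos : end - start;
        const std::string part = path.substr(start, count);
        out.push_back({part, isValidUtf8(part)});
        if (end == std::string::npos) break;
        start = end + 1;
    }
    return out;
}
```

For a path such as "/home/caf\xE9/file" (a Latin-1 byte in the third 
component), only that one component is flagged as undecodable; the leading 
empty component, "home" and "file" are all reported as legal.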

For other strings that are not filenames, the old rules would apply. I.e., 
QString::fromUtf8(s).utf8() == s is not guaranteed.

(A leading marker per component might be advisable, performance-wise.)

>Keeping codecs around in the QString would indeed be nice, yes. Changes in
>marshall format would break KDE4 - KDE3 wire compatibility though. Then
>again, if we just drop that requirement, the migration to D-BUS will become
> a lot easier.

Not necessarily, since the wire format is versioned. Qt can read all the 
previous versions' marshalling formats and write in them -- at least QStrings 
can. It's just a matter of handshake.

KDE classes might not be checking the stream version and thus might not be 
prepared to handle multiple versions of themselves.
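
To make the handshake idea concrete, here is a minimal sketch in standard C++. 
It is not Qt's actual stream format: the version constants, the function names 
and the text-based layout are all hypothetical, chosen only to show that both 
peers settle on the highest version they share and that the writer then emits 
that version's layout.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical format versions for a serialized string.
constexpr int kVersionPlain = 1;      // text only
constexpr int kVersionWithCodec = 2;  // text plus the name of its codec

// Both sides announce the newest version they understand; the older one wins.
int negotiateVersion(int localMax, int remoteMax) {
    assert(localMax >= kVersionPlain && remoteMax >= kVersionPlain);
    return std::min(localMax, remoteMax);
}

// Writes `text` in the layout of the negotiated version. A newer writer
// simply leaves out the codec field when talking to an older reader.
std::string writeString(const std::string& text, const std::string& codec,
                        int version) {
    if (version >= kVersionWithCodec)
        return "v2:" + codec + ":" + text;
    return "v1:" + text;
}
```

A class that never consults the negotiated version would always write its 
newest layout, which is the failure mode mentioned above.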

By the way, a recommendation for future wire formats: prefix each block with 
a size field, so that new fields can be appended to the end of the block. 
Older implementations will ignore the extra fields if present; newer 
implementations will assume default values if they are missing.
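
The size-prefix recommendation can be sketched as follows, again in standard 
C++ and under stated assumptions: a 32-bit big-endian size, two 32-bit fields, 
and hypothetical names (Record, writeBlock, readBlock, codecId). The asserts 
stand in for real error handling, which untrusted input would need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Appends a 32-bit value in big-endian byte order.
void putU32(Bytes& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(v >> shift));
}

// Reads a 32-bit big-endian value starting at `pos`.
std::uint32_t getU32(const Bytes& in, std::size_t pos) {
    assert(pos + 4 <= in.size());
    return (std::uint32_t(in[pos]) << 24) | (std::uint32_t(in[pos + 1]) << 16) |
           (std::uint32_t(in[pos + 2]) << 8) | std::uint32_t(in[pos + 3]);
}

struct Record {
    std::uint32_t id = 0;       // present since the first format revision
    std::uint32_t codecId = 0;  // appended later; 0 is the assumed default
};

// Layout: [size of body][id][codecId]. An old writer emits one field,
// a new writer emits two; the size prefix tells the reader which.
void writeBlock(Bytes& out, const Record& r, int fieldCount) {
    assert(fieldCount == 1 || fieldCount == 2);
    putU32(out, static_cast<std::uint32_t>(4 * fieldCount));
    putU32(out, r.id);
    if (fieldCount >= 2) putU32(out, r.codecId);
}

// A reader that understands `knownFields` fields. Fields it does not know
// are skipped via the size prefix; fields that are absent keep the defaults
// already stored in `r`. Returns the offset of the next block.
std::size_t readBlock(const Bytes& in, std::size_t pos, int knownFields,
                      Record& r) {
    const std::uint32_t size = getU32(in, pos);
    const std::size_t body = pos + 4;
    assert(body + size <= in.size());
    if (knownFields >= 1 && size >= 4) r.id = getU32(in, body);
    if (knownFields >= 2 && size >= 8) r.codecId = getU32(in, body + 4);
    return body + size;  // jump past anything this reader did not consume
}
```

An old reader (knownFields = 1) given a two-field block still lands on the 
next block boundary, and a new reader given a one-field block leaves codecId 
at its default.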

-- 
  Thiago Macieira  -  Registered Linux user #65028
   thiagom@mail.com           
    ICQ UIN: 1967141   PGP/GPG: 0x6EF45358; fingerprint:
    E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358
