
List:       kde-devel
Subject:    Re: Bug#4050: No NLS support in konqueror
From:       Dawit Alemayehu <adawit () kde ! org>
Date:       2000-05-29 4:16:54

On Mon, 29 May 2000, Denis Perchine wrote:
> > > > ?? You mean that latin1 is a subset of utf-8? How can that be? Does
> > > > latin1 leave values undefined which are used by utf-8? Is this a
> > 
> > Actually, only the lower 128  bytes (ASCII range) appear as themselves 
> > (one byte, with the same value) in UTF-8.  Unicode values from 
> > U+0080->U+07FF are encoded in UTF-8 as two bytes, and the remaining 
> > Unicode values (disregarding surrogates) appear as three bytes.  In 
> > two-, three-, or four-byte sequences, ALL bytes of the sequence will 
> > have the high-order bit set.  Thus, latin-1 characters in the range 
> > 0x80 through 0xFF will appear as two-byte sequences, with the high 
> > order bit on in each byte.
> 
> Yep. That's right. I should have said ASCII-7 rather than latin-1 (my small
> mistake). And that's why everything works fine when you have only English
> letters inside.
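
For reference, the byte layout described in the quoted text can be checked
with a few lines of Python.  This is only an illustrative sketch; the helper
names (utf8_length, all_high_bits_set) are made up for this example and are
not part of any KDE or Qt API.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes the single character `ch` occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def all_high_bits_set(data: bytes) -> bool:
    """True if every byte in `data` has its high-order bit (0x80) set."""
    return all(b & 0x80 for b in data)


# One character from each range mentioned in the quoted text:
# ASCII, U+0080..U+07FF (covers latin-1 0x80..0xFF), and above U+07FF.
samples = ["A", "\u00e9", "\u20ac"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(
        f"U+{ord(ch):04X}: {len(encoded)} byte(s) in UTF-8, "
        f"hex {encoded.hex()}, "
        f"multi-byte with all high bits set: "
        f"{len(encoded) > 1 and all_high_bits_set(encoded)}"
    )

# The same latin-1 character is one byte in latin-1 but two bytes in UTF-8.
print("\u00e9".encode("latin-1").hex(), "vs", "\u00e9".encode("utf-8").hex())
```
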

Well, I should have said at the beginning that I am no expert on character
encodings and formats, so I am not qualified to discuss the technical merits of
different encodings in any detail.  In fact, I have only just started reading a
book on Unicode :((

Having said that, what I was trying to explain earlier was why I think the
specifications push for conversion to UTF-8, and I think I have made that clear.

However, since real-world implementations differ so much from the
specifications, I am open to all suggestions on how this should be handled.
Waldo probably has more knowledge than I do about the direction to take here.

My question now to Denis and the other good folks participating in this thread
is: what can we do to alleviate this issue for the non-latin1 folks?  Internally
we store all the components of any given URL in a QString, which according to
the documentation is Unicode.  My concern is that if we convert to the locale's
8-bit encoding before escaping, how will this work when the URL is sent to
another machine with a different locale?  I am asking because I am no longer
sure how this works.  Are the encoding problems you are seeing restricted to
local file system access, or do they affect everything?  Knowing that might
help us focus on where the fix needs to be applied.
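
To make the concern concrete, here is a small Python sketch (not KDE code) of
what happens when a name is converted to an 8-bit encoding before escaping.
It uses the standard urllib.parse.quote and unquote functions; the helper
names and the choice of KOI8-R and CP1251 as the two "locales" are assumptions
made purely for illustration.

```python
from urllib.parse import quote, unquote


def escape_with(text: str, encoding: str) -> str:
    """Percent-escape `text` after converting it to the given byte encoding."""
    return quote(text, encoding=encoding)


def unescape_with(escaped: str, encoding: str) -> str:
    """Undo percent-escaping, interpreting the bytes in the given encoding."""
    return unquote(escaped, encoding=encoding, errors="replace")


name = "\u0444\u0430\u0439\u043b"  # a Cyrillic file name

# Sender escapes using its own locale encoding (assumed KOI8-R here).
sent = escape_with(name, "koi8-r")

# A receiver with a different locale (assumed CP1251) decodes the same bytes
# and ends up with a different string.
print("escaped on sender:      ", sent)
print("receiver, same locale:  ", unescape_with(sent, "koi8-r") == name)
print("receiver, other locale: ", unescape_with(sent, "cp1251") == name)

# If both sides agree on UTF-8, the escaped form does not depend on locale.
sent_utf8 = escape_with(name, "utf-8")
print("escaped as UTF-8:       ", sent_utf8)
print("UTF-8 round trip:       ", unescape_with(sent_utf8, "utf-8") == name)
```

The escaped URL itself carries no indication of which encoding produced the
bytes, so the receiver can only guess unless both ends agree on one encoding.
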

Regards,
Dawit A.
