
List:       kde-core-devel
Subject:    Re: RFC - Separate filesystem encoding from document encoding
From:       Michael Ritzert <kde () ritzert ! de>
Date:       2002-02-12 12:12:48

Hi Ryan,

that's actually trivial: '/' is an ASCII character, and as such its byte 
(0x2F) never appears in UTF-8 except with its original meaning. The same 
holds for all ASCII characters.
The reason this works is that every octet of a UTF-8 sequence that encodes a 
non-ASCII character has its high bit set, whereas ASCII bytes have it clear. 
This follows directly from the definition in RFC 2279:

   UCS-4 range (hex.)           UTF-8 octet sequence (binary)
   0000 0000-0000 007F   0xxxxxxx
   0000 0080-0000 07FF   110xxxxx 10xxxxxx
   0000 0800-0000 FFFF   1110xxxx 10xxxxxx 10xxxxxx

   0001 0000-001F FFFF   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
   0020 0000-03FF FFFF   111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
   0400 0000-7FFF FFFF   1111110x 10xxxxxx ... 10xxxxxx
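
To check the table by hand, here is a minimal Python sketch that encodes a
code point row by row and confirms that no octet of a multi-octet sequence
falls in the ASCII range. The names encode_utf8 and split_path are
hypothetical helpers made up for this illustration; they are not KDE or Qt
API, and the sample path is invented.

```python
# Sketch only: encode_utf8 and split_path are hypothetical helper names
# used for illustration, not part of KDE, Qt, or any library.

# (largest code point, lead-octet marker, number of continuation octets),
# one entry per multi-octet row of the RFC 2279 table.
_ROWS = (
    (0x7FF, 0xC0, 1),
    (0xFFFF, 0xE0, 2),
    (0x1FFFFF, 0xF0, 3),
    (0x3FFFFFF, 0xF8, 4),
    (0x7FFFFFFF, 0xFC, 5),
)


def encode_utf8(cp: int) -> bytes:
    """Encode one code point following the RFC 2279 table (up to 6 octets)."""
    if cp < 0:
        raise ValueError("code point must be non-negative")
    if cp <= 0x7F:
        # First row: a single octet 0xxxxxxx, identical to ASCII.
        return bytes([cp])
    for limit, lead, n in _ROWS:
        if cp <= limit:
            # Lead octet carries the topmost payload bits.
            octets = [lead | (cp >> (6 * n))]
            # Each continuation octet is 10xxxxxx with six payload bits.
            for i in range(n - 1, -1, -1):
                octets.append(0x80 | ((cp >> (6 * i)) & 0x3F))
            return bytes(octets)
    raise ValueError("code point outside the range covered by the table")


def split_path(path: str) -> list[str]:
    """Split on the raw byte 0x2F first, then decode each component."""
    return [part.decode("utf-8") for part in path.encode("utf-8").split(b"/")]


# First and last code point of every multi-octet row of the table.
SAMPLES = [0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x1FFFFF,
           0x200000, 0x3FFFFFF, 0x4000000, 0x7FFFFFFF]

for cp in SAMPLES:
    # Every octet has its high bit set, so none can equal b"/" (0x2F).
    assert all(octet >= 0x80 for octet in encode_utf8(cp))

# Splitting at the byte level never cuts a multi-octet character in half.
print(split_path("/home/müller/日本語.txt"))
```

Because the split happens on bytes before any decoding, code that only looks
for 0x2F does not need to know the filename is UTF-8 at all.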

Michael

Am Sonntag, 10. Februar 2002 20:19 schrieb Ryan Cumming:
> On February 10, 2002 h:41, Neil Stevens wrote:
> > KDE and Qt's multiple text codec support affords great flexibility
> > throughout KDE.  Users can read and write documents in any encoding KDE
> > supports, viewing and using any character in unicode.  But, for filenames
> > they are tied to the encoding of their own language.
> >
> > I think it'd be a good idea for the user's filesystem encoding to be
> > optionally different from his language encoding, specifically to allow
> > the use of unicode filenames.
> >
> > Comments?
>
> I'd like to see how you would guarantee the Unix path delimiter ("/")
> wouldn't show up in the UTF-8 stream.
>
> -Ryan